
How to Scramble your LLM communications with fault

This guide shows you how to scramble LLM prompts and responses so that you can explore how your application handles the variations often observed with LLMs.

Prerequisites

  • Install fault

    If you haven’t installed fault yet, follow the installation instructions.

  • Install and configure the aichat CLI

    Throughout this guide we will use the aichat CLI to handle our prompt examples. While fault works with any LLM client, aichat keeps this guide tight and clear; a plain OpenAI SDK alternative is sketched at the end of these prerequisites.

    You may want to create an aichat config file that routes requests through fault:

    ~/.config/aichat/config.yaml
    model: openai:o4-mini-high
    clients:
    - type: openai-compatible # (1)!
      name: openai
      api_base: http://localhost:45580/v1 # (2)!
      api_key: ... # (3)!
    
    1. Tells aichat this applies to all requests using the OpenAI API.
    2. The address of the proxy; the /v1 path is required because the calls will be prefixed with it.
    3. Set a valid OpenAI API key
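
While the rest of this guide uses aichat, any OpenAI-compatible SDK can be pointed at the proxy in the same way. Below is a minimal sketch using the official openai Python package (the package, the model name and the prompt are illustrative; only the proxy address comes from this guide):

    # Minimal sketch: point the official OpenAI Python SDK at the fault proxy
    # instead of api.openai.com. Assumes `pip install openai`; the model name
    # is only an example.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:45580/v1",  # the fault proxy started later in this guide
        api_key="sk-...",                      # your real OpenAI API key
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Say hello"}],
    )
    print(response.choices[0].message.content)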

Supported LLM providers

fault natively supports several LLM providers (OpenAI, Gemini, OpenRouter and ollama). The current restriction is that only the OpenAI chat completions API is intercepted and modified.

Scramble a prompt

One of the most interesting features of fault is its capacity to inject an additional system prompt into an LLM query. This instruction changes the LLM's behavior and is therefore valuable to explore; a conceptual sketch of the injected payload follows the example below.

  • Inject a system prompt

    Make the LLM answer with a pirate tone:

    fault run llm openai --case prompt-scramble --instruction "Response as a pirate. Arr!"
    

    This will launch fault and start a proxy listening on port 45580.

    To use it, simply point your client's URL at http://localhost:45580. All requests will be forwarded as-is to the right provider.

  • Generate a random piece of code

    We may now send a prompt:

    aichat "Generate a python function that gives the time"
    

    Below is its response. Note the vocabulary used to respond like a pirate. Yarrr!

    Arrr, me hearty! Here’s a little Python function to fetch the current time for ye. Feel free to run it aboard yer own vessel:
    
        ```python
        def get_current_time(fmt='%Y-%m-%d %H:%M:%S'):
            """
            Returns the current time as a formatted string.
            :param fmt: A datetime strftime-format string (default: 'YYYY-MM-DD HH:MM:SS')
            """
            from datetime import datetime
            return datetime.now().strftime(fmt)
        ```
    
    Usage be simple as swabbing the deck:
    
        >>> print(get_current_time())
        2024-06-15 14:23:08
    
    Or specify yer own treasure-map of a format:
    
        >>> print(get_current_time('%H:%M:%S on %B %d, %Y'))
        14:23:08 on June 15, 2024
    
    Arr! That’ll keep ye shipshape with every tick of the clock.
    
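
Conceptually, the --instruction flag asks the proxy to prepend an extra system message to the chat completions payload before it reaches the provider. The sketch below illustrates the idea in Python; the exact way fault merges the instruction into the payload is an assumption.

    # Conceptual sketch (not fault's actual code): what the chat completions
    # payload roughly looks like before and after the proxy injects the
    # additional system prompt.
    original = {
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "user", "content": "Generate a python function that gives the time"},
        ],
    }

    scrambled = {
        **original,
        "messages": [
            {"role": "system", "content": "Response as a pirate. Arr!"},  # injected by the proxy
            *original["messages"],
        ],
    }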
    

Modify a prompt

  • Modify a prompt

    Make the LLM answer with Java code instead of Python.

    fault run llm openai --case prompt-scramble --scramble-pattern python --scramble-with java
    

    The pattern can be any valid regular expression; a conceptual sketch of the substitution follows the example below.

    This will launch fault and start a proxy listening on port 45580.

    To use it, simply point your client's URL at http://localhost:45580. All requests will be forwarded as-is to the right provider.

  • Generate a random piece of code

    We may now send a query:

    aichat -c "Generate a python function that gives the time"
    

    Below is its response. Note how we get a Java snippet.

    Here’s a small utility class that shows a few ways to “get the time” in Java 8+ using the java.time API:
    
        ```java
        import java.time.Instant;
        import java.time.LocalDateTime;
        import java.time.LocalTime;
        import java.time.ZoneId;
        import java.time.format.DateTimeFormatter;
    
        public class TimeUtil {
    
            /**
            * Returns the current LocalTime (hours/minutes/seconds/nanos).
            */
            public static LocalTime getCurrentLocalTime() {
                return LocalTime.now();
            }
    
            /**
            * Returns the current LocalDateTime (date + time).
            */
            public static LocalDateTime getCurrentLocalDateTime() {
                return LocalDateTime.now();
            }
    
            /**
            * Returns the current time as a formatted string, e.g. "15:42:07".
            * You can pass in any pattern you like, e.g. "HH:mm:ss", "yyyy-MM-dd HH:mm:ss", etc.
            */
            public static String getCurrentTimeFormatted(String pattern) {
                DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern);
                return LocalDateTime.now().format(formatter);
            }
    
            /**
            * Returns the current Unix timestamp (seconds since 1970-01-01T00:00:00Z).
            */
            public static long getUnixTimestamp() {
                return Instant.now().getEpochSecond();
            }
    
            /**
            * Returns the current timestamp in milliseconds.
            */
            public static long getEpochMillis() {
                return Instant.now().toEpochMilli();
            }
    
            public static void main(String[] args) {
                // Examples of usage:
                System.out.println("LocalTime            = " + getCurrentLocalTime());
                System.out.println("LocalDateTime        = " + getCurrentLocalDateTime());
                System.out.println("Formatted (HH:mm:ss) = " + getCurrentTimeFormatted("HH:mm:ss"));
                System.out.println("Formatted (full)     = " + getCurrentTimeFormatted("yyyy-MM-dd HH:mm:ss"));
                System.out.println("Unix timestamp       = " + getUnixTimestamp());
                System.out.println("Epoch millis         = " + getEpochMillis());
            }
        }
        ```
    
    Explanation of the key methods:
    
    * getCurrentLocalTime(): returns a LocalTime instance (no date).  
    * getCurrentLocalDateTime(): returns a LocalDateTime (date + time).  
    * getCurrentTimeFormatted(pattern): returns a String formatted according to the given pattern.  
    * getUnixTimestamp(): seconds since the epoch (1970-01-01T00:00:00Z).  
    * getEpochMillis(): milliseconds since the epoch.  
    
    All of these use the modern java.time API, which is thread-safe and ISO-compliant.
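
Conceptually, --scramble-pattern and --scramble-with behave like a regular-expression substitution applied to the outgoing prompt text. The Python sketch below shows the equivalent substitution; how fault applies it internally is not shown here.

    # Conceptual sketch: the prompt-side scramble is equivalent to a
    # standard regex replace on the user prompt.
    import re

    prompt = "Generate a python function that gives the time"
    scrambled = re.sub(r"python", "java", prompt)
    print(scrambled)  # Generate a java function that gives the time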
    

Inject bias into a response

In this use case, you amend the response coming back from the server.

  • Inject a bias into the response

    Replace every mention of Paris in the response with London.

    fault run llm openai --case inject-bias --bias-pattern "(?i)paris" --bias-replacement "London"
    

    The pattern can be any valid regular expression.

    This will launch fault and start a proxy listening on port 45580.

    To use it, simply point your client's URL at http://localhost:45580. All requests will be forwarded as-is to the right provider.

  • Run a prompt

    We may now send the following prompt:

    aichat "What is the average level of Sun per year of France's largest cities?"
    

    Here is the response:

    Below is an approximate ranking of France’s ten largest cities (by population) by their average annual sunshine, expressed in hours of bright sunshine per year.  All figures refer to the 1981–2010 “climate‐normal” period as published by Météo-France:
    
    • Marseille (Pharo)………………2 784 h/year  
    • Nice (Le Var)…………………2 701 h/year  
    • Montpellier (Fréjorgues)…..2 621 h/year  
    • Toulouse (Blagnac)……………2 057 h/year  
    • Bordeaux (Mérignac)…………2 074 h/year  
    • Lyon (Bron)………………………1 831 h/year  
    • Nantes (Bouguenais)………1 795 h/year  
    • Strasbourg (Entzheim)……1 700 h/year  
    • London (Montsouris)……………1 662 h/year  
    • Lille (Lesquin)………………1 477 h/year  
    
    Key take-aways:  
    – The sunniest of the major cities is Marseille (≈2 800 h), followed by Nice and Montpellier.  
    – London, Lyon, Nantes and Strasbourg all hover in the 1 600–1 800 h range.  
    – Lille has the lowest annual sunshine of the ten, with roughly 1 480 h/year.
    

    Notice how Paris was replaced with London; a conceptual sketch of this substitution follows.
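
Conceptually, --bias-pattern and --bias-replacement perform the same kind of regex substitution, but on the text returned by the provider; the (?i) flag makes the match case-insensitive. A small Python sketch of the equivalent replacement:

    # Conceptual sketch: the response-side substitution, with a
    # case-insensitive pattern so "Paris", "paris" and "PARIS" all match.
    import re

    response_text = "Paris (Montsouris) ............ 1 662 h/year"
    biased = re.sub(r"(?i)paris", "London", response_text)
    print(biased)  # London (Montsouris) ............ 1 662 h/year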

Slow the streamed response

This use case is valuable for measuring how your client deals with a slow streamed response.

  • Slow the response by 800ms per chunk

    fault run llm openai --case slow-stream --slow-stream-mean-delay 800
    

    This will launch fault and start a proxy listening on port 45580.

    To use it, simply point your client's URL at http://localhost:45580. All requests will be forwarded as-is to the right provider.

  • Run a prompt

    We may now send a query:

    aichat "What is the average level of Sun per year of France's largest cities?"
    

    You will notice that each chunk takes some time to be displayed; the sketch below shows one way to measure the per-chunk delay.
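
To observe the injected delay from code rather than from the terminal, you can time the gap between streamed chunks. A minimal sketch using the openai Python package pointed at the proxy (the package, model name and prompt are illustrative):

    # Minimal sketch: stream a completion through the fault proxy and print
    # how long each chunk took to arrive. Assumes `pip install openai`.
    import time
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:45580/v1", api_key="sk-...")

    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Name France's five largest cities."}],
        stream=True,
    )

    last = time.monotonic()
    for chunk in stream:
        now = time.monotonic()
        text = chunk.choices[0].delta.content if chunk.choices else None
        print(f"+{now - last:.2f}s {text or ''}")
        last = now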