Code Review

This guide will introduces you to generating code change suggestions, from an angle of resilience and reliability, using LLM.

The proposed changes are proposed as unified diff that help you visualize what fault suggests you may want to add or remove from your code.

Prerequisites

Install fault

If you haven’t installed fault yet, follow the installation instructions.
Get an OpenAI Key

For the purpose of the guide, we will be using OpenAI models. You need to create an API key. Then make sure the key is available for fault:
```
export OPENAI_API_KEY=sk-...
```
Install a local qdrant database

fault uses qdrant for its vector database. You can install a local, free, qdrant using docker:
```
docker run -p 6333:6333 -p 6334:6334 -v "$(pwd)/qdrant_storage:/qdrant/storage:z" qdrant/qdrant
```

Windows not supported

Unfortunately, the agent feature is not supported on Windows because the framework used by fault to interact with LLM does not support that platform.

Experimental feature

This feature is still experimental and is subject to change. Dealing with LLM requires accepting a level of fuzzyness and adjustments. Engineering is still very much a human endeavour!

Review a Python Web Application

In this scenario we take a very basic Python application, using the FastAPI and SQLAlchemy (sqlite) libraries. We want to learn what we can from this application.

Source code of the application

webapp/app.py

#!/usr/bin/env -S uv run --script

# /// script
# dependencies = [
#   "uvicorn",
#   "fastapi[standard]",
#   "sqlalchemy"
# ]
# ///

###############################################################################
#
# Very basic application that expose a couple of endpoints that you can
# use to test fault.
# Once you have installed `uv` https://docs.astral.sh/uv/, simply run the
# application as follows:
# 
# uv run --script app.py
#
###############################################################################
from typing import Annotated

import uvicorn
from fastapi import FastAPI, HTTPException, Depends, status, Body
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker, Session
from sqlalchemy.exc import SQLAlchemyError


###############################################################################
# Database configuration
###############################################################################
engine = create_engine("sqlite:///./test.db")
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
Base = declarative_base()


###############################################################################
# Data model
###############################################################################
class User(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True, index=True)
    name = Column(String, index=True)
    password = Column(String)

Base.metadata.create_all(bind=engine)


###############################################################################
# Dependency injection
###############################################################################
def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()


###############################################################################
# Our application
###############################################################################
app = FastAPI(servers=[{"url": "http://localhost:9090"}])


@app.get("/")
async def index() -> dict[str, str]:
    return {"message": "Hello, World!"}


@app.post("/users/")
async def create_user(
    name: Annotated[str, Body()],
    password: Annotated[str, Body()],
    db: sessionmaker[Session] = Depends(get_db)
):
    db_user = User(name=name, password=password)
    db.add(db_user)
    db.commit()
    db.refresh(db_user)

    return db_user


@app.get("/users/{user_id}")
async def read_user(
    user_id: int, db: sessionmaker[Session] = Depends(get_db)
):
    try:
        user = db.query(User).filter(User.id == user_id).first()
        if user is None:
            raise HTTPException(status_code=status.HTTP_404_NOT_FOUND)
        return user
    except SQLAlchemyError as e:
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR)


if __name__ == "__main__":
    uvicorn.run("app:app", port=9090)

You may now install the dependencies to run it:

pipuv

pip install fastapi[standard] sqlalchemy uvicorn

uv tool install fastapi[standard] sqlalchemy uvicorn

Finally, run the application as follows:

cd webapp
fastapi dev --port 9090

This application has only a couple of endpoints is purposefully not optimised.

Generate a scenario for this application

We must first generate and run a scenario so we get a mapping of the application.
```
fault scenario generate --scenario scenario.yaml --spec-url http://localhost:9090/openapi.json
```

Run the scenario against this application

fault scenario run --scenario scenario.yaml

Review the code and make suggestions

fault reviews the code by chunking it, indexing it and then asking your favourite LLM for its opinion.

fault agent code-review \
    --results results.json \ # (1)!
    --source-dir webapp/ \ # (2)!
    --source-lang python # (3)!
⠸ [00:00:34] Reviewing! This could take a while...
> index__get [GET http://localhost:9090/]
  create_user_users__post [POST http://localhost:9090/users/]
  read_user_users__user_id__get [GET http://localhost:9090/users/{user_id}]
[↑↓ to move, enter to select, type to filter]

The results from the previous scenario execution
The top-level directory where the application's code lives
The application's language (someday a heuristic will guess this)
Use the arrow keys to select the endpoint you want to review and patch

Assuming we select the first endpoint, fault suggests the following changes:

Generated code review changes report

Below is a set of targeted, small‐scope changes you can drop almost verbatim into your app.py to dramatically improve resilience, prevent cascading slow‐downs, and give you early warning via metrics and tracing. Each change is explained, and I’ve ordered them by immediate impact.

1) Increase and tune your DB-connection pool

Right now you are using SQLite’s default pool (QueuePool size=5). Under even moderate latency or I/O hiccups you’ll saturate your pool and stall new requests forever.

--- a/app.py
+++ b/app.py
@@ ### Database configuration
-engine = create_engine("sqlite:///./test.db")
+engine = create_engine(
+    "sqlite:///./test.db",
+    # allow up to 10 concurrent connections
+    pool_size=10,
+    # and burst to 20 before failing
+    max_overflow=10,
+    # wait up to 5s for a free connection
+    pool_timeout=5,
+)
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)

Why: prevents “head of line” blocking when one connection is slow (disk I/O hiccup, GC pause, etc.).

2) Bound every DB call with a per-request timeout

Even with a bigger pool, a hung query will hold its slot indefinitely. Wrapping your sync calls in asyncio.to_thread + asyncio.wait_for guarantees a hard cap.

--- a/app.py
+++ b/app.py
import asyncio
from fastapi import HTTPException, status
from sqlalchemy.exc import SQLAlchemyError

+# helper that runs sync code in a thread
+def _sync_read_user(db, user_id: int):
+    user = db.query(User).filter(User.id == user_id).first()
+    if user is None:
+        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="user not found")
+    return user

@app.get("/users/{user_id}")
async def read_user(
    user_id: int, db: sessionmaker[Session] = Depends(get_db)
):
-    try:
-        user = db.query(User).filter(User.id == user_id).first()
-        if user is None:
-            raise HTTPException(status_code=status.HTTP_404_NOT_FOUND)
-        return user
-    except SQLAlchemyError as e:
-        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR)
+    try:
+        # cap the entire DB roundtrip at 2s
+        return await asyncio.wait_for(
+            asyncio.to_thread(_sync_read_user, db, user_id),
+            timeout=2.0,
+        )
+    except asyncio.TimeoutError:
+        # fast‐fail slow queries
+        raise HTTPException(status_code=status.HTTP_504_GATEWAY_TIMEOUT, detail="DB request timed out")
+    except SQLAlchemyError:
+        # catch transient DB errors
+        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail="DB error")

Why: prevents a black-hole or very slow query from chewing your entire worker pool and pushing p95 latency off the charts.

3) Add idempotent retries with exponential back-off to your GET

read_user is idempotent. A retry on a transient DB or I/O error will hide packet drops, brief database locks, file-system stalls, etc.

Install tenacity:
```
pip install tenacity
```
Wrap the in-thread helper:

--- a/app.py
+++ b/app.py
from tenacity import (
    retry,
    retry_if_exception_type,
    wait_exponential,
    stop_after_attempt,
)

+# retry only on SQLALchemy transient errors
@retry(
    retry=retry_if_exception_type(SQLAlchemyError),
    wait=wait_exponential(multiplier=0.1, max=1.0),
    stop=stop_after_attempt(3),
    reraise=True,
)
def _sync_read_user(db, user_id: int):
@@
    user = db.query(User).filter(User.id == user_id).first()
    if user is None:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="user not found")
    return user

Why: this will re-try on deadlocks, brief locks, dropped packets, without touching your route signature or calling code.

4) Instrument basic Prometheus metrics + tracing

Early warning is critical: track latencies, error rates and retry counts so you can alert before your users notice.

Install:

pip install prometheus-client starlette_exporter opentelemetry-api opentelemetry-sdk opentelemetry-instrumentation-fastapi

Add to the top of app.py:

from starlette_exporter import PrometheusMiddleware, handle_metrics
from prometheus_client import Counter, Histogram
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# -- tracing -----------------------------------------------------------------
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter())
)
tracer = trace.get_tracer(__name__)

# -- metrics -----------------------------------------------------------------
app.add_middleware(PrometheusMiddleware)
app.add_route("/metrics", handle_metrics)

REQUEST_LATENCY = Histogram(
    "http_request_latency_seconds", "Latencies by method and path",
    ["method", "endpoint"],
)
REQUEST_COUNT = Counter(
    "http_requests_total", "Total requests by method, path and status",
    ["method", "endpoint", "http_status"],
)

@app.middleware("http")
async def metrics_middleware(request, call_next):
    labels = {"method": request.method, "endpoint": request.url.path}
    with REQUEST_LATENCY.labels(**labels).time():
        response = await call_next(request)
    REQUEST_COUNT.labels(
        **labels, http_status=response.status_code
    ).inc()
    return response

Why: you’ll immediately see tail-latency spikes (p95/p99), error surges, retry storms or pool-timeouts in your dashboards.

5) (Optional) Circuit-breaker for sustained failures

When your DB goes down for more than a few seconds, retries only amplify pressure. A circuit-breaker will short-circuit and fast-fail until the DB recovers.

pip install pybreaker

--- a/app.py
+++ b/app.py
from pybreaker import CircuitBreaker

# trip breaker after 5 errors, reset after 30s
db_breaker = CircuitBreaker(fail_max=5, reset_timeout=30)

-@retry(...)
-def _sync_read_user(...):
+@db_breaker
+@retry(...)
+def _sync_read_user(...):
    ...

Why: prevents retry avalanches against an already-failing downstream.

Summary of Impact

Pool tuning + per-call timeouts → no more “stair-step” tail-latency under slow queries.
Idempotent retries → hide small network/DB blips.
Metrics & tracing → proactive alerting.
Circuit-breaker → quick fail-fast during true outages.

With these four “drop-in” changes you’ll eliminate head-of-line blocking, dramatically reduce error SLO violations on GET /users/{user_id}, and gain visibility into when—and why—your service is struggling.

Optionally Inject Advices from Scenario Executions Analysis

You may improve the generated review by injecting the analysis made from a scenario executed agains the source code under review. This should give much more context to reason about.

Run an analysis from a past scenario results:

fault agent scenario-review --results results.json

This will generate a file called scenario-review-report.md. Now you can inject this file into the code review command line:

fault agent code-review \
    --results results.json \
    --source-dir webapp/ \
    --source-lang python \
    --scenario-review-report scenario-review-report.md # (1)!
? Select the OpenAPI operationId to patch:
> read_root__get [GET http://localhost:9090/]
  create_user_users__post [POST http://localhost:9090/users/]
  read_user_users__user_id__get [GET http://localhost:9090/users/{user_id}]
[↑↓ to move, enter to select, type to filter]

Pass the generated report. You can omit this, if the file exists in the current directory, it will be read.

Assuming again we select the first endpoint, fault suggests now the following changes:

Generated code review changes report after scenario analysis

Here are four focused, minimally-intrusive changes you can make today to dramatically improve resilience, reliability and observability in your FastAPI/SQLAlchemy app.

Wrap every transaction in an explicit context manager and rollback on failure
Right now you do:

db.add(db_user)
db.commit()
db.refresh(db_user)

If commit() fails you never roll back, leaving the session in an invalid state. Instead use:

from sqlalchemy.exc import SQLAlchemyError

@app.post("/users/")
async def create_user(
    name: str = Body(...),
    password: str = Body(...),
    db: Session = Depends(get_db),
):
    try:
        # begin() will automatically rollback on exception
        with db.begin():
            user = User(name=name, password=password)
            db.add(user)
        # now safe to refresh
        db.refresh(user)
        return user
    except SQLAlchemyError as e:
        # session.rollback() already called by begin()
        # you can log e here
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail="could not create user",
        )

Add a simple retry with exponential back-off around commits
Transient “database is locked” errors in SQLite (and some cloud-SQL networks) can often be overcome by a retry. The tenacity library gives you a one-liner:

pip install tenacity

from tenacity import retry, wait_exponential, stop_after_attempt

@retry(wait=wait_exponential(multiplier=0.2, max=2), stop=stop_after_attempt(3))
def safe_commit(db: Session):
    db.commit()

@app.post("/users/")
async def create_user(...):
    try:
        with db.begin():
            user = User(...)
            db.add(user)
        # retry commit if it hits a transient lock
        safe_commit(db)
        db.refresh(user)
        return user
    except SQLAlchemyError:
        raise HTTPException(500, "db error")

Enforce a per-request timeout
A hung or extremely slow request ties up your worker. Adding a single middleware gives you a hard cap on processing time:

import asyncio
from fastapi import Request

@app.middleware("http")
async def timeout_middleware(request: Request, call_next):
    # 5 seconds max per request
    try:
        return await asyncio.wait_for(call_next(request), timeout=5.0)
    except asyncio.TimeoutError:
        raise HTTPException(504, "request timed out")

Add basic metrics and tracing hooks
Knowing “what just broke” is half the battle. Two minutes to add Prometheus metrics:

pip install prometheus_client

import time
from prometheus_client import Counter, Histogram, make_asgi_app
from starlette.middleware import Middleware
from starlette.middleware.base import BaseHTTPMiddleware

REQUEST_COUNT = Counter("http_requests_total", "Request count", ["method", "endpoint", "status"])
REQUEST_LATENCY = Histogram("http_request_latency_seconds", "Latency", ["method", "endpoint"])

class MetricsMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        start = time.time()
        response = await call_next(request)
        elapsed = time.time() - start
        key = (request.method, request.url.path, response.status_code)
        REQUEST_COUNT.labels(*key).inc()
        REQUEST_LATENCY.labels(request.method, request.url.path).observe(elapsed)
        return response

app.add_middleware(MetricsMiddleware)
# mount /metrics for Prometheus to scrape
app.mount("/metrics", make_asgi_app())

With these four changes in place you will have:
- safe transactions that always roll back on error
- automatic retries for common transient failures
- a hard deadline for every HTTP call
- real-time metrics you can hook into your alerting system

Tip

In a future release, fault will be able to apply and try the changes to verify they may be used safely.