Fault Injection Into Google Cloud Platform

This guide walks you through injecting network faults into a Cloud Run service on Google Cloud Platform (GCP). You do not need to change any application code.

Prerequisites

Install fault

If you haven’t installed fault yet, follow the installation instructions.

Inject Latency Into a Cloud Run Service

Cloud Run is GCP's platform for running containerized workloads. fault creates a new revision of an existing Cloud Run service and adds a sidecar container to its specification. This sidecar becomes the entrypoint for network traffic: fault listens on the exposed port and transparently routes all traffic to the application's port. When the injection ends, fault rolls the service back to the previous revision.
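
To make the sidecar layout concrete, here is a minimal sketch that prints a Cloud Run service manifest shaped like the revision described above. It is an illustration, not the manifest fault actually generates: the function name `render_sidecar_manifest`, the container names, and port 8080 are all assumptions, and the arguments fault passes to its own container are deliberately left out. The one Cloud Run rule it relies on is that only the container declaring a port receives incoming requests.

```shell
#!/usr/bin/env bash
# Illustrative only: prints a manifest shaped like the revision described
# above. Names, images, and the port number are hypothetical.

render_sidecar_manifest() {
    # Usage: render_sidecar_manifest <service> <app-image> <fault-image>
    local service="$1" app_image="$2" fault_image="$3"
    cat <<EOF
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: ${service}
spec:
  template:
    spec:
      containers:
        # The fault container declares the port, so Cloud Run sends it the traffic.
        - name: fault
          image: ${fault_image}
          ports:
            - containerPort: 8080
        # The application declares no port; it only sees what fault forwards.
        - name: app
          image: ${app_image}
EOF
}

# Placeholder image references; replace them with your own.
render_sidecar_manifest hello \
    "<region>-docker.pkg.dev/<project>/<repository>/hello:<version>" \
    "<region>-docker.pkg.dev/<project>/<repository>/fault:<version>"
```

Comparing this shape with the output of your service's current specification shows what changes between the normal revision and the injected one.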

Traffic Before fault Is Injected

---
config:
  theme: 'default'
  themeVariables:
      'git0': '#ff00ff'
  gitGraph:
    showBranches: true
    showCommitLabel: true
    mainBranchName: 'normal'
---
gitGraph
       commit id: "LB"
       commit id: "Backend Service"
       commit id: "Cloud Run"
       commit id: "Application Container"

Traffic After fault Is Injected

---
config:
  theme: 'default'
  themeVariables:
      'git0': '#ff00ff'
      'git1': '#00ffff'
  gitGraph:
    showBranches: true
    showCommitLabel: true
    mainBranchName: 'normal'
---
gitGraph
       commit id: "LB"
       commit id: "Injected" type: HIGHLIGHT
       commit id: "Backend Service"
       branch fault
       commit id: "Cloud Run"
       commit id: "fault Container"
       commit id: "Application Container"
       checkout normal
       merge fault id: "Rolled back" type: HIGHLIGHT

Create a basic Cloud Run service

You may want to follow the official GCP documentation to deploy a sample service.

Upload the fault container image to GCP Artifact Registry

Cloud Run expects to pull the fault image from an Artifact Registry repository in the same region as the service (or from a global one). This means you must upload the official fault image to your own repository.

Follow the official documentation to upload the fault image.

Something along these lines:

# locally download the official fault image
docker pull ghcr.io/rebound-how/fault:<version>

# tag it to match your new Artifact Registry repository
docker tag ghcr.io/rebound-how/fault:<version> <region>-docker.pkg.dev/<project>/<repository>/fault:<version>

# push it to the repository
docker push <region>-docker.pkg.dev/<project>/<repository>/fault:<version>
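
The three steps above can be wrapped in a small script so the source and target references stay consistent. This is a sketch under a few assumptions: the helper names (`fault_image_ref`, `run`, `mirror_fault_image`) and the `DRY_RUN` variable are made up for illustration, and the target repository is assumed to exist already. The `gcloud auth configure-docker` step registers the regional registry host with Docker so that the push is authenticated. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: mirror the official fault image into your own repository.
# Commands are printed unless DRY_RUN=0 is set.

# Build the target image reference for the fault image.
fault_image_ref() {
    # Usage: fault_image_ref <region> <project> <repository> <version>
    printf '%s-docker.pkg.dev/%s/%s/fault:%s\n' "$1" "$2" "$3" "$4"
}

# Run a command, or just print it when in dry-run mode (the default).
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

mirror_fault_image() {
    local region="$1" project="$2" repository="$3" version="$4"
    local upstream="ghcr.io/rebound-how/fault:${version}"
    local target
    target="$(fault_image_ref "$region" "$project" "$repository" "$version")"

    # Let docker authenticate against the regional registry host.
    run gcloud auth configure-docker "${region}-docker.pkg.dev"
    run docker pull "$upstream"
    run docker tag "$upstream" "$target"
    run docker push "$target"
}

# Placeholder values; replace them with your own before setting DRY_RUN=0.
mirror_fault_image "<region>" "<project>" "<repository>" "<version>"
```

The reference printed by `fault_image_ref` is the value to pass as the fault container image when injecting.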

Inject fault into the Cloud Run service

The following command adds 800ms of latency to the service's response time.
```
fault inject gcp \
    --project <project> \  # (1)!
    --region <region>  \  # (2)!
    --service <service> \  # (3)!
    --image <image> \  # (4)!
    --duration 30s \  # (5)!
    --with-latency --latency-mean 800
```
1. The GCP project where your Cloud Run service is running
2. The GCP region where your Cloud Run service is running
3. The name of the Cloud Run service
4. The full URL of the fault container image
5. Optional duration after which the injection is rolled back. If unset, fault waits for user input before rolling back
When you do not explicitly set the service, fault lets you pick one interactively from the CLI:
```
fault inject gcp \
    --project <project> \
    --region <region>  \
    --image <image> \
    --with-latency --latency-mean 800
? Service:  
> hello
[↑↓ to move, enter to select, type to filter]
```
Once started, fault deploys a new revision of the service in which the fault process runs as a sidecar container alongside the service's main container. The sidecar exposes a port to receive traffic and routes it to the application.
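
One way to check that the injection is active is to sample the service's response time before and during the run. The sketch below is not part of fault: the helper names `sample_latency` and `mean_ms` are hypothetical, and the service URL is a placeholder. It averages several requests because the flag sets a mean latency, so individual requests may land above or below 800ms.

```shell
#!/usr/bin/env bash
# Sketch: sample a service's response time to observe the injected latency.

# Read one duration in seconds per line on stdin, print the mean in milliseconds.
mean_ms() {
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%.0f\n", (sum / n) * 1000 }'
}

# Issue <count> requests against <url> and print each total time in seconds.
sample_latency() {
    local url="$1" count="${2:-10}" i
    for ((i = 0; i < count; i++)); do
        curl -s -o /dev/null -w '%{time_total}\n' "$url"
    done
}

# Usage, with a placeholder service URL:
#   sample_latency "https://<service-url>" 10 | mean_ms
```

Run it once before starting the injection and once while it is active. With `--latency-mean 800`, the second mean should be roughly 800ms higher than the first, and it should return to the baseline after the rollback.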