KAOS Monkey: Kubernetes Chaos Agent
Try it yourself! This example is available as an executable Jupyter notebook.
This example demonstrates building a "chaos monkey" style agent that can interact with your Kubernetes cluster using the Kubernetes MCP Server. The agent uses MCP tools to execute operations, controlled by deterministic mock responses.
Understanding the Flow
WARNING
This example demonstrates powerful capabilities. Use with caution in production environments.
Prerequisites
- KAOS operator installed (Installation Guide)
- kaos-cli installed
- Access to a Kubernetes cluster
Overview
We'll create an agent that can:
- List pods in a namespace using the pods_list tool
- Delete specific pods using the pods_delete tool
- Return results of operations
The agent uses mock responses for deterministic behavior. This lets us control exactly what the LLM "decides" to do, which makes the example reproducible and testable. It also means there is no need to set up an external ModelAPI.
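To make the idea of deterministic mock responses concrete, here is a minimal sketch of a scripted stand-in for a model. It is not KAOS code: the `ScriptedModel` class and its `complete` method are hypothetical names used only for illustration, and the only assumption is that queued responses are handed back in order, whatever the prompt says.

```python
from collections import deque


class ScriptedModel:
    """Hypothetical stand-in for an LLM that replays canned responses in order."""

    def __init__(self, responses):
        self._queue = deque(responses)

    def complete(self, prompt: str) -> str:
        # The prompt is ignored on purpose: output depends only on the script.
        if not self._queue:
            raise RuntimeError("no mock responses left")
        return self._queue.popleft()


model = ScriptedModel(['{"tool": "pods_list"}', "Done!"])
first = model.complete("anything")
second = model.complete("something else")
print(first)
print(second)
```

Because the output depends only on the script, running the same script twice gives the same sequence, which is what makes this style of example testable.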
Setup
First, let's set up the environment and create a namespace for this example:
import os
os.environ['NAMESPACE'] = 'kaos-monkey-example'
kubectl create namespace $NAMESPACE 2>/dev/null || true
kubectl config set-context --current --namespace=$NAMESPACE
Step 1: Create a ModelAPI
Create a ModelAPI in Proxy mode (we'll use mock responses, so no real LLM is needed):
kaos modelapi deploy chaos-api --mode Proxy --wait
Step 2: Set Up RBAC for Kubernetes MCP Server
The Kubernetes MCP server needs permissions to interact with the Kubernetes API:
kaos system create-rbac k8s-mcp --resources pods --verbs list,get,delete
Step 3: Create the Kubernetes MCP Server
Deploy the Kubernetes MCP server using the built-in kubernetes runtime with the service account we just created:
kaos mcp deploy k8s-tools --runtime kubernetes --sa k8s-mcp --wait
Step 4: Create a Test Pod
Create a simple test pod that our chaos agent can target:
kubectl run chaos-victim --image=nginx:alpine --restart=Never || echo "exists"
Wait for the pod to be ready:
kubectl wait pod/chaos-victim --for=condition=ready --timeout=60s
Step 5: Create the Chaos Agent
Create the agent with mock responses. The --mock-response flag can be used multiple times - each response is consumed in sequence:
# Build mock responses with JSON action format
# Two-phase loop: action1 -> action2 -> no-action -> final
MOCK1='{"tool": "pods_list", "arguments": {"namespace": "'$NAMESPACE'"}}'
MOCK2='{"tool": "pods_delete", "arguments": {"namespace": "'$NAMESPACE'", "name": "chaos-victim"}}'
MOCK3='{}'
MOCK4='Done! I have deleted the chaos-victim pod to simulate a failure scenario.'
kaos agent deploy kaos-monkey \
  --modelapi chaos-api \
  --model mock-model \
  --mcp k8s-tools \
  --instructions "You are KAOS Monkey, a chaos engineering agent." \
  --mock-response "$MOCK1" \
  --mock-response "$MOCK2" \
  --mock-response "$MOCK3" \
  --mock-response "$MOCK4" \
  --expose \
  --wait
Step 6: Unleash the Chaos
Now invoke the chaos agent to delete the pod:
kaos agent invoke kaos-monkey --message "Cause some chaos by deleting a pod"
Step 7: Verify the Chaos
Check that the pod was deleted:
import subprocess
# Verify the pod was deleted by looking for a "Killing" event for chaos-victim
result = subprocess.run(["kubectl", "get", "event"], capture_output=True, text=True)
r_str = result.stdout
if "Killing" in r_str and "pod/chaos-victim" in r_str:
    print("SUCCESS: Pod was deleted by the chaos agent!")
else:
    raise AssertionError("FAILED: No Killing event for chaos-victim - chaos agent did not delete it")
Understanding Mock Responses
The mock responses include JSON action blocks (e.g., {"tool": "..."}) that trigger real MCP tool execution - only the LLM reasoning is mocked.
This is essential for:
- Testing: Deterministic behavior in CI/CD
- Cost savings: No LLM API calls during development
- Reproducibility: Same inputs always produce same outputs
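The split between mocked reasoning and real tool execution can be sketched as a small loop. This is an illustration under assumptions, not the KAOS implementation: `run_agent`, the in-memory `pods` dictionary, and the two tool functions are hypothetical. The loop assumes that a response parsing to a JSON object with a `tool` key is an action, that an empty object `{}` means no further action, and that the response after it is the final answer.

```python
import json


def run_agent(responses, tools):
    """Replay scripted responses, executing any tool actions they contain."""
    script = iter(responses)
    trace = []  # results of the tool calls actually executed
    for raw in script:
        try:
            action = json.loads(raw)
        except json.JSONDecodeError:
            return raw, trace  # plain text: treat as the final answer
        if isinstance(action, dict) and "tool" in action:
            result = tools[action["tool"]](**action.get("arguments", {}))
            trace.append((action["tool"], result))
            continue
        # Assumed convention: anything else (e.g. {}) means "no action";
        # the next scripted response is the final answer.
        return next(script, ""), trace
    return "", trace


# In-memory stand-in for a cluster; real tools would call the Kubernetes API.
pods = {"demo": ["chaos-victim", "bystander"]}


def pods_list(namespace):
    return list(pods[namespace])


def pods_delete(namespace, name):
    pods[namespace].remove(name)
    return f"deleted {name}"


answer, trace = run_agent(
    [
        '{"tool": "pods_list", "arguments": {"namespace": "demo"}}',
        '{"tool": "pods_delete", "arguments": {"namespace": "demo", "name": "chaos-victim"}}',
        "{}",
        "Done!",
    ],
    {"pods_list": pods_list, "pods_delete": pods_delete},
)
print(answer, trace, pods)
```

In this sketch the scripted strings play the role of the LLM, while `pods_list` and `pods_delete` really mutate state, which mirrors the point that only the reasoning is mocked.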
Kubernetes MCP Server Tools
The kubernetes runtime provides many useful tools:
- pods_list, pods_get, pods_delete, pods_log, pods_exec
- namespaces_list, resources_list, resources_create_or_update
- helm_install, helm_list, helm_uninstall
- And more! See the kubernetes-mcp-server documentation.
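Any of these tools could be scripted with the same action format used for the mock responses in Step 5. As a hedged sketch, the hypothetical helper `mock_action` below builds such a string with `json.dumps` instead of hand-quoted shell strings. The `pods_log` argument names shown are an assumption, so check the kubernetes-mcp-server documentation for each tool's actual parameters.

```python
import json


def mock_action(tool: str, **arguments) -> str:
    """Build a mock response in the {"tool": ..., "arguments": {...}} action format."""
    return json.dumps({"tool": tool, "arguments": arguments})


# Argument names below are assumed for illustration.
mock = mock_action("pods_log", namespace="kaos-monkey-example", name="chaos-victim")
print(mock)
```

Letting `json.dumps` handle the quoting avoids escaping mistakes when a namespace or pod name is interpolated into the string.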
Cleanup
kubectl delete namespace $NAMESPACE --wait=false
Next Steps
- Multi-Agent Telemetry - Add observability
- Gateway API - Secure your agent endpoints
- Agent CRD Reference - Full configuration options