
Agentic Loop

The agentic loop is handled by Pydantic AI's built-in run loop. The KAOS Agent wraps this with memory persistence, delegation tools, and telemetry.

How It Works

Key Differences from Previous Architecture

| Aspect | Previous | Current (Pydantic AI) |
|---|---|---|
| Loop control | Custom two-phase loop | Pydantic AI `run()` / `run_stream()` |
| Tool calling | Manual extraction + string-mode fallback | Native Pydantic AI tool registration |
| Step limit | Custom counter | `UsageLimits(request_limit=max_steps)` |
| Model detection | `litellm.supports_function_calling()` | Pydantic AI handles internally |
| Streaming | Custom Phase 2 | `run_stream()` with `stream_text()` |

Configuration

The loop is controlled by `max_steps` on the `Agent`:

```python
agent = Agent(
    name="my-agent",
    model_api_url="http://ollama:11434",
    model_name="llama3.2",
    max_steps=5,
)
```

`max_steps` maps to `UsageLimits(request_limit=max_steps)`, which is passed to Pydantic AI's `run()`.
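The effect of the limit can be illustrated without the framework: each model request consumes one unit of a budget, and the run fails once the budget is exhausted. This is a simplified stand-in for what `UsageLimits` enforces inside Pydantic AI's run loop; `run_with_limit` and `StepLimitExceeded` are illustrative names, not framework APIs.

```python
# Simplified stand-in for UsageLimits(request_limit=max_steps); the real
# enforcement lives inside Pydantic AI's run loop.
class StepLimitExceeded(Exception):
    pass

def run_with_limit(model_step, max_steps):
    """Call model_step() until it returns a final answer or the budget runs out."""
    for _ in range(max_steps):
        done, result = model_step()  # one model request per iteration
        if done:
            return result
    raise StepLimitExceeded(f"exceeded {max_steps} model requests")

# A mock model that needs two requests: one tool call, then the final answer.
steps = iter([(False, None), (True, "Done.")])
result = run_with_limit(lambda: next(steps), max_steps=5)
```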

Tool Registration

MCP Tools

MCP servers are passed as toolsets to the Pydantic AI agent:

python
from pydantic_ai_slim.pydantic_ai.mcp import MCPServerStreamableHTTP

mcp = MCPServerStreamableHTTP(url="http://mcp-server:8000/mcp")
agent._pydantic_agent = PydanticAgent(
    model=model,
    system_prompt=instructions,
    mcp_servers=[mcp],
)

Delegation Tools

Sub-agents are registered as `@agent.tool_plain` functions with a `delegate_to_` prefix:

```python
@self._pydantic_agent.tool_plain(name=f"delegate_to_{name}")
async def delegate(task: str) -> str:
    return await self._delegate_to_sub_agent(name, task, session_id)
```

The LLM decides when to delegate based on the tool description.
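Registering one such tool per sub-agent is typically done in a loop, where the loop variable must be bound per iteration to avoid Python's late-binding closure pitfall. A self-contained sketch of that pattern, using a plain dict as a stand-in for the framework's tool registry (all names here are illustrative, not KAOS APIs):

```python
import asyncio

def build_delegation_tools(sub_agents):
    """Build one async delegation tool per sub-agent, prefixed delegate_to_."""
    tools = {}
    for name, handler in sub_agents.items():
        # _handler default binds per iteration (avoids the late-binding bug)
        async def delegate(task: str, _handler=handler) -> str:
            return await _handler(task)
        tools[f"delegate_to_{name}"] = delegate
    return tools

async def worker(task: str) -> str:
    return f"worker handled: {task}"

tools = build_delegation_tools({"worker": worker})
result = asyncio.run(tools["delegate_to_worker"]("process"))
```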

Memory Integration

Around each `run()` call, the agent:

  1. Creates or retrieves a session
  2. Stores the user message event
  3. Builds conversation history from memory events → Pydantic AI `ModelRequest`/`ModelResponse` objects
  4. After completion, extracts and persists all new events (tool calls, results, final response)

Memory events are bridged between the KAOS `MemoryEvent` format and Pydantic AI's `ModelMessage` types via `_memory_events_to_messages()` and `_extract_and_persist_events()`.
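A rough sketch of the history-building direction of that bridge, with an assumed two-field `MemoryEvent` and plain dicts standing in for Pydantic AI's `ModelRequest`/`ModelResponse` types:

```python
from dataclasses import dataclass

# Illustrative sketch only: the MemoryEvent fields and the dict-based message
# shape are assumptions; the real bridge builds Pydantic AI message objects.
@dataclass
class MemoryEvent:
    role: str      # "user", "assistant", or "tool"
    content: str

def memory_events_to_messages(events):
    messages = []
    for event in events:
        # User events become model requests; everything else becomes responses.
        kind = "request" if event.role == "user" else "response"
        messages.append({"kind": kind, "content": event.content})
    return messages

history = memory_events_to_messages([
    MemoryEvent("user", "hi"),
    MemoryEvent("assistant", "hello"),
])
```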

Mock Testing

Use `DEBUG_MOCK_RESPONSES` for deterministic testing. The framework uses Pydantic AI's `FunctionModel` internally:

```bash
# Simple response (no tools) — 1 entry
export DEBUG_MOCK_RESPONSES='["Hello!"]'

# Tool call + final response — 2 entries
export DEBUG_MOCK_RESPONSES='["{\"tool_calls\": [{\"id\": \"call_1\", \"name\": \"echo\", \"arguments\": {\"message\": \"hi\"}}]}", "Done."]'

# Delegation — 2 entries
export DEBUG_MOCK_RESPONSES='["{\"tool_calls\": [{\"id\": \"call_1\", \"name\": \"delegate_to_worker\", \"arguments\": {\"task\": \"process\"}}]}", "Complete."]'
```
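The consumption order can be sketched as a simple queue: each model request pops the next canned entry, in order. This is illustrative code; the framework actually routes these entries through Pydantic AI's `FunctionModel`.

```python
import json
import os

# Hypothetical sketch of consuming DEBUG_MOCK_RESPONSES: a 2-entry run with
# one tool-call response followed by the final text response.
os.environ["DEBUG_MOCK_RESPONSES"] = '["{\\"tool_calls\\": []}", "Done."]'
responses = json.loads(os.environ["DEBUG_MOCK_RESPONSES"])

def next_mock_response():
    """Return the next canned response, in the order the list defines."""
    return responses.pop(0)

first = next_mock_response()   # the tool-call JSON entry
second = next_mock_response()  # the final text response
```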

Mock Pattern

The previous architecture required 3 mock entries (tool call → no-action → final). With Pydantic AI, only 2 entries are needed (tool call → final).

Kubernetes E2E

Configure mock responses via the Agent CRD:

```yaml
spec:
  container:
    env:
    - name: DEBUG_MOCK_RESPONSES
      value: '["{\"tool_calls\": [{\"id\": \"call_1\", \"name\": \"echo\", \"arguments\": {\"message\": \"test\"}}]}", "Done."]'
```

String Mode

For models without native function calling support (e.g., small Ollama models), string mode injects tool descriptions into the system prompt and parses JSON tool calls from the response text.

Enable String Mode

```yaml
# In Agent CRD
spec:
  config:
    toolCallMode: "string"
```

Or via environment variable:

```bash
export TOOL_CALL_MODE=string
```

How It Works

  1. Tool definitions are formatted as text descriptions and appended to the system prompt
  2. The model is instructed to respond with `{"tool_calls": [...]}` JSON when using tools
  3. Response text is parsed for tool-call JSON patterns
  4. Detected tool calls are converted to `ToolCallPart` objects for Pydantic AI processing
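Step 3 can be sketched as a small parser that looks for a tool-call JSON object in the model's output text. This is a simplified, illustrative version; the framework's actual parsing may be stricter, and `parse_string_mode_tool_calls` is not a KAOS API.

```python
import json
import re

def parse_string_mode_tool_calls(text):
    """Extract a {"tool_calls": [...]} payload from free-form model output."""
    match = re.search(r'\{"tool_calls":.*\}', text, re.DOTALL)
    if not match:
        return []  # plain text response, no tool calls
    try:
        payload = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []  # malformed JSON is treated as plain text
    return payload.get("tool_calls", [])

calls = parse_string_mode_tool_calls(
    'Sure. {"tool_calls": [{"id": "call_1", "name": "echo", '
    '"arguments": {"message": "hi"}}]}'
)
```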

Supported Modes

| Mode | Behavior |
|---|---|
| `auto` | Default — uses Pydantic AI native function calling |
| `native` | Same as `auto` (explicit) |
| `string` | Text-based tool calling via system prompt injection |
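A minimal sketch of how such a mode setting could be resolved, treating `auto` as an alias for native function calling and falling back to the `TOOL_CALL_MODE` environment variable (`resolve_tool_call_mode` is an illustrative name, not a KAOS API):

```python
import os

def resolve_tool_call_mode(configured=None):
    """Resolve the effective tool-call mode; 'auto' is an alias for 'native'."""
    mode = (configured or os.getenv("TOOL_CALL_MODE") or "auto").lower()
    if mode not in {"auto", "native", "string"}:
        raise ValueError(f"unknown tool call mode: {mode}")
    return "native" if mode == "auto" else mode
```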

Released under the Apache 2.0 License.