Agentic Loop
The agentic loop is the agent's two-phase reasoning mechanism: Phase 1 iteratively collects tool results and delegation responses, and Phase 2 produces the final streamed response.
How It Works
Auto-Detection
The agent automatically detects tool calling support at initialization:
```python
# Uses litellm's model registry (no HTTP calls needed)
import litellm

supports_native = litellm.supports_function_calling(model="gpt-4o")             # True
supports_native = litellm.supports_function_calling(model="ollama/smollm2:135m")  # False
```

- Native: OpenAI `tools` API parameter, structured `tool_calls` in the response
- String fallback: tool descriptions in the system prompt, JSON parsed from the content text
Both modes use the same unified tool call format.
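The unified shape can be pictured as a small dataclass plus a normalizer; this is an illustrative sketch, not the project's actual `ToolCall` definition, and the `normalize` helper is hypothetical.

```python
# Hypothetical sketch of the unified tool call shape. The ToolCall fields
# and the normalize() helper are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ToolCall:
    name: str                      # e.g. "calculator" or "delegate_to_researcher"
    arguments: dict = field(default_factory=dict)
    id: Optional[str] = None       # native mode supplies an id; string mode may not

def normalize(raw: dict) -> ToolCall:
    """Map a raw tool-call dict (native API or string-parsed) to the unified shape."""
    return ToolCall(name=raw["name"], arguments=raw.get("arguments", {}), id=raw.get("id"))
```

Because both modes funnel through one shape, everything downstream (execution, logging, progress blocks) is mode-agnostic.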
Configuration
The agentic loop is controlled by the max_steps parameter passed to the Agent:
```python
agent = Agent(
    name="my-agent",
    model_api=model_api,
    max_steps=5,  # maximum tool calling iterations
)
```

max_steps
Prevents infinite loops in Phase 1. When the limit is reached, the loop returns the message:

```
Reached maximum reasoning steps (5)
```

Guidelines:
- Simple queries: 2-3 steps
- Tool-using tasks: 5 steps (default)
- Complex multi-step tasks: 10+ steps
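The bound can be sketched as a plain `for` loop; `run_phase_one`, `call_model`, and `execute_tool` are illustrative names, not the project's API.

```python
# Minimal sketch of the bounded Phase 1 loop; all names are illustrative.
def run_phase_one(call_model, execute_tool, max_steps=5):
    for step in range(1, max_steps + 1):
        response = call_model()
        tool_calls = response.get("tool_calls") or []
        if not tool_calls:
            return response                 # no actions left -> proceed to Phase 2
        for call in tool_calls:
            execute_tool(call)              # result goes back into the conversation
    return {"content": f"Reached maximum reasoning steps ({max_steps})"}
```

With `max_steps=5`, at most five model calls can request tools before the loop forcibly exits with the truncation message.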
Unified Tool Call Format
Both native and string modes use the same `tool_calls` array format. Delegation tools are registered with a `delegate_to_` prefix.
Tool Call
```json
{"tool_calls": [{"name": "calculator", "arguments": {"expression": "2 + 2"}}]}
```

Multiple Tool Calls (Parallel)

```json
{"tool_calls": [{"name": "search", "arguments": {"q": "test"}}, {"name": "echo", "arguments": {"msg": "hi"}}]}
```

Delegation

```json
{"tool_calls": [{"name": "delegate_to_researcher", "arguments": {"task": "Find information about quantum computing"}}]}
```

Tool Call Extraction
_extract_tool_calls() checks response.tool_calls first (works for both native mode and mock responses), then falls back to content JSON parsing for string mode:
```python
def _extract_tool_calls(self, response):
    # Structured tool_calls take priority (native API or mock responses)
    if response.tool_calls:
        return response.tool_calls
    # String-mode fallback: parse a tool_calls array (or a single tool) from content
    if not self._supports_native_tools:
        actions = self._parse_action(response.content or "")
        return [ToolCall(...) for action in actions]  # build unified ToolCall objects
    return []
```

Progress Blocks
During Phase 1, the agent emits progress blocks when starting tool/delegation execution:
```json
{"type": "progress", "step": 1, "action": "tool_call", "target": "calculator"}
{"type": "progress", "step": 2, "action": "delegate", "target": "researcher"}
```

Execution Flow
Tool Execution
- Extract tool calls from response
- Emit progress block
- Log `tool_call` event to memory
- Execute tool via MCP client
- Log `tool_result` event to memory
- Add result to conversation
- Continue to next loop iteration
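The steps above can be sketched as one async iteration. The helper names (`emit_progress`, `memory.log_event`, `mcp_client.call_tool`) are assumptions for illustration, not the project's confirmed API; the fakes below just demonstrate the event ordering.

```python
import asyncio
from types import SimpleNamespace

# Illustrative sketch of one tool-execution iteration of Phase 1.
# emit_progress, memory.log_event, and mcp_client.call_tool are assumed names.
async def execute_tool_step(agent, call, step, conversation):
    agent.emit_progress({"type": "progress", "step": step,
                         "action": "tool_call", "target": call.name})
    await agent.memory.log_event("tool_call", {"name": call.name, "arguments": call.arguments})
    result = await agent.mcp_client.call_tool(call.name, call.arguments)
    await agent.memory.log_event("tool_result", {"name": call.name, "result": result})
    conversation.append({"role": "tool", "name": call.name, "content": str(result)})

# Minimal fakes showing the ordering: progress -> tool_call -> tool_result.
events, conversation = [], []

async def log_event(kind, payload):
    events.append(kind)

async def call_tool(name, arguments):
    return "4"

agent = SimpleNamespace(
    emit_progress=lambda block: events.append("progress"),
    memory=SimpleNamespace(log_event=log_event),
    mcp_client=SimpleNamespace(call_tool=call_tool),
)
call = SimpleNamespace(name="calculator", arguments={"expression": "2 + 2"})
asyncio.run(execute_tool_step(agent, call, 1, conversation))
```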
Delegation Execution
- Extract `delegate_to_{name}` tool call from response
- Emit progress block
- Log `delegation_request` event to memory
- Invoke remote agent via A2A protocol
- Log `delegation_response` event to memory
- Add response to conversation
- Continue to next loop iteration
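The delegation path mirrors the tool path; this sketch assumes a hypothetical `a2a_client.invoke` helper (the `delegate_to_` prefix convention is from the docs above, the rest is illustrative).

```python
import asyncio
from types import SimpleNamespace

# Illustrative sketch of one delegation iteration; a2a_client.invoke is an
# assumed helper name, not the project's confirmed API.
async def execute_delegation_step(agent, call, step, conversation):
    target = call.name.removeprefix("delegate_to_")
    agent.emit_progress({"type": "progress", "step": step,
                         "action": "delegate", "target": target})
    await agent.memory.log_event("delegation_request", {"target": target, "task": call.arguments["task"]})
    reply = await agent.a2a_client.invoke(target, call.arguments["task"])
    await agent.memory.log_event("delegation_response", {"target": target, "response": reply})
    conversation.append({"role": "tool", "name": call.name, "content": str(reply)})

# Minimal fakes showing the event ordering for a delegation.
events, conversation = [], []

async def log_event(kind, payload):
    events.append(kind)

async def invoke(target, task):
    return f"{target} finished: {task}"

agent = SimpleNamespace(
    emit_progress=lambda block: events.append("progress"),
    memory=SimpleNamespace(log_event=log_event),
    a2a_client=SimpleNamespace(invoke=invoke),
)
call = SimpleNamespace(name="delegate_to_researcher", arguments={"task": "Find quantum info"})
asyncio.run(execute_delegation_step(agent, call, 2, conversation))
```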
Final Response (Phase 2)
- When no tool calls are detected, exit Phase 1
- Add a "provide your final response" prompt
- Call the model with streaming enabled
- Stream tokens directly to the client
- Log `agent_response` event to memory
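Phase 2 can be sketched as an async generator: tokens are yielded to the client as they arrive, and the assembled text is logged afterward. `stream_model` and `memory.log_event` are illustrative names, not the project's actual API.

```python
import asyncio
from types import SimpleNamespace

# Sketch of Phase 2: stream tokens to the client, then log the full response.
# stream_model and memory.log_event are assumed helper names.
async def final_response(conversation, stream_model, memory):
    conversation.append({"role": "user", "content": "Provide your final response."})
    chunks = []
    async for token in stream_model(conversation):
        chunks.append(token)
        yield token                  # streamed directly to the client
    await memory.log_event("agent_response", {"content": "".join(chunks)})

# Fake model stream and memory to show the flow end to end.
async def fake_stream(conversation):
    for token in ["Hel", "lo", "!"]:
        yield token

logged = []

async def log_event(kind, payload):
    logged.append((kind, payload))

memory = SimpleNamespace(log_event=log_event)

async def run():
    return [t async for t in final_response([], fake_stream, memory)]

tokens = asyncio.run(run())
```

Logging after the stream completes means the memory event always holds the full response the client saw.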
Memory Events
The loop logs events for debugging and verification:
```python
# After tool execution
events = await agent.memory.get_session_events(session_id)
# Events: [user_message, tool_call, tool_result, agent_response]

# After delegation
events = await agent.memory.get_session_events(session_id)
# Events: [user_message, delegation_request, delegation_response, agent_response]
```

Testing with Mock Responses
Set the `DEBUG_MOCK_RESPONSES` environment variable to test loop behavior deterministically. The `tool_calls` format works in both native and string mode:
```shell
# Test tool calling (tool call → no action → final)
export DEBUG_MOCK_RESPONSES='["{\"tool_calls\": [{\"id\": \"call_1\", \"name\": \"echo\", \"arguments\": {\"text\": \"hello\"}}]}", "No more actions.", "The echo returned: hello"]'

# Test delegation (delegation → no action → final)
export DEBUG_MOCK_RESPONSES='["{\"tool_calls\": [{\"id\": \"call_1\", \"name\": \"delegate_to_researcher\", \"arguments\": {\"task\": \"Find quantum info\"}}]}", "No more actions.", "Based on the research, quantum computing uses qubits."]'

# Simple response (no tools → Phase 2 only)
export DEBUG_MOCK_RESPONSES='["Hello! How can I help you?"]'
```

For Kubernetes E2E tests, configure via the Agent CRD:
```yaml
spec:
  container:
    env:
      - name: DEBUG_MOCK_RESPONSES
        value: '["{\"tool_calls\": [{\"id\": \"call_1\", \"name\": \"delegate_to_worker\", \"arguments\": {\"task\": \"process data\"}}]}", "No more actions.", "Done."]'
```

Best Practices
- Set appropriate max_steps - Too low may truncate reasoning, too high wastes resources
- Clear instructions - Tell the LLM when to use tools vs. respond directly
- Test with mocks - Use `DEBUG_MOCK_RESPONSES` with the `tool_calls` format
- Monitor events - Use memory endpoints to debug complex flows
- Handle errors gracefully - Tool failures are fed back to the loop for recovery
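Feeding tool failures back to the loop can be sketched as wrapping execution in a try/except that turns the exception into an ordinary tool result; `execute_with_recovery` and the message format are illustrative, not the project's actual implementation.

```python
# Sketch of graceful tool-failure handling: an exception becomes a tool
# result message the model can read and recover from on the next iteration.
# The helper name and message format are illustrative assumptions.
def execute_with_recovery(execute, call, conversation):
    try:
        result = execute(call)
    except Exception as exc:         # tool failure is data, not a crash
        result = f"Tool '{call['name']}' failed: {exc}"
    conversation.append({"role": "tool", "name": call["name"], "content": str(result)})
    return result
```

Because the failure text lands in the conversation like any other tool result, the model can retry with corrected arguments or explain the problem in its final response.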