Google ADK - Runner and execution architecture
You have defined your agents, wired up tools, set up callbacks, and configured sessions with scoped state. But when a user sends a message, what actually happens? Who calls the agent? Who persists the state changes? Who decides when the conversation turn is over? The answer to all of these is the Runner. The Runner is the central orchestrator of the ADK runtime. It receives a user’s query, starts the agent, processes every event the agent emits, commits state changes via the SessionService, and forwards events to the caller. Without the Runner, your agents, tools, and callbacks are just definitions sitting idle. The Runner is the engine that brings them to life.
The event loop
An event loop sits at the heart of the ADK runtime and mediates communication between the Runner and the executing agent. When a user prompt arrives, the Runner hands it over to the agent for processing. The agent runs until it has something to yield, at which point it emits an event. The Runner receives the event, processes any associated actions, calls the session service to append the event to the current session, and forwards the event upstream. After the Runner completes event processing, the agent resumes from where it was paused, and the loop continues until the agent has no more events to yield.
Several components work together within the ADK runtime. Understanding their roles clarifies how the event loop functions.
The Runner
The Runner serves as the central coordinator for a single user invocation. Its responsibilities in the loop are:
- Initiation: Receives the user’s query (new_message) and appends it to the session history via the SessionService.
- Kick-off: Starts event generation by calling the main agent’s execution method (agent.run_async(...)).
- Receive and process: Waits for the agent logic to yield an Event. Upon receiving one, it uses configured services (SessionService, ArtifactService, MemoryService) to commit changes indicated in event.actions (such as state_delta and artifact_delta).
- Yield upstream: Forwards the processed event onwards to the calling application or UI for rendering.
- Iterate: Signals the agent logic that processing is complete, allowing it to resume and generate the next event.
The execution logic
Your code within agents, tools, and callbacks is responsible for the actual computation and decision-making. Its interaction with the loop follows a specific pattern:
- Execute: Runs its logic based on the current InvocationContext, including the session state as it was when execution resumed.
- Yield: When the logic needs to communicate, it constructs an Event containing the relevant content and actions, then yields it back to the Runner.
- Pause: The agent’s execution pauses immediately after the yield. It waits for the Runner to complete processing and committing.
- Resume: Only after the Runner has processed the yielded event does the agent logic resume from the statement immediately following the yield.
- See updated state: Upon resumption, the agent logic can now reliably access the session state reflecting the changes that were committed by the Runner.
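The two lists above can be modelled in a few lines of plain Python. This is only a toy sketch using async generators: ToyEvent, toy_agent, and toy_runner are hypothetical names for illustration and are not ADK classes. It shows the ordering guarantee, namely that the agent reads the committed state only after the runner has processed the previously yielded event.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ToyEvent:
    """Hypothetical stand-in for an ADK Event: content plus a state delta."""
    content: str
    state_delta: dict = field(default_factory=dict)


async def toy_agent(state: dict):
    """Execution logic: yield an event, pause, then resume and read state."""
    yield ToyEvent("step 1: recording progress", {"progress": "started"})
    # Execution resumes here only after the runner has committed the delta.
    yield ToyEvent(f"step 2: progress is now {state['progress']!r}")


async def toy_runner(state: dict):
    """Runner side: commit each event's delta, then forward the event."""
    async for event in toy_agent(state):
        state.update(event.state_delta)  # commit before the agent resumes
        yield event                      # forward upstream to the caller


async def main() -> list:
    state: dict = {}
    return [event.content async for event in toy_runner(state)]


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

If the runner did not commit the delta before asking the agent for its next event, the second step would fail with a KeyError, which is exactly the inconsistency the pause is meant to prevent.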
This cooperative yield/pause/resume cycle between the Runner and your execution logic, mediated by Event objects, forms the core of the ADK runtime.
When the Runner starts processing a user query, it creates an InvocationContext. This is the runtime’s “traveling notebook” that accompanies the interaction from start to finish, collecting information, tracking progress, and providing context to every component along the way. You do not typically create or manage this object directly. The ADK framework creates it when an invocation starts via runner.run_async and passes the relevant contextual information to your agent code, callbacks, and tools. When you implement a custom agent, you receive it as the ctx parameter in _run_async_impl.
It is important to understand the hierarchy of concepts within an invocation. An invocation starts with a user message and ends with a final response. It can contain one or multiple agent calls, for example, when using agent transfer or AgentTool. Each agent call is handled by agent.run_async(). An LLM agent call can contain one or multiple steps. Each step calls the LLM once and yields its response. If the LLM requests tool calls, those are executed within the same step.
State variables prefixed with temp: are strictly scoped to a single invocation and discarded afterwards. When a parent agent calls a sub-agent, it passes its InvocationContext to the sub-agent. This means the entire chain of agent calls shares the same invocation ID and the same temp: state.
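To make the scoping rule concrete, here is a small model in plain Python. ToyInvocationContext and the three functions are hypothetical and only mimic the behaviour described above: the parent passes the same context object to the sub-agent, and keys prefixed with temp: are dropped when the invocation ends.

```python
from dataclasses import dataclass, field


@dataclass
class ToyInvocationContext:
    """Hypothetical stand-in for InvocationContext (not the ADK class)."""
    invocation_id: str
    state: dict = field(default_factory=dict)


def sub_agent(ctx: ToyInvocationContext) -> str:
    # The sub-agent receives the parent's context, so it sees temp: values
    # and reports the same invocation ID.
    return f"{ctx.invocation_id}: draft={ctx.state['temp:draft']}"


def parent_agent(ctx: ToyInvocationContext) -> str:
    ctx.state["temp:draft"] = "v1"        # scratch data for this invocation
    ctx.state["user:last_topic"] = "adk"  # meant to outlive the invocation
    return sub_agent(ctx)                 # the same context is passed on


def end_invocation(ctx: ToyInvocationContext) -> dict:
    """Keep only the keys that survive the invocation (drop temp:)."""
    return {k: v for k, v in ctx.state.items() if not k.startswith("temp:")}


ctx = ToyInvocationContext(invocation_id="inv-1")
print(parent_agent(ctx))
print(end_invocation(ctx))
```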
Creating and using a Runner
To create a Runner, you need an agent and a SessionService. Optionally, you can provide an ArtifactService and a MemoryService.
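A minimal sketch of that wiring, assuming the in-memory service implementations that ship with ADK. The helper name build_runner and the app name demo_app are placeholders of mine, and the imports sit inside the function only so the helper can be defined on a machine without google-adk installed.

```python
def build_runner(agent, app_name: str = "demo_app"):
    """Wire an agent to in-memory services and return a Runner."""
    # Deferred imports: google-adk is only required when this is called.
    from google.adk.artifacts import InMemoryArtifactService
    from google.adk.memory import InMemoryMemoryService
    from google.adk.runners import Runner
    from google.adk.sessions import InMemorySessionService

    return Runner(
        agent=agent,
        app_name=app_name,
        session_service=InMemorySessionService(),    # required
        artifact_service=InMemoryArtifactService(),  # optional
        memory_service=InMemoryMemoryService(),      # optional
    )
```

The in-memory services lose their data when the process exits, so they suit local experiments rather than production deployments.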
Once you have a runner, you interact with it using one of its run methods.
run_async
This is the primary method for executing agent invocations. It returns an async generator of events. The ADK runtime is fundamentally built on asynchronous patterns using Python’s asyncio to handle concurrent operations like waiting for LLM responses or tool executions efficiently without blocking.
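A typical consumer iterates over the generator and picks out the final response. The helper below is a sketch: final_text is a name I made up, while the keyword arguments to run_async and the is_final_response() check follow the ADK API as I understand it. The new_message argument is expected to be a google.genai types.Content object built by the caller.

```python
async def final_text(runner, user_id: str, session_id: str, new_message) -> str:
    """Consume the event stream and return the text of the final response."""
    answer = ""
    async for event in runner.run_async(
        user_id=user_id, session_id=session_id, new_message=new_message
    ):
        # Intermediate events (tool calls, partial chunks) are skipped here.
        if event.is_final_response() and event.content and event.content.parts:
            answer = event.content.parts[0].text or ""
    return answer
```

Because the function only relies on the shape of the runner and its events, it can be exercised in tests with a stub runner before any model credentials are configured.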
run (synchronous)
A synchronous Runner.run method exists for convenience in simple scripts or testing environments. Internally, it calls Runner.run_async and manages the async event loop execution for you.
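To illustrate what "manages the async event loop for you" can mean, here is one generic way to drive an async generator from synchronous code. It is a sketch of the idea, not ADK's implementation, which may work differently; async_events and run_sync are hypothetical names.

```python
import asyncio


async def async_events():
    """Hypothetical async event source standing in for run_async."""
    for name in ("tool_call", "tool_result", "final_response"):
        await asyncio.sleep(0)  # simulate awaiting an LLM or a tool
        yield name


def run_sync(async_gen_factory):
    """Drive an async generator to completion from synchronous code."""
    loop = asyncio.new_event_loop()
    try:
        agen = async_gen_factory()
        while True:
            try:
                # Block until the next event is available, then hand it over.
                yield loop.run_until_complete(agen.__anext__())
            except StopAsyncIteration:
                break
    finally:
        loop.close()


print(list(run_sync(async_events)))
```

A wrapper like this blocks the calling thread while it waits, which is the reason a synchronous entry point is a poor fit for servers that handle many requests at once.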
For production applications, especially web servers, design the application to be asynchronous and call run_async directly for the best performance.
run_live
For bidirectional streaming scenarios, such as voice conversations, the Runner provides run_live. This method uses a LiveRequestQueue for sending messages and returns an async generator of events. Unlike run_async, which handles a single request-response cycle, run_live maintains a persistent streaming connection to the LLM.
One InvocationContext corresponds to one run_live() loop. It is created when you call run_live() and persists for the entire streaming session.
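The difference from a single request-response cycle can be pictured with a plain asyncio.Queue. This toy does not use LiveRequestQueue or any ADK class: toy_run_live and the CLOSE sentinel are hypothetical. It only shows one long-lived loop that keeps producing events for as long as the client keeps sending input.

```python
import asyncio

CLOSE = object()  # hypothetical sentinel meaning "client closed the stream"


async def toy_run_live(requests: asyncio.Queue):
    """One long-lived loop: keep answering until the queue is closed."""
    turn = 0
    while True:
        message = await requests.get()
        if message is CLOSE:
            return
        turn += 1
        yield f"turn {turn}: echo {message}"


async def main() -> list:
    requests: asyncio.Queue = asyncio.Queue()
    for item in ("hello", "still there?", CLOSE):
        requests.put_nowait(item)
    # A single call serves every message sent before the stream is closed.
    return [event async for event in toy_run_live(requests)]


if __name__ == "__main__":
    print(asyncio.run(main()))
```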
RunConfig
The RunConfig class defines runtime behavior and options for agents. It controls streaming settings, function calling, artifact saving, and LLM call limits. You pass a RunConfig to customize how the runner executes your agent.
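As a sketch, a configuration that streams over SSE and lowers the call limit might look like the following. The helper name streaming_run_config and the value 50 are arbitrary choices of mine, and the import path google.adk.agents.run_config is the one I believe current ADK releases use. The import is deferred so the helper can be defined without google-adk installed.

```python
def streaming_run_config(max_llm_calls: int = 50):
    """Build a RunConfig that streams over SSE and caps LLM calls."""
    # Deferred import: google-adk is only required when this is called.
    from google.adk.agents.run_config import RunConfig, StreamingMode

    return RunConfig(
        streaming_mode=StreamingMode.SSE,
        max_llm_calls=max_llm_calls,
    )
```

The resulting object is passed to the runner with the run_config keyword argument, for example runner.run_async(..., run_config=streaming_run_config()).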
Some of the key properties of the RunConfig class are:
| Property | Type | Default | Purpose |
|---|---|---|---|
| streaming_mode | StreamingMode | StreamingMode.NONE | Controls output delivery: NONE, SSE, or BIDI |
| max_llm_calls | int | 500 | Safety limit on total LLM calls per invocation |
| save_input_blobs_as_artifacts | bool | False | Whether to save input binary data as artifacts |
| support_cfc | bool | False | Enables Compositional Function Calling |
| speech_config | SpeechConfig | None | Voice configuration for live/audio agents |
| response_modalities | list[str] | None | Controls output format: ["TEXT"] or ["AUDIO"] |
StreamingMode
The streaming_mode setting determines how the agent’s responses are delivered.
- StreamingMode.NONE is the default. The LLM generates its entire response before delivering it. The Runner receives a single non-partial event for the response.
- StreamingMode.SSE (Server-Sent Events) uses HTTP streaming. The LLM generates its response in chunks. The Runner yields multiple events with partial=True for progressive display, followed by a final non-partial event.
- StreamingMode.BIDI enables full bidirectional streaming via WebSocket, used with run_live() for real-time voice and multimodal interactions.
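On the consuming side, SSE mode means telling partial events apart from the final one. The sketch below assumes events expose partial and content.parts[0].text as described above; render_stream is a hypothetical helper name, and the choice to treat the non-partial event as the authoritative text is my reading of the behaviour described here.

```python
async def render_stream(events) -> str:
    """Show partial chunks as they arrive; return the final complete text."""
    final_text = ""
    async for event in events:
        if not (event.content and event.content.parts):
            continue  # nothing to render for this event
        text = event.content.parts[0].text or ""
        if event.partial:
            print(text, end="", flush=True)  # progressive display
        else:
            final_text = text                # complete, non-partial response
    print()
    return final_text
```

Typical usage would be final = await render_stream(runner.run_async(...)) with a RunConfig whose streaming_mode is StreamingMode.SSE.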
max_llm_calls
The max_llm_calls parameter acts as a safety limit to prevent runaway agent loops. If an agent enters an infinite tool-calling cycle, this limit ensures the invocation terminates after a set number of LLM calls. The default of 500 is generous for most use cases.
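The mechanism is essentially a counter that is checked on every model call. The toy below models that idea only: CallBudget, LlmCallLimitExceeded, and runaway_agent are hypothetical names, and the exact error ADK raises and where it is checked are not shown here.

```python
class LlmCallLimitExceeded(RuntimeError):
    """Hypothetical error type for this sketch."""


class CallBudget:
    def __init__(self, max_llm_calls: int = 500):
        self.max_llm_calls = max_llm_calls
        self.calls = 0

    def charge(self) -> None:
        """Count one LLM call and fail once the budget is exceeded."""
        self.calls += 1
        if self.calls > self.max_llm_calls:
            raise LlmCallLimitExceeded(
                f"exceeded {self.max_llm_calls} LLM calls"
            )


def runaway_agent(budget: CallBudget) -> None:
    while True:          # a tool-calling loop that never converges
        budget.charge()  # every step costs one LLM call


budget = CallBudget(max_llm_calls=3)
try:
    runaway_agent(budget)
except LlmCallLimitExceeded as exc:
    print(f"invocation stopped: {exc}")
```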
Compositional Function Calling
Setting support_cfc=True enables Compositional Function Calling. This allows the model to orchestrate multiple tools in sophisticated patterns, calling tools in parallel, chaining outputs as inputs to other tools, or conditionally executing tools based on intermediate results.
Understanding a few key aspects of how the ADK runtime handles state and streaming is crucial for building predictable agents.
Putting it all together
Here is a complete example that demonstrates the Runner orchestrating a multi-turn conversation with a tool-calling agent. This builds on the sessions and state concepts from the earlier article in this series.
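The following is a hedged sketch of such an example rather than a definitive listing. The agent name, tool name, model name (gemini-2.0-flash), app name, and user ID are all assumptions chosen for illustration. It also assumes a recent ADK release in which create_session and get_session are coroutines. ADK imports are deferred into the functions that need them so the counter helper can be tested on its own.

```python
def bump_query_count(state) -> int:
    """Increment the user-scoped counter in a dict-like state object."""
    count = state.get("user:query_count", 0) + 1
    state["user:query_count"] = count  # recorded as a state_delta by ADK
    return count


def build_agent():
    """Create a tool-calling agent (names and model are assumptions)."""
    from google.adk.agents import LlmAgent
    from google.adk.tools import ToolContext

    def record_query(topic: str, tool_context: ToolContext) -> dict:
        """Record that the user asked about a topic."""
        return {
            "topic": topic,
            "query_count": bump_query_count(tool_context.state),
        }

    return LlmAgent(
        name="research_assistant",
        model="gemini-2.0-flash",
        instruction=(
            "You are a research assistant. The user has asked "
            "{user:query_count?} questions so far. Call record_query once "
            "for every new question, then answer it."
        ),
        tools=[record_query],
    )


async def main() -> None:
    from google.adk.runners import Runner
    from google.adk.sessions import InMemorySessionService
    from google.genai import types

    app, user = "runner_demo", "user_1"
    session_service = InMemorySessionService()
    runner = Runner(
        agent=build_agent(), app_name=app, session_service=session_service
    )
    # Assumption: create_session/get_session are async in your ADK version.
    session = await session_service.create_session(app_name=app, user_id=user)

    for question in ("What does the Runner do?", "How is state committed?"):
        message = types.Content(role="user", parts=[types.Part(text=question)])
        async for event in runner.run_async(
            user_id=user, session_id=session.id, new_message=message
        ):
            if event.is_final_response() and event.content and event.content.parts:
                print(f"[{event.author}] {event.content.parts[0].text}")
        current = await session_service.get_session(
            app_name=app, user_id=user, session_id=session.id
        )
        print("user:query_count =", current.state.get("user:query_count"))
```

Running it requires google-adk installed and model credentials configured. Start it with asyncio.run(main()).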
When you run this, you can observe the Runner orchestrating the full event loop across multiple turns, with the session service persisting state between them.
Notice how the Runner drives the entire lifecycle. It receives each user message, passes it to the agent, processes every event the agent generates, including tool call events where state is modified via ToolContext, commits the state_delta through the SessionService, and yields the final response events back to our code. The user:query_count counter increases across turns because the user-scoped state is persisted by the session service between invocations. The agent’s instruction template {user:query_count?} is resolved by the framework on each turn using the latest committed state.
Understanding the Runner and its execution architecture is essential for building reliable agent applications with Google ADK. The Runner is the piece that connects everything we have covered so far in this series, from sessions and state to callbacks and tools. Every state change flows through events, every event flows through the Runner, and the Runner ensures consistency through the yield/pause/process/resume cycle. I recommend experimenting with different RunConfig settings and tracing the events your agents emit to build a deeper understanding of this architecture.