Docker cagent: Build Entire AI Agent Teams in a Single YAML File

By AI Agent Engineering | 2026-03-16 | tool

Two hundred lines of Python orchestration code. Forty-seven dependencies. A weekend lost to debugging async callback chains between a researcher agent and a writer agent that refused to share context. Or: 12 lines of YAML and docker agent run. That is the gap Docker cagent closes — and the gap tells you something important about where AI agent development is heading.

Docker cagent (now called Docker Agent as of Desktop 4.63+) is an open-source, declarative runtime that lets you define AI agents in YAML and run them from the command line [1]. No framework classes. No orchestration boilerplate. You declare what each agent does, which tools it can access, and how agents delegate to each other. The runtime handles the rest.

If that sounds like the same leap Dockerfiles made for infrastructure — from hand-configured servers to declarative container definitions — the parallel is intentional. And it might be just as consequential.

What a cagent.yaml Actually Looks Like

Strip away the marketing and the smallest useful agent is almost comically simple:

agents:
  root:
    model: openai/gpt-5-mini
    description: A helpful AI assistant
    instruction: |
      You are a knowledgeable assistant that helps users
      with various tasks. Be helpful, accurate, and concise.
    toolsets:
      - type: mcp
        ref: docker:duckduckgo

Four fields. The model field selects the LLM provider and model. The description tells other agents (and the runtime) what this agent does. The instruction is the system prompt. And toolsets grants capabilities — in this case, web search via a DuckDuckGo MCP server running inside a Docker container [1].

Run it with docker agent run agent.yaml and you have a conversational agent with internet access. No pip install. No virtual environment. No API wrapper library. The runtime resolves the model provider, spins up the MCP tool server, and handles the agentic loop internally.

This minimal example matters because it reveals the design philosophy: declare what, let the runtime handle how. Every agent you build scales up from this skeleton by adding more tools, sharper instructions, or — where it gets interesting — more agents.

The Tool Ecosystem: Built-In, MCP, and Custom

An agent without tools is a chatbot. Docker Agent ships with a set of built-in toolsets that cover the operations most agents need [1]: filesystem for reading and writing files, shell for executing commands, think for a reasoning scratchpad, and todo for task tracking.

Each one activates with a single line in the YAML. Give an agent filesystem and shell, and it can write code, run tests, and inspect logs. Give it think and todo, and it plans before it acts. These are not abstractions over Python libraries — they are runtime-managed capabilities with built-in permission checks. By default, Docker Agent asks for user confirmation before executing anything with side effects. The --yolo flag bypasses that, if you trust your agent enough [1].
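Combining built-ins is just a longer list. A sketch of a coding agent that plans before acting, reusing the schema and a model name already shown in this article (the description and instruction text are illustrative):

```yaml
agents:
  root:
    model: anthropic/claude-sonnet-4-0
    description: Coding assistant that plans, edits, and verifies
    instruction: |
      Plan your work with the todo list before making changes.
      Use the shell to run the test suite after every edit.
    toolsets:
      - type: filesystem   # read and write project files
      - type: shell        # run builds and tests
      - type: think        # externalized reasoning scratchpad
      - type: todo         # auditable task list
```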

Beyond built-ins, the real extensibility comes through MCP — the Model Context Protocol. Docker Agent supports three flavors: Docker-hosted MCP servers (containerized and isolated), local stdio servers, and remote SSE/HTTP endpoints [1]. The YAML stays clean regardless:

toolsets:
  - type: mcp
    ref: docker:duckduckgo
  - type: shell
  - type: filesystem

This composability means your agent's capabilities grow by adding lines, not by importing packages and writing glue code.
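All three MCP flavors keep the same declarative shape. Only the Docker-hosted form below appears in this article; the field names for local stdio and remote servers are assumptions sketched from cagent's documentation, so verify them against your installed version:

```yaml
toolsets:
  - type: mcp
    ref: docker:duckduckgo          # Docker-hosted, containerized and isolated
  - type: mcp
    command: my-stdio-server        # local stdio server (field names assumed)
    args: ["--readonly"]
  - type: mcp
    remote:                         # remote endpoint (field names assumed)
      url: https://example.com/mcp
      transport_type: sse
```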

Multi-Agent Delegation: Where It Gets Serious

A single agent with good tools can handle plenty. But the problems worth automating — code review pipelines, research synthesis, content production — require coordination between specialists. Docker Agent handles this through a delegation model that keeps configuration declarative while enabling sophisticated workflows [2].

Here is a development team defined in one file:

agents:
  root:
    model: anthropic/claude-sonnet-4-0
    description: Technical lead coordinating development
    instruction: |
      You are a technical lead managing a development team.
      Analyze requests and delegate to the right specialist.
      Ensure quality by reviewing results before responding.
    sub_agents: [developer, reviewer, tester]
    toolsets:
      - type: think

  developer:
    model: anthropic/claude-sonnet-4-0
    description: Expert software developer
    instruction: |
      You are an expert developer. Write clean, efficient code
      and follow best practices.
    toolsets:
      - type: filesystem
      - type: shell
      - type: think

  reviewer:
    model: openai/gpt-4o
    description: Code review specialist
    instruction: |
      You review code for quality, security, and maintainability.
      Provide actionable feedback.
    toolsets:
      - type: filesystem

  tester:
    model: openai/gpt-4o
    description: Quality assurance engineer
    instruction: |
      You write tests and ensure software quality.
      Run tests and report results.
    toolsets:
      - type: shell
      - type: todo

The sub_agents field on the root agent is the entire orchestration layer. When a user sends a request, the root agent reads the descriptions of its sub-agents, reasons about which specialist fits the task, and calls transfer_task with a target agent name, a task description, and an expected output format. The sub-agent runs its own agentic loop with its own tools, then returns the result. The root agent reviews it and responds [2].

The five-step flow: user message reaches root, root selects a sub-agent, root calls transfer_task, sub-agent executes independently, results flow back. Unlike other tool calls, transfer_task is auto-approved — no user confirmation needed, because the sub-agent operates within the permissions already defined in the YAML [2].
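Per [2], transfer_task carries a target agent, a task description, and an expected output format. A hypothetical payload, shown only to make the mechanism concrete (the argument names are illustrative, not the runtime's exact schema):

```yaml
# illustrative only — actual argument names may differ
tool: transfer_task
arguments:
  agent: reviewer
  task: Review the changes to the auth module for security issues
  expected_output: A list of findings with severity and suggested fixes
```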

Notice something else in that config: the team uses mixed models. The developer and lead run on Claude Sonnet for code-heavy reasoning. The reviewer and tester run on GPT-4o, where the cost-performance ratio favors breadth over depth. Docker Agent is provider-agnostic — OpenAI, Anthropic, Gemini, AWS Bedrock, Mistral, xAI, even local models through Docker Model Runner all work interchangeably [1]. You can even define model aliases to make swapping trivial:

models:
  fast:
    provider: openai
    model: gpt-5-mini
    temperature: 0.2
  creative:
    provider: openai
    model: gpt-4o
    temperature: 0.8
  local:
    provider: dmr
    model: ai/qwen3

agents:
  analyst:
    model: fast
  writer:
    model: creative
  helper:
    model: local

Three agents. Three different models optimized for their role. Zero lines of code.

Parallel Execution and Background Agents

Sequential delegation works when each step depends on the last. But research tasks, competitive analysis, multi-file code generation — these benefit from parallelism. Docker Agent supports this through background_agents, a toolset that dispatches work concurrently [2].

The root agent calls run_background_agent for each parallel task, receives a task ID immediately, and can monitor progress with list_background_agents or retrieve results with view_background_agent. The pattern looks like fan-out/fan-in, except you define it by adding background_agents to the toolset list and writing instructions that tell the root agent when to parallelize [2].

No threading code. No asyncio. No message queues. The runtime manages concurrent execution, and the YAML manages the permissions.
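Enabling the pattern is, per [2], one toolset entry plus instructions that tell the coordinator when to fan out. A sketch, with the caveat that whether background agents are drawn from sub_agents or spawned ad hoc is a detail to confirm in the docs — the sub_agents line here is an assumption:

```yaml
agents:
  root:
    model: anthropic/claude-sonnet-4-0
    description: Research coordinator
    instruction: |
      For multi-source research, dispatch one background agent
      per source, then merge the results once all have finished.
    sub_agents: [researcher]        # assumed: background agents reuse sub-agent definitions
    toolsets:
      - type: background_agents

  researcher:
    model: openai/gpt-4o
    description: Focused single-source researcher
    instruction: Research one assigned source and summarize findings.
    toolsets:
      - type: mcp
        ref: docker:duckduckgo
```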

The Counterargument: Declarative Configs Hide Complexity

The obvious objection: abstraction layers that hide complexity make debugging harder. When your 200-line Python orchestrator breaks, you can set a breakpoint and step through the logic. When a declarative YAML agent misbehaves, what do you step through?

This is a legitimate concern, and Docker Agent does not entirely solve it. The think toolset helps — it forces the agent to externalize its reasoning into a scratchpad you can inspect. The todo toolset creates an auditable task list. And because each sub-agent runs its own isolated loop, you can test agents individually before composing them.

But the deeper answer is that declarative systems trade one kind of debugging for another. You stop debugging orchestration logic — the callback chains, the state management, the race conditions — and start debugging agent behavior: instructions that are ambiguous, tool permissions that are too narrow, delegation descriptions that mislead the coordinator. These are problems of specification, not implementation. They require different skills, but they are arguably more tractable. You fix them by editing YAML, not by restructuring code.

The --yolo flag's existence also signals Docker's awareness of the tension. The default behavior — confirm before executing side effects — gives you a manual inspection point at every tool call. You can watch your agent's decisions in real time before granting full autonomy.

Distribution: The OCI Registry Angle

Docker's deeper play goes beyond running agents locally. Docker Agent packages agents into OCI artifacts — the same container registry format Docker images use. Push an agent to a registry, and anyone with access can pull and run it [1]. docker agent run agentcatalog/pirate pulls a pre-built agent from a shared catalog the same way docker run nginx pulls a container image.

This matters for teams. A platform engineer defines a deployment agent with specific shell permissions and approved tools, publishes it to the company registry, and every developer on the team runs the same agent with the same guardrails. Version it. Roll it back. Audit the YAML diff between versions. The entire agent — model selection, instructions, tool access, delegation rules — travels as a single distributable artifact.
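The round trip reads like an ordinary Docker workflow. Only the run command below appears in the sources; the push syntax is an assumption mirroring cagent's CLI, so verify the exact subcommand against your Docker Desktop version:

```shell
# package and publish the agent definition as an OCI artifact (syntax assumed)
docker agent push deploy-agent.yaml registry.example.com/platform/deploy-agent:v1

# any teammate pulls and runs the same versioned agent
docker agent run registry.example.com/platform/deploy-agent:v1
```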

Beyond Docker: The Config-as-Agent Pattern

Docker cagent is not the only tool moving in this direction. The pattern of defining agents declaratively — config files instead of code — is emerging across the ecosystem. CrewAI has its YAML-based crew definitions. AutoGen has its JSON agent configs. LangGraph separates graph structure from node implementation.

What Docker brings to this pattern is the infrastructure layer underneath. Containerized tool execution. Registry-based distribution. Permission boundaries enforced by the runtime rather than the developer. These are the same advantages Docker brought to application deployment a decade ago, applied to a new kind of workload.

The shift from programming agents to declaring agents is not about making things easier for beginners. It is about raising the ceiling for what a single developer can coordinate. When defining a new specialist agent costs three lines of YAML instead of a new Python class with tool bindings and error handling, you build teams of ten agents instead of settling for two. When swapping a model costs changing one field instead of refactoring an API client, you experiment with model-per-task optimization instead of running everything through the same provider.

Docker cagent bets that the Dockerfile moment for AI agents is now: the point where the industry standardizes on declaring intent and lets the runtime handle execution. Whether Docker wins this particular race matters less than the direction. The YAML file is replacing the Python script as the unit of agent development. And every framework that ignores this shift will find itself on the wrong side of a docker agent run.


References

[1] Docker — Build and Distribute AI Agents and Workflows with cagent. Blog

[2] Docker — How to Build a Multi-Agent AI System Fast with cagent. Blog

[3] DZone — Building an AI Agent With Docker Cagent. Article