Stop Listing Tools, Start Writing Code: Cloudflare Code Mode Rethinks MCP

Here is a counterintuitive fact about AI agents: the more tools you give one, the dumber it tends to get. Every tool you register goes into the model's context as a description it has to read, hold, and choose between on each step. Wire up a few dozen and the system prompt bloats, latency climbs, and the model starts picking the wrong tool or hallucinating arguments. The standard way we connect agents to the world — one tool per capability — has a ceiling, and serious systems hit it fast.

Cloudflare's answer, shipped in April 2026 as Code Mode, is to stop handing the agent a menu and start handing it a programming language [1]. Instead of exposing 2,500 API endpoints as 2,500 tools, a Code Mode server exposes two: one to search for the operation you need, and one to execute code that calls it. The agent writes JavaScript, runs it in a sandbox, and gets the result. The reported effect on context is not incremental. Interacting with those 2,500-plus endpoints dropped from roughly 1.17 million tokens to about 1,000 — a 99.9% reduction [1]. That number is worth sitting with, because it isn't an optimization. It's a different architecture.

Why listing tools doesn't scale

To understand what Code Mode fixes, look closely at what classic Model Context Protocol usage actually costs.

In the standard pattern, every capability is a tool, and every tool is described to the model up front. Name, purpose, parameters, types — all of it lives in the context window so the model knows what's available. For a handful of tools this is fine. The problem is multiplicative: each new integration adds its full surface to the prompt, whether or not this particular task needs it. Connect an agent to a real enterprise's APIs — a cloud platform's hundreds of operations, a CRM, a data warehouse — and the tool definitions alone can consume more context than the actual work.

The cost isn't only tokens and money, though 1.17 million tokens per interaction is its own catastrophe. It's accuracy. A model choosing from thousands of similar-sounding tools is doing a hard retrieval problem on every step, and it degrades exactly the way you'd expect: wrong tool, wrong arguments, confident mistakes. The conventional fixes — trim the tool list, split into multiple agents, retrieve relevant tools dynamically — all manage the symptom. They accept "one tool per endpoint" as a law of nature and try to keep the list short.

Code Mode rejects the premise.

Search, then execute

The mechanism is simple enough to describe in a sentence and consequential enough to reorganize how you build. The Code Mode server presents the agent with two tools, search() and execute(), sitting in front of an OpenAPI specification that describes the full API [1].

When the agent has a task, it calls search() to find the right operations in the spec — no need to carry 2,500 descriptions in context, because it looks up the two or three it needs on demand. Then it calls execute() with JavaScript it writes itself: code that invokes those operations, handles the responses, loops, branches, and composes a result. The API surface stops being a list the model memorizes and becomes a library the model programs against.

This flips the unit of work. In classic MCP, a multi-step task is a sequence of separate model turns — call a tool, read the result, decide the next tool, call it, and so on, each round trip burning a model call and a slice of context. In Code Mode, the agent writes one program that does the whole sequence and runs it once. Filtering a list, transforming the results, and calling a second endpoint with them isn't three tool-calls-with-reasoning-between-them; it's a few lines of JavaScript. The model does what models are genuinely good at — writing code — instead of impersonating a control-flow engine one tentative step at a time.

That's also why the token savings are so steep. The 99.9% figure isn't from compressing tool descriptions. It's from never loading most of them, and from collapsing many reasoning-heavy turns into a single executed program.

The obvious objection: you just gave the agent an interpreter

If the previous breath made you nervous, good instincts. Letting an AI agent write and run arbitrary code is exactly the pattern that, done carelessly, turns prompt injection into remote code execution. An agent that executes generated JavaScript is one bad input away from executing an attacker's JavaScript. The power and the danger are the same feature.

Code Mode's answer is the part that makes it shippable rather than reckless: the code runs in a sandboxed V8 isolate with the walls already built [1]. No filesystem access. No exposed environment variables. Controlled egress, so the code can't reach arbitrary network destinations. The isolate is a real boundary enforced by the runtime, not a politely worded instruction to the model. The agent gets a full programming language and a tightly bounded room to run it in.

This is the right trade, and it's worth naming why. The alternative — hundreds of narrow, individually-permissioned tools — feels safer because each tool is small, but it scales badly and pushes the security burden onto the model's tool choices, which can be manipulated. A strong sandbox moves the boundary off the model's judgment and onto deterministic infrastructure that injection can't talk its way past. You still have to respect the egress and capability limits of whatever the code can reach. But "constrain a sandbox" is a known security problem with known tools. "Hope the model never picks the dangerous tool" is not.

The honest caveat is that the sandbox is now load-bearing. Its isolation is the thing standing between a clever prompt and your systems, so it has to actually hold. Code Mode is only as safe as the walls of that V8 isolate — which is a much better place to concentrate your security attention than spread across a thousand tool definitions.

A piece of a bigger shift

Code Mode reads as a clever Cloudflare feature. Step back and it's a local instance of the dominant pattern in agent design right now.

Across the spring of 2026, the major labs independently moved orchestration into generated code. Microsoft's CodeAct has the model emit a Python program that calls tools instead of chaining tool calls. OpenAI shipped code mode for its Agents SDK. LangChain's Deep Agents added a Code Interpreter to compose tools in code. The industry converged on a single idea: the model is a better programmer than it is a dispatcher, so let it write programs. Cloudflare's Code Mode is that same idea applied at the MCP layer — the protocol that connects agents to external APIs.

That convergence is the signal to pay attention to. When the frontier labs and the infrastructure providers arrive at the same architecture from different directions, it's not a fad. It's the problem's actual shape showing through. The "expose every endpoint as a tool" era is closing, and code-as-orchestration is what replaces it.

For builders, the practical guidance is concrete. When you're integrating an agent against a large API surface — anything past a couple dozen operations — reach for the Code Mode shape: a search-and-execute interface over a spec, with a real sandbox, rather than an ever-growing tool list. The pattern generalizes well beyond Cloudflare's own services; the open-sourced SDK is a starting point, but the architecture is the takeaway. And when you do let an agent run code, put the same care into the sandbox that you'd put into any system boundary that an adversary will eventually test, because they will.

We spent two years teaching agents to pick from menus and quietly accepting that the menu couldn't get too long. Cloudflare's bet is that we had it backwards the whole time — that the way to give an agent a thousand capabilities was never to describe a thousand tools, but to hand it a language and a locked room and let it write. The token math says the bet is sound. The rest of the industry, arriving at the same answer in the same season, says it isn't a bet at all.

References

[1] Leela Kumili, InfoQ — Cloudflare Launches Code Mode MCP Server to Optimize Token Usage for AI Agents. Article