Multi-agent is the new microservices

We just can’t seem to help ourselves. Our current infatuation with multi-agent systems risks mistaking a useful pattern for an inevitable future, just as we once did with microservices. Remember those? For some good (and bad) reasons, we took workable applications, broke them into a confusing cloud of services, and then built service meshes, tracing stacks, and platform teams just to manage the complexity we’d created. Yes, microservices offered real advantages, as I’ve argued. But also, you don’t need to “run like Google” unless you actually have Google’s problems. (Spoiler alert: You don’t.)

Now we’re about to make the same mistake with AI.

Every agent demo seems to feature a planner agent, a researcher agent, a coder agent, a reviewer agent and, why not, an agent whose sole job is to feel good about the architecture diagram. This doesn’t mean multi-agent systems are bad; they’re simply prescribed more broadly than is wise, just as microservices were.

So when should you embrace a multi-agent approach?

A real pattern, with a hype tax

Even the companies building the frontier models are practically begging developers not to use them promiscuously. In its 2024 guide to building effective agents, Anthropic explicitly recommends finding “the simplest solution possible” and says that might mean not building an agentic system at all. More pointedly, Anthropic says that for many applications, optimizing single LLM calls with retrieval and in-context examples is usually enough. It also warns that frameworks can create layers of abstraction that obscure prompts and responses, make systems harder to debug, and tempt developers to add complexity when a simpler setup would suffice. Santiago Valdarrama put the same idea more bluntly: “Not everything is an agent,” he stresses, and “99% of the time, what you need is regular code.”

That’s not anti-agent. It’s engineering discipline.

OpenAI lands in roughly the same place. Its practical guide recommends maximizing a single agent’s capabilities first because one agent plus tools keeps complexity, evaluation, and maintenance more manageable. It explicitly suggests prompt templates as a way to absorb branching complexity without jumping to a multi-agent framework. Microsoft is similarly blunt: If the use case does not clearly cross security or compliance boundaries, involve multiple teams, or otherwise require architectural separation, start with a single-agent prototype. It even cautions that “planner,” “reviewer,” and “executor” roles do not automatically justify multiple agents, because one agent can often emulate those roles through persona switching, conditional prompting, and tool permissioning. Google, for its part, adds a particularly useful nuance here, warning that the wrong choice between a sub-agent and an agent packaged as a tool can create massive overhead. In other words, sometimes you don’t need another teammate. You need a function with a clean contract.
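Google’s distinction between a sub-agent and an agent packaged as a tool can be sketched in code. The following is a minimal, hypothetical Python sketch (the function names and the toy truncation logic are illustrative, not any vendor’s API): a capability exposed as a plain function with a clean contract, which a single agent calls directly, with no hand-off protocol, no arbitration, and no second prompt to maintain.

```python
# Hypothetical sketch: expose a capability as a plain tool (a function
# with a clean contract) instead of spinning up a second agent.
# Names and logic are illustrative; a real tool might wrap one model call.

def summarize_ticket(ticket_text: str, max_words: int = 50) -> str:
    """A 'tool': a deterministic contract in, a bounded output out."""
    words = ticket_text.split()
    return " ".join(words[:max_words])

# The single agent dispatches to the tool directly. There is no hand-off,
# no shared state to synchronize, and no extra prompt to engineer.
TOOLS = {"summarize_ticket": summarize_ticket}

def agent_step(tool_name: str, **kwargs) -> str:
    return TOOLS[tool_name](**kwargs)

result = agent_step("summarize_ticket", ticket_text="printer on fire " * 40)
```

The design point is the contract: the caller knows exactly what goes in and what comes out, which is precisely what gets lost when the same capability is wrapped in a conversational sub-agent.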

Microsoft makes one more point that deserves extra attention: Many apparent scale problems stem from retrieval design, not architecture. So, before you add more agents, fix chunking, indexing, reranking, prompt structure, and context selection. That isn’t less ambitious. It is more adult. We learned this the hard way with microservices. Complexity doesn’t vanish when you decompose a system. It relocates. Back then, it moved into the network. Now it threatens to move into hand-offs, prompts, arbitration, and agent state.
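One of those retrieval fixes is easy to make concrete. Here is a minimal, hypothetical sketch of overlapping chunking, so that context isn’t split mid-thought at chunk boundaries; the sizes are illustrative placeholders, not recommendations, and any real pipeline would tune them against its own corpus.

```python
# Hypothetical sketch: chunking with overlap, one of the retrieval-design
# fixes worth making before adding agents. Sizes are illustrative only.

def chunk(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word chunks of `size`, each sharing `overlap`
    words with its predecessor so no thought is cut cleanly in two."""
    words = text.split()
    step = size - overlap
    return [
        " ".join(words[i:i + size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]
```

A change like this is invisible on an architecture diagram, which is exactly why teams reach for another agent instead.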

Distributed intelligence is still distributed

What could have been one strong model call, retrieval, and a few carefully designed tools can quickly turn into agent routing, context hand-offs, arbitration, permissioning, and observability across a swarm of probabilistic components. That may be worth it when the problem is truly distributed, but often it’s not. Distributed intelligence is still distributed systems, and distributed systems aren’t cheap to build or maintain.

As OpenAI’s evaluation guide warns, triaging and hand-offs in multi-agent systems introduce a new source of nondeterminism. Its Codex documentation says subagents are not automatic and should only be used when you explicitly request parallel agent work, in part because each subagent does its own model and tool work and therefore consumes more tokens than a comparable single-agent run. Microsoft makes the same point in enterprise language: Every agent interaction requires protocol design, error handling, state synchronization, separate prompt engineering, monitoring, debugging, and a broader security surface.

Modularity, yes. But don’t pretend that modularity will be cheap.

This is why I suspect most teams that think they need multiple agents actually have a different problem. Their tools are vague, their retrieval is weak, their permissions are too broad, and their repositories are under-documented. Guess what? Adding more agents doesn’t fix any of that. It exacerbates it. As Anthropic explains, the most successful implementations tend to use simple, composable patterns rather than complex frameworks, and for many applications a single LLM call with retrieval and in-context examples is enough.

This matters even more because AI makes complexity cheap. In the microservices era, a bad architectural idea was at least constrained by the effort required to implement it. In the agent era, the cost of sketching yet another orchestration layer, another specialized persona, another hand-off, or another bit of glue code is collapsing. That can feel liberating even as it destroys our ability to maintain and manage systems over time. As I’ve written, lower production costs don’t automatically translate into higher productivity. They often just make it easier to manufacture fragility at scale.

Earn the extra moving parts

This also brings us back to a point I’ve made for years about hyperscaler architecture. Just because Google, Amazon, Anthropic, or OpenAI do something doesn’t mean you should too, because you don’t have their problems. Anthropic’s research system is impressive precisely because it tackles a hard, open-ended, breadth-first research problem. Anthropic is also candid about the cost. In its data, agents used about four times more tokens than chat interactions, while multi-agent systems used about 15 times more. The company also notes that most coding tasks are not a particularly good fit because they offer fewer truly parallelizable subtasks, and agents are not yet especially good at coordinating with one another in real time.

In other words, even one of the strongest public examples of multi-agent success comes with a warning label attached. It’s not quite “abandon hope, all ye who enter here,” but it’s definitely not “do as I’m doing.”

The better question is “What’s the minimum viable autonomy for this job?” Start with a strong model call. If that isn’t enough, add retrieval. Still not enough? Add better tools. If you need iteration, wrap those tools in a single agent loop. If context pollution becomes real, if independent tasks can truly run in parallel, or if specialization materially improves tool choice, then and only then start “earning” your second agent. If you can’t say which of those three problems you are solving, you probably don’t need another agent. Don’t believe me? All of the top purveyors of agent tools (Anthropic, OpenAI, Microsoft, Google) converge on this same counsel.
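The ladder above can be sketched as a decision function. This is a hypothetical illustration of the reasoning, not a runnable architecture: the predicates are placeholders you would answer for your own use case, and the returned labels are just names for the rungs described in the paragraph.

```python
# Hypothetical sketch of the "minimum viable autonomy" ladder.
# Each boolean is a question you answer about your own problem;
# nothing here is a vendor API.

def minimum_viable_autonomy(
    single_call_suffices: bool,
    retrieval_suffices: bool,
    better_tools_suffice: bool,
    needs_iteration: bool,
    context_pollution: bool = False,      # one of the three reasons
    parallel_subtasks: bool = False,      # that can earn a second agent
    specialization_pays: bool = False,
) -> str:
    if single_call_suffices:
        return "one model call"
    if retrieval_suffices:
        return "model call + retrieval"
    if better_tools_suffice:
        return "model call + tools"
    if context_pollution or parallel_subtasks or specialization_pays:
        return "consider a second agent (you have earned it)"
    if needs_iteration:
        return "single agent loop"
    return "single agent loop"

# If you can't set any of the last three flags to True,
# the function never recommends a second agent.
```

Note that the second agent is unreachable unless you can name which of the three problems you are solving, which is the whole point.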

So yes, multi-agent is the new microservices. That is both a compliment and a warning. Microservices were powerful when you had a problem worth distributing. Multi-agent systems are powerful when you have a problem worth decomposing. Most enterprise teams don’t, at least not yet. Many others never will. Instead, most need one well-instrumented agent, tight permissions, strong evaluations, boring tools, and clear exit conditions. The teams that win with agentic AI won’t be those that reach for the fanciest topology first. They’ll be the ones disciplined enough to earn every extra moving part, and to hold off on adding new ones for as long as possible. In the enterprise, boring is still what scales.
