AI Agent Orchestration Patterns: Planner-Executor, Swarm, and What Ships

Most production agents fail at orchestration, not reasoning. Here are the three patterns we deploy (planner-executor, ReAct/swarm, and pipeline) and when each fits.

Most agent demos look magical for ten minutes and then break for the next ten. The reasoning isn’t the problem — the orchestration is. Where does the plan live? Who calls tools? What happens when a step fails? Without a clear pattern, every agent quietly turns into a chatbot wrapped around while (not done) { call LLM }, which is exactly the loop you don’t want in production.

We’ve shipped agents into finance, healthcare, and operations workflows. Three patterns cover ~90% of what actually works. Pick one and resist the urge to invent a fourth.

Pattern 1: planner-executor

One LLM call produces a structured plan: a list of steps, each with a tool, inputs, and expected output. A separate executor runs the plan deterministically — no LLM in the loop unless a step explicitly needs one. If a step fails, the planner is re-invoked with the failure context and asked to produce a new plan from the failure point.
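A minimal sketch of the planner-executor split, assuming the planner is any function that returns a list of steps (in practice, one structured-output LLM call). Step, StepFailed, execute, run_task, plan_fn and the shape of the failure-context dict are hypothetical names for illustration, not a framework API. The executor contains no LLM call; on failure, the planner is re-invoked with what failed and what already succeeded.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class Step:
    tool: str               # which tool the executor should call
    inputs: dict[str, Any]  # keyword arguments for that tool
    expected: str           # expected output, kept so the plan reads as a replayable artifact

class StepFailed(Exception):
    def __init__(self, index: int, cause: Exception, completed: list[Any]):
        super().__init__(f"step {index} failed: {cause!r}")
        self.index = index          # position of the failing step in the current plan
        self.cause = cause
        self.completed = completed  # results of the steps that succeeded before the failure

# Hypothetical planner signature: (task, failure_context or None) -> plan.
PlanFn = Callable[[str, Optional[dict]], list[Step]]

def execute(plan: list[Step], tools: dict[str, Callable[..., Any]]) -> list[Any]:
    """Run the plan deterministically: no LLM call happens in here."""
    results: list[Any] = []
    for i, step in enumerate(plan):
        try:
            results.append(tools[step.tool](**step.inputs))
        except Exception as exc:
            raise StepFailed(i, exc, results) from exc
    return results

def run_task(task: str, plan_fn: PlanFn, tools: dict[str, Callable[..., Any]],
             max_replans: int = 2) -> list[Any]:
    plan = plan_fn(task, None)  # the single planning call on the happy path
    done: list[Any] = []
    for attempt in range(max_replans + 1):
        try:
            return done + execute(plan, tools)
        except StepFailed as failure:
            done += failure.completed
            if attempt == max_replans:
                raise
            # Re-invoke the planner with the failure context; it plans only the remaining work.
            plan = plan_fn(task, {
                "failed_step": plan[failure.index],
                "error": repr(failure.cause),
                "completed": list(done),
            })
    raise AssertionError("unreachable when max_replans >= 0")
```

Because the plan is plain data, it can be logged, diffed and replayed against the same tools, which is where the debugging advantage comes from.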

This is our default for workflows where the plan space is bounded and the steps are mostly deterministic: back-office tasks, document processing, multi-system updates. Cost is low (one or two planning calls per task), debuggability is high (the plan is a readable artifact you can replay), and behavior is predictable (the same input usually produces the same plan).

The trap is letting the planner do the work. If your planner is calling tools mid-plan, you’ve reinvented ReAct — see the next section.

Pattern 2: ReAct / swarm

The agent calls an LLM in a loop. On each iteration it observes the current state, chooses a tool, calls it, and appends the result to the context, repeating until a stop condition fires. This suits open-ended tasks where the next step depends heavily on the last result: research, debugging, exploratory data work.
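A minimal sketch of that loop, assuming a hypothetical choose function that stands in for the LLM call and returns either a tool invocation or a stop signal. Action, choose and react are illustrative names, not any framework's API; the iteration cap is included because a ReAct loop without one should not ship.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

@dataclass
class Action:
    tool: Optional[str]                                 # None means "stop"
    args: dict[str, Any] = field(default_factory=dict)
    answer: Any = None                                  # final answer, set when tool is None

# Hypothetical: one LLM call that reads the context and decides the next action.
ChooseFn = Callable[[list[tuple[str, Any]]], Action]

def react(task: str, choose: ChooseFn, tools: dict[str, Callable[..., Any]],
          max_iterations: int = 10) -> Any:
    context: list[tuple[str, Any]] = [("task", task)]
    for _ in range(max_iterations):
        action = choose(context)                         # observe the state, pick a tool
        if action.tool is None:                          # stop condition
            return action.answer
        observation = tools[action.tool](**action.args)  # call the tool
        context.append((action.tool, observation))       # append the result, then repeat
    raise RuntimeError(f"no stop condition after {max_iterations} iterations")
```

Every iteration is an LLM call that rereads a growing context, which is why this pattern costs more than a planner that is called once or twice.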

The “swarm” variant has multiple specialized agents that hand off to each other (sales-qualifier → researcher → drafter → reviewer). Each agent has a tight tool set and a tight system prompt. Swarms feel elegant. They are also the most expensive pattern by a wide margin and the hardest to debug.
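A sketch of the swarm handoff under stated assumptions: each agent carries its own tight tool set and system prompt, and a hypothetical run function (which in practice would wrap that agent's own ReAct loop) returns the updated shared state plus the name of the next agent, or None to finish. Agent and run_swarm are illustrative names. The handoff cap and the trace are the two things we would not ship without.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class Agent:
    name: str
    system_prompt: str
    tools: frozenset[str]
    # Hypothetical: runs this agent on the shared state, returns (new_state, next_agent or None).
    run: Callable[[dict], tuple[dict, Optional[str]]]

def run_swarm(agents: dict[str, Agent], start: str, state: dict,
              max_handoffs: int = 8) -> tuple[dict, list[str]]:
    current = start
    trace: list[str] = []  # which agent ran, in order: the first thing you need when debugging
    for _ in range(max_handoffs):
        agent = agents[current]
        state, next_agent = agent.run(state)
        trace.append(agent.name)
        if next_agent is None:
            return state, trace
        current = next_agent
    raise RuntimeError(f"exceeded {max_handoffs} handoffs; trace={trace}")
```

Each handoff runs a full agent with its own LLM calls, so cost grows with the length of the chain, and a bug can surface several agents downstream of its cause.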

Use ReAct when the problem genuinely requires reactive reasoning. Don’t use it because it’s the default in your framework.

Pattern 3: pipeline (LLM-as-function)

Not really an agent, but worth naming: a deterministic pipeline of stages, where one or more stages happen to be LLM calls. Extract → classify → enrich → route. Each stage has typed inputs and outputs. Failures retry the stage, not the whole flow. Most “AI features” inside enterprise apps belong here, not in an agent loop.
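A minimal sketch of such a pipeline, assuming an invoice-routing task that we made up for illustration. The classify stage is where an LLM call would sit; here it is a hypothetical rule so the example runs offline, and the enrich stage is omitted for brevity. Each stage has typed input and output, and the runner retries only the stage that failed.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Extracted:
    invoice_id: str
    amount: float

@dataclass
class Classified:
    invoice_id: str
    amount: float
    label: str  # in production this field would come from an LLM call

@dataclass
class Routed:
    invoice_id: str
    label: str
    queue: str

def extract(raw: str) -> Extracted:
    invoice_id, amount = raw.split(";")
    return Extracted(invoice_id.strip(), float(amount))

def classify(x: Extracted) -> Classified:
    # Stand-in for an LLM classifier (hypothetical rule, not a real model).
    return Classified(x.invoice_id, x.amount, "large" if x.amount >= 1000 else "small")

def route(c: Classified) -> Routed:
    return Routed(c.invoice_id, c.label, "manual-review" if c.label == "large" else "auto-approve")

def run_pipeline(stages: list[tuple[str, Callable[[Any], Any]]], value: Any,
                 max_retries: int = 2) -> Any:
    for name, stage in stages:
        for attempt in range(max_retries + 1):
            try:
                value = stage(value)  # only this stage is retried, never the whole flow
                break
            except Exception as exc:
                if attempt == max_retries:
                    raise RuntimeError(f"stage {name!r} failed after {attempt + 1} attempts") from exc
    return value

PIPELINE = [("extract", extract), ("classify", classify), ("route", route)]
```

Because each stage is a plain typed function, it can be unit-tested and evaluated in isolation, which is what makes the evals stage-local.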

Pipelines are the most production-grade pattern we deploy. Latency is bounded, cost is predictable, evals are stage-local, and observability is the same observability you already have for any other service. If your “agent” can be expressed as a pipeline, make it a pipeline.

How to choose

Three questions, in order:

  1. Is the workflow expressible as a fixed graph of stages? Yes → pipeline. Stop.
  2. Is the plan knowable at the start of the task? Yes → planner-executor. Stop.
  3. Does the next step genuinely depend on the previous result in unpredictable ways? Yes → ReAct or swarm. Otherwise, go back to step 1; you can probably express it as a pipeline.
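The same three questions written as a function, mainly to show that the order matters and that the fall-through lands on pipeline. choose_pattern and its boolean parameters are hypothetical; answering each question honestly is the real work.

```python
def choose_pattern(fixed_graph: bool, plan_known_upfront: bool,
                   next_step_unpredictable: bool) -> str:
    if fixed_graph:              # 1. expressible as a fixed graph of stages?
        return "pipeline"
    if plan_known_upfront:       # 2. plan knowable at the start of the task?
        return "planner-executor"
    if next_step_unpredictable:  # 3. next step depends on the last result unpredictably?
        return "react-or-swarm"
    # None of the above: revisit question 1; the workflow is probably a pipeline.
    return "pipeline"
```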

We’ve seen teams skip step 1 and jump straight to swarm. They burn six months and 100x the inference cost to ship something a pipeline would have done in three weeks.

What goes wrong in production

Unbounded loops. ReAct agents keep going because no stop condition fires cleanly. Always set max-iterations and max-tokens; treat hitting the cap as a failure to investigate, not a soft success.
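A minimal sketch of hard caps, assuming each loop iteration can report how many tokens it consumed. RunLimits, CapExceeded and run_with_limits are hypothetical names, and the default caps are placeholders, not recommendations. The point of the design is the return value: hitting a cap produces status "failed", never a partial answer dressed up as success.

```python
from dataclasses import dataclass
from typing import Any, Callable

class CapExceeded(Exception):
    """A run hit a hard cap. Callers should record it as a failure, never as a result."""

@dataclass
class RunLimits:
    max_iterations: int = 15  # hypothetical defaults; tune per workflow
    max_tokens: int = 50_000
    iterations: int = 0
    tokens: int = 0

    def check(self) -> None:
        if self.iterations >= self.max_iterations:
            raise CapExceeded(f"hit max_iterations={self.max_iterations}")
        if self.tokens >= self.max_tokens:
            raise CapExceeded(f"hit max_tokens={self.max_tokens} (used {self.tokens})")

def run_with_limits(step: Callable[[], tuple[bool, int]], limits: RunLimits) -> dict[str, Any]:
    """step() runs one loop iteration and returns (done, tokens_used)."""
    try:
        while True:
            limits.check()                # refuse to start an iteration past a cap
            done, tokens_used = step()
            limits.iterations += 1
            limits.tokens += tokens_used  # one step can overshoot; the check before the next step catches it
            if done:
                return {"status": "ok", "iterations": limits.iterations}
    except CapExceeded as exc:
        # Surface the cap as a failure to investigate, not a soft success.
        return {"status": "failed", "reason": str(exc), "iterations": limits.iterations}
```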

Tool sprawl. Twelve tools per agent and the model picks the wrong one 30% of the time. Trim aggressively. Five tools is plenty. If you need more, you have multiple agents.
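One way to make "five tools is plenty" stick is to enforce it where agents are constructed rather than in code review. make_agent and MAX_TOOLS_PER_AGENT are hypothetical names, and the returned dict stands in for whatever agent object your framework builds.

```python
from typing import Any, Callable

MAX_TOOLS_PER_AGENT = 5  # the rule of thumb above; adjust per model if your evals say so

def make_agent(name: str, tools: dict[str, Callable[..., Any]]) -> dict[str, Any]:
    """Refuse to build an agent whose tool list exceeds the cap."""
    if len(tools) > MAX_TOOLS_PER_AGENT:
        raise ValueError(
            f"agent {name!r} has {len(tools)} tools (cap {MAX_TOOLS_PER_AGENT}); "
            "split it into multiple agents with narrower responsibilities"
        )
    return {"name": name, "tools": dict(tools)}
```

The error message points at the fix the text recommends: more agents with narrower scopes, not a bigger prompt.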

Context window collapse. Long ReAct traces overflow the context. Summarize old observations into a working memory; don’t dump raw tool output forever.
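A minimal sketch of a working memory, assuming a hypothetical summarize callable (in practice a cheap LLM call) that folds older observations into a running summary. WorkingMemory and keep_raw are illustrative names. The most recent observations stay verbatim; everything older survives only as summary text.

```python
from typing import Callable

class WorkingMemory:
    """Keeps the last `keep_raw` observations verbatim and folds older ones into a summary."""

    def __init__(self, summarize: Callable[[str, list[str]], str], keep_raw: int = 3):
        self.summarize = summarize  # hypothetical: (previous_summary, old_observations) -> new_summary
        self.keep_raw = keep_raw
        self.summary = ""
        self.recent: list[str] = []

    def add(self, observation: str) -> None:
        self.recent.append(observation)
        if len(self.recent) > self.keep_raw:
            overflow = self.recent[: -self.keep_raw]  # oldest observations beyond the window
            self.recent = self.recent[-self.keep_raw :]
            self.summary = self.summarize(self.summary, overflow)

    def render(self) -> str:
        """Text to place in the prompt instead of the full raw trace."""
        parts = [f"Summary of earlier steps: {self.summary}"] if self.summary else []
        parts.extend(self.recent)
        return "\n".join(parts)
```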

No eval harness. “It worked in my notebook” is not evidence. Every agent in production needs a golden set of tasks scored deterministically — see our notes on evals and observability.
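A minimal sketch of a golden-set harness, assuming the agent can be called as a function from task text to answer text. GoldenCase, exact_match, run_evals and the 0.9 threshold are hypothetical; real scorers are usually task-specific (field equality, JSON schema checks), but they should stay deterministic so a score change means the agent changed.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class GoldenCase:
    task: str
    expected: str

def exact_match(output: str, expected: str) -> bool:
    # Deterministic scorer: normalize whitespace and case, then compare.
    return " ".join(output.split()).lower() == " ".join(expected.split()).lower()

def run_evals(agent: Callable[[str], str], golden: list[GoldenCase],
              scorer: Callable[[str, str], bool] = exact_match,
              threshold: float = 0.9) -> dict[str, Any]:
    if not golden:
        raise ValueError("golden set is empty")
    failures = []
    for case in golden:
        try:
            output = agent(case.task)
        except Exception as exc:  # a crash counts as a failed case, not a skipped one
            output = f"<error: {exc!r}>"
        if not scorer(output, case.expected):
            failures.append((case.task, output))
    pass_rate = 1 - len(failures) / len(golden)
    return {"pass_rate": pass_rate, "passed": pass_rate >= threshold, "failures": failures}
```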

No cost ceiling. A misbehaving swarm can spend $400 on a single task. Set a per-task budget and abort when crossed.
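A minimal sketch of a per-task budget, assuming you can read input and output token counts from each LLM response. CostBudget and BudgetExceeded are hypothetical names, and the per-1k-token prices are placeholders, not any provider's actual rates.

```python
class BudgetExceeded(Exception):
    """Raised when a task's spend crosses its ceiling; the caller should abort the task."""

class CostBudget:
    def __init__(self, ceiling_usd: float, usd_per_1k_input: float, usd_per_1k_output: float):
        self.ceiling_usd = ceiling_usd
        self.usd_per_1k_input = usd_per_1k_input  # placeholder prices; use your provider's rates
        self.usd_per_1k_output = usd_per_1k_output
        self.spent_usd = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Record one LLM call's usage and abort if the task is now over budget."""
        # The call has already been paid for, so count it before checking the ceiling.
        self.spent_usd += (input_tokens / 1000) * self.usd_per_1k_input
        self.spent_usd += (output_tokens / 1000) * self.usd_per_1k_output
        if self.spent_usd > self.ceiling_usd:
            raise BudgetExceeded(f"spent ${self.spent_usd:.2f}, ceiling ${self.ceiling_usd:.2f}")
        return self.spent_usd
```

Create one budget per task and pass the same instance to every agent in a swarm, so a handoff cannot reset the meter.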

What we deploy by default

For new agent engagements through our AI & LLM integration service, this is our default starting architecture:

  • Pipeline first. If the task fits, ship it as a pipeline.
  • Planner-executor when there’s plan variation but bounded plan space.
  • ReAct/swarm only when the previous two genuinely cannot model the work.
  • Hard caps on iterations, tokens, and cost per task. Eval harness from week one. Observability via LangSmith or Helicone.

Agent frameworks come and go. The orchestration pattern decision outlives the framework choice and is far more load-bearing.


If your agent demo is great and your agent in prod is unreliable, it’s an orchestration problem. Our AI team ships production agents into finance, healthcare, and operations workflows. Tell us about the task.