Advanced · Last updated: January 29, 2026

Multi-Agent Coordination

When multiple agents work on related tasks, coordination becomes the bottleneck. The right model depends on task dependencies, communication costs, and failure tolerance.

Dependency Types

Type	Description	Agent Example
Finish-to-Start (FS)	B can't start until A finishes	Research agent → Writing agent
Start-to-Start (SS)	B can't start until A starts	Parallel research on related topics
Finish-to-Finish (FF)	B can't finish until A finishes	Review agent waits for all drafts
Start-to-Finish (SF)	B can't finish until A starts	Rare; handoff timing scenarios

For AI agents: Most orchestration uses FS dependencies (sequential) or no dependencies at all (parallel). Complex dependency graphs add coordination overhead, so favor the simplest graph that gets the work done.
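
One way to make FS dependencies concrete is to write them as a graph and let a topological sort decide what can run when. The sketch below uses Python's standard-library graphlib; the task names and the execution_waves helper are illustrative assumptions, not part of any agent framework. Tasks that land in the same wave have no dependency on each other and could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the tasks that must FINISH
# before it can START (finish-to-start dependencies).
fs_dependencies = {
    "research": set(),
    "outline": {"research"},
    "fact_check": {"research"},
    "draft": {"outline"},
    "review": {"draft", "fact_check"},
}


def execution_waves(deps):
    """Group tasks into waves; tasks within a wave are independent of each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose predecessors are done
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave finished, unlocking successors
    return waves


waves = execution_waves(fs_dependencies)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {wave}")
```

If most waves contain a single task, the work is effectively sequential and extra agents add little.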

Coordination Models

1. Centralized Orchestrator (Recommended)

       [Orchestrator]
      /   |   |   \
   [A]  [B]  [C]  [D]

Single point of coordination holds global context
Agents are stateless workers focused on single tasks
Pros: Clear authority, no agent-to-agent confusion, easier debugging
Cons: Bottleneck at orchestrator, single point of failure
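
A minimal sketch of the centralized model, assuming each agent can be represented as a plain callable that takes a task string and returns a result string. The Orchestrator class, make_worker, and the worker labels are hypothetical stand-ins; a real deployment would call an agent runtime where the stub functions are.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Orchestrator:
    workers: dict  # worker name -> callable(task: str) -> str
    context: dict = field(default_factory=dict)  # global context lives only here

    def run(self, assignments):
        """Dispatch one task per worker, then collect every result centrally."""
        with ThreadPoolExecutor(max_workers=max(1, len(assignments))) as pool:
            futures = {
                name: pool.submit(self.workers[name], task)
                for name, task in assignments.items()
            }
            for name, future in futures.items():
                self.context[name] = future.result()  # re-raises worker errors
        return self.context


def make_worker(label):
    """Stub worker: stateless, sees only the task it is handed."""
    def worker(task):
        return f"{label} handled: {task}"
    return worker


orchestrator = Orchestrator(workers={name: make_worker(name) for name in "ABCD"})
results = orchestrator.run({"A": "research", "B": "draft"})
print(results)
```

The workers never see each other or the shared context, which is what keeps debugging simple: every decision and every result passes through one place.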

2. Hierarchical (Complex Projects)

       [Orchestrator]
       /           \
   [Lead A]     [Lead B]
   /    \       /    \
 [A1]  [A2]  [B1]  [B2]

Sub-orchestrators manage workstreams
Top orchestrator only coordinates between workstreams
When to use: 5+ agents, multiple independent workstreams
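
The hierarchy can be sketched by giving each lead the same interface as a worker, so the top orchestrator only ever sees one summary per workstream. All names here (make_lead, make_worker, top_orchestrator) are assumptions for illustration, and the workers are stubs.

```python
def make_worker(name):
    """Stub leaf agent: returns a short result for the task it is given."""
    def worker(task):
        return f"{name} done ({task})"
    return worker


def make_lead(name, workers):
    """A lead runs its own workstream and reports a single summary upward."""
    def lead(task):
        results = [worker(task) for worker in workers]
        return f"{name}: " + "; ".join(results)
    return lead


def top_orchestrator(task, leads):
    # Coordinates between workstreams only; individual workers stay hidden.
    return [lead(task) for lead in leads]


lead_a = make_lead("Lead A", [make_worker("A1"), make_worker("A2")])
lead_b = make_lead("Lead B", [make_worker("B1"), make_worker("B2")])
summaries = top_orchestrator("release notes", [lead_a, lead_b])
print(summaries)
```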

3. Peer-to-Peer (Rare for CLI agents)

   [A] ←→ [B]
    ↕       ↕
   [C] ←→ [D]

Agents communicate directly with each other
No central coordinator
Caution: Hard to debug, easy to deadlock. Avoid for most CLI agent deployments.

Parallel Execution Strategies

Fan-out / Fan-in: Spawn N agents in parallel → collect all results → synthesize
Pipeline with buffers: Stage 1 produces outputs that Stage 2 consumes as they arrive
Race pattern: Multiple agents attempt the same task → take the first good result
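
The three strategies above can be sketched with asyncio. This is an illustration under assumptions: the agent coroutine is a stand-in that sleeps instead of doing real work, and "good result" in the race is simplified to "did not raise".

```python
import asyncio


async def agent(name, delay, ok=True):
    """Stand-in for an agent call; sleeps instead of doing real work."""
    await asyncio.sleep(delay)
    if not ok:
        raise RuntimeError(f"{name} failed")
    return f"{name} result"


async def fan_out_fan_in(specs):
    """Run all agents concurrently, wait for every result, then synthesize."""
    results = await asyncio.gather(*(agent(*spec) for spec in specs))
    return " | ".join(results)  # gather preserves input order


async def race(specs):
    """Return the first successful result and cancel whatever is still running."""
    tasks = [asyncio.create_task(agent(*spec)) for spec in specs]
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                return await finished
            except RuntimeError:
                continue  # not a good result; wait for the next finisher
        raise RuntimeError("all agents failed")
    finally:
        for task in tasks:
            task.cancel()  # no-op for tasks that already finished


async def pipeline(items):
    """Stage 2 consumes from a bounded buffer while stage 1 is still producing."""
    queue = asyncio.Queue(maxsize=2)
    out = []

    async def stage1():
        for item in items:
            await queue.put(item.upper())  # blocks when the buffer is full
        await queue.put(None)  # sentinel: no more items

    async def stage2():
        while (item := await queue.get()) is not None:
            out.append(f"processed {item}")

    await asyncio.gather(stage1(), stage2())
    return out


print(asyncio.run(fan_out_fan_in([("a", 0.01), ("b", 0.0)])))
print(asyncio.run(race([("slow", 0.05), ("fast", 0.01), ("broken", 0.0, False)])))
print(asyncio.run(pipeline(["x", "y", "z"])))
```

Note the cost trade-off in the race: every losing agent still consumed tokens up to the point it was cancelled.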

Communication Cost Formula

Before adding an agent, estimate the communication overhead:

Total cost = Σ(agent execution tokens) + Σ(handoff tokens) + orchestrator tokens

If handoff_tokens > 30% of execution_tokens:
  → Too much coordination overhead
  → Consider merging agents or simplifying
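
As a worked example, the formula and the 30% check can be written as a small function. The token counts below are made-up numbers for illustration, and coordination_report is a hypothetical helper name.

```python
def coordination_report(execution_tokens, handoff_tokens, orchestrator_tokens,
                        threshold=0.30):
    """Apply the cost formula and flag handoff overhead above the threshold."""
    execution = sum(execution_tokens)
    handoff = sum(handoff_tokens)
    if execution == 0:
        raise ValueError("execution tokens must be greater than zero")
    ratio = handoff / execution
    return {
        "total": execution + handoff + orchestrator_tokens,
        "handoff_ratio": ratio,
        "too_much_overhead": ratio > threshold,
    }


# Illustrative numbers: three agents, three handoffs, one orchestrator.
report = coordination_report(
    execution_tokens=[12_000, 8_000, 10_000],
    handoff_tokens=[3_000, 4_000, 3_500],
    orchestrator_tokens=5_000,
)
print(report)
```

Here handoffs cost 10,500 tokens against 30,000 execution tokens, a ratio of 35%, so this setup would be a candidate for merging agents or trimming what gets passed between them.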

FAQ

How do I handle failures in multi-agent systems?

Use the "fail-fast, report-up" pattern: each agent has a timeout and a retry limit. If an agent fails, it reports what it tried and what went wrong to the orchestrator. The orchestrator decides whether to retry, reassign, or abort. Never let agents retry indefinitely; set explicit limits (typically 3 retries max).
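
A minimal sketch of this pattern with asyncio, assuming an agent call can be wrapped as a coroutine factory. FailureReport and run_agent are hypothetical names, and "3 retries" is read here as one initial attempt plus up to three retries.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FailureReport:
    agent: str
    attempts: list = field(default_factory=list)  # one entry per failed attempt


async def run_agent(name, make_call, timeout_s=1.0, max_retries=3):
    """Return ("ok", result) or ("failed", FailureReport); never loops forever."""
    report = FailureReport(agent=name)
    for attempt in range(1 + max_retries):  # initial attempt plus retries
        try:
            return "ok", await asyncio.wait_for(make_call(), timeout=timeout_s)
        except asyncio.TimeoutError:
            report.attempts.append(f"attempt {attempt + 1}: timed out after {timeout_s}s")
        except Exception as exc:
            report.attempts.append(f"attempt {attempt + 1}: {exc!r}")
    return "failed", report  # reported up; the orchestrator decides what happens next


async def hang():
    """Stub agent call that never finishes within the timeout."""
    await asyncio.sleep(10)


status, payload = asyncio.run(run_agent("researcher", hang, timeout_s=0.01))
print(status, payload.attempts)
```

The agent itself never decides to keep going past its limit. It returns a report listing every attempt, and the retry, reassign, or abort decision stays with the orchestrator.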