Advanced
Last updated: January 29, 2026
Multi-Agent Coordination
When multiple agents work on related tasks, coordination becomes the bottleneck. The right model depends on task dependencies, communication costs, and failure tolerance.
Dependency Types
| Type | Description | Agent Example |
|---|---|---|
| Finish-to-Start (FS) | B can't start until A finishes | Research agent → Writing agent |
| Start-to-Start (SS) | B can't start until A starts | Parallel research on related topics |
| Finish-to-Finish (FF) | B can't finish until A finishes | Review agent waits for all drafts |
| Start-to-Finish (SF) | B can't finish until A starts | Rare; handoff timing scenarios |
For AI agents: Most orchestration uses FS dependencies (sequential) or no dependency (parallel). Complex dependency graphs add coordination overhead — favor simplicity.
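As an illustration of how FS dependencies translate into an execution order, here is a minimal Python sketch that uses the standard library's `graphlib`. The agent names and the `execution_waves` helper are hypothetical, chosen only for this example. Agents that land in the same wave have no dependency on each other and could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical agents; each key maps to the agents it must wait for (FS).
fs_dependencies = {
    "research": set(),
    "fact_check": set(),
    "writing": {"research", "fact_check"},
    "review": {"writing"},
}

def execution_waves(deps):
    """Group agents into waves; every agent in a wave has all dependencies met."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = execution_waves(fs_dependencies)
print(waves)  # [['fact_check', 'research'], ['writing'], ['review']]
```

If the graph needs more than a handful of waves, that is usually a sign the plan carries more coordination overhead than it should.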
Coordination Models
1. Centralized Orchestrator (Recommended)
      [Orchestrator]
      /   |    |   \
   [A]  [B]  [C]  [D]
- Single point of coordination holds global context
- Agents are stateless workers focused on single tasks
- Pros: Clear authority, no agent-to-agent confusion, easier debugging
- Cons: Bottleneck at orchestrator, single point of failure
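The centralized model can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the `Orchestrator` class, the `research` and `write` workers, and the template-based plan format are all assumptions made for the example. The point it shows is that only the orchestrator holds context, while each worker is a stateless function from task text to result text.

```python
from dataclasses import dataclass, field
from typing import Callable

# A worker is a stateless function: task text in, result text out.
Worker = Callable[[str], str]

@dataclass
class Orchestrator:
    workers: dict[str, Worker]
    # Global context lives here and nowhere else.
    context: dict[str, str] = field(default_factory=dict)

    def run(self, plan: list[tuple[str, str]]) -> dict[str, str]:
        """Run (worker_name, task_template) steps in order, storing each result."""
        for name, template in plan:
            # Fill the template from earlier results, then hand off the task.
            task = template.format(**self.context)
            self.context[name] = self.workers[name](task)
        return self.context

# Hypothetical stand-ins for real agent calls.
def research(task: str) -> str:
    return f"notes on {task}"

def write(task: str) -> str:
    return f"draft from [{task}]"

orchestrator = Orchestrator(workers={"research": research, "write": write})
result = orchestrator.run([
    ("research", "agent coordination"),
    ("write", "{research}"),  # FS dependency: needs the research output
])
print(result["write"])  # draft from [notes on agent coordination]
```

Because every handoff passes through `run`, a failed step can be traced to a single place, which is the debugging advantage listed above.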
2. Hierarchical (Complex Projects)
          [Orchestrator]
           /          \
     [Lead A]        [Lead B]
      /    \          /    \
   [A1]   [A2]     [B1]   [B2]
- Sub-orchestrators manage workstreams
- Top orchestrator only coordinates between workstreams
- When to use: 5+ agents, multiple independent workstreams
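A rough Python sketch of the hierarchical shape, under the assumption that each lead returns a single summary to the top level. The `worker`, `lead`, and `top_orchestrator` functions are hypothetical names for this example, and threads stand in for whatever actually runs the agents.

```python
from concurrent.futures import ThreadPoolExecutor

def worker(name: str, task: str) -> str:
    # Hypothetical stand-in for a real agent call.
    return f"{name} finished {task}"

def lead(workstream: str, tasks: list[str]) -> str:
    """Sub-orchestrator: runs its own workers and returns one summary."""
    results = [worker(f"{workstream}{i}", t) for i, t in enumerate(tasks, start=1)]
    return f"{workstream}: " + "; ".join(results)

def top_orchestrator(workstreams: dict[str, list[str]]) -> list[str]:
    """Top level sees one summary per workstream, never individual workers."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(lead, name, tasks) for name, tasks in workstreams.items()]
        return [f.result() for f in futures]  # submission order is preserved

summaries = top_orchestrator({"A": ["parse", "index"], "B": ["draft", "edit"]})
for line in summaries:
    print(line)
```

The top level only handles two summaries here, regardless of how many workers each lead manages.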
3. Peer-to-Peer (Rare for CLI agents)
[A] ←→ [B]
 ↕      ↕
[C] ←→ [D]
- Agents communicate directly with each other
- No central coordinator
- Caution: Hard to debug, easy to deadlock. Avoid for most CLI agent deployments.
Parallel Execution Strategies
- Fan-out / Fan-in: Spawn N agents in parallel → collect all results → synthesize
- Pipeline with buffers: Stage 1 produces outputs that Stage 2 consumes as they arrive
- Race pattern: Multiple agents attempt the same task → take the first good result
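The three strategies above can be sketched with Python's `asyncio`. Everything here is illustrative: `agent` is a hypothetical stand-in that sleeps instead of calling a model, and the race simply takes the first task to finish, whereas a real system would also validate that the result is good before accepting it.

```python
import asyncio

async def agent(name: str, delay: float) -> str:
    # Hypothetical stand-in for an agent call; the sleep simulates work.
    await asyncio.sleep(delay)
    return f"{name} result"

async def fan_out_fan_in(names: list[str]) -> str:
    """Spawn all agents, wait for every result, then synthesize."""
    results = await asyncio.gather(*(agent(n, 0.01) for n in names))
    return " | ".join(results)  # gather preserves input order

async def race(candidates: dict[str, float]) -> str:
    """Run the same task on several agents and keep the first to finish."""
    tasks = [asyncio.create_task(agent(n, d)) for n, d in candidates.items()]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # stop paying for the slower attempts
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()

async def pipeline(items: list[str]) -> list[str]:
    """Stage 2 consumes from a bounded buffer while stage 1 is still producing."""
    buffer: asyncio.Queue = asyncio.Queue(maxsize=2)
    finished: list[str] = []

    async def stage_one() -> None:
        for item in items:
            await buffer.put(await agent(item, 0.01))
        await buffer.put(None)  # sentinel: nothing more to consume

    async def stage_two() -> None:
        while (output := await buffer.get()) is not None:
            finished.append(output.upper())

    await asyncio.gather(stage_one(), stage_two())
    return finished

async def main() -> tuple[str, str, list[str]]:
    merged = await fan_out_fan_in(["a", "b", "c"])
    winner = await race({"slow": 0.2, "fast": 0.01})
    staged = await pipeline(["x", "y"])
    return merged, winner, staged

merged, winner, staged = asyncio.run(main())
print(merged, winner, staged, sep="\n")
```

The bounded queue (`maxsize=2`) is the buffer in the pipeline: stage 1 blocks when it gets too far ahead of stage 2.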
Communication Cost Formula
Before adding an agent, estimate the communication overhead:
Total cost = Σ(agent execution tokens) + Σ(handoff tokens) + orchestrator tokens
If handoff_tokens > 30% of execution_tokens:
→ Too much coordination overhead
→ Consider merging agents or simplifying
FAQ
How do I handle failures in multi-agent systems?
Use the "fail-fast, report-up" pattern: each agent has a timeout and retry limit. If an agent fails, it reports what it tried and what went wrong to the orchestrator. The orchestrator decides whether to retry, reassign, or abort. Never let agents retry indefinitely — set explicit limits (typically 3 retries max).
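A minimal sketch of this pattern in Python, assuming `asyncio` agents. The names `run_with_limits`, `FailureReport`, and `decide` are hypothetical, and the policy inside `decide` is only one possible choice. "3 retries max" is read here as one initial attempt plus up to three retries.

```python
import asyncio
from dataclasses import dataclass, field

MAX_RETRIES = 3  # explicit limit; attempts = 1 initial + MAX_RETRIES

@dataclass
class FailureReport:
    agent: str
    attempts: list[str] = field(default_factory=list)  # what was tried, what went wrong

async def run_with_limits(name, task_fn, timeout_s: float = 1.0):
    """Return (result, None) on success, or (None, FailureReport) once limits are hit."""
    report = FailureReport(agent=name)
    for attempt in range(1 + MAX_RETRIES):
        try:
            result = await asyncio.wait_for(task_fn(attempt), timeout=timeout_s)
            return result, None
        except asyncio.TimeoutError:
            report.attempts.append(f"attempt {attempt}: timed out after {timeout_s}s")
        except Exception as exc:
            report.attempts.append(f"attempt {attempt}: {type(exc).__name__}: {exc}")
    return None, report  # report up instead of retrying forever

def decide(report: FailureReport) -> str:
    """Hypothetical orchestrator policy: reassign on pure timeouts, else abort."""
    if all("timed out" in entry for entry in report.attempts):
        return "reassign"
    return "abort"

# Hypothetical agents: one recovers after two failures, one never succeeds.
async def flaky(attempt: int) -> str:
    if attempt < 2:
        raise ValueError("malformed tool output")
    return "ok"

async def broken(attempt: int) -> str:
    raise RuntimeError("API key rejected")

async def demo():
    good, _ = await run_with_limits("writer", flaky)
    _, report = await run_with_limits("researcher", broken)
    return good, report, decide(report)

good, report, decision = asyncio.run(demo())
print(good, decision, *report.attempts, sep="\n")
```

The agent never decides its own fate: it returns a report, and the retry, reassign, or abort decision stays with the orchestrator.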