Appendix: Multi-agent orchestration — implementation details and common mistakes

Categories: tech, github-copilot, prompt-engineering, agents, orchestration
Companion appendix to the multi-agent orchestration article: detailed case study implementation and 9 common mistakes with examples and solutions
Author

Dario Airoldi

Published

December 26, 2025


Parent article: This appendix provides extended reference material for How to create a prompt orchestrating multiple agents. Read the main article first for conceptual foundations.

🔍 Real Implementation Case Study

The patterns in the main article were battle-tested through the design and implementation of a complete prompt/agent creation system. This case study documents:

What Was Built

  • 4 Orchestration Prompts:
    • prompt-design-and-create.prompt.md - Creates new prompts/agents
    • prompt-review-and-validate.prompt.md - Improves existing prompts/agents
    • agent-design-and-create.prompt.md - Creates new agents specifically
    • agent-review-and-validate.prompt.md - Improves existing agents
  • 8 Specialized Agents (4 for prompts, 4 for agents):
    • Researchers (read-only pattern discovery)
    • Builders (file creation)
    • Validators (quality assurance)
    • Updaters (targeted fixes)

Lessons Learned

| Challenge | Solution applied |
| --- | --- |
| Goals seemed clear but failed edge cases | Use Case Challenge methodology (3/5/7 scenarios) |
| Tool count exceeded 7 in initial designs | Tool Composition Validation with decomposition |
| Validation/fix loops didn't converge | Maximum 3 iterations before escalating |
| Agents created with wrong mode/tool combos | Agent/tool alignment validation (plan + write = FAIL) |
| Recursive agent creation caused complexity explosion | Maximum recursion depth: 2 levels |
| File changes without user awareness | Explicit approval checkpoints at each phase |
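The iteration cap is a simple guard that keeps validation/fix loops from running forever. A minimal sketch in Python, assuming hypothetical `validate` and `fix` callables (the actual system is implemented as prompt files, not code):

```python
MAX_FIX_ATTEMPTS = 3  # "Maximum 3 iterations before escalating"

def validate_with_escalation(artifact, validate, fix):
    """Validate, retry fixes up to the cap, then escalate instead of looping."""
    for _ in range(MAX_FIX_ATTEMPTS):
        ok, issues = validate(artifact)
        if ok:
            return artifact  # converged within the budget
        artifact = fix(artifact, issues)
    # Check the result of the final fix attempt before giving up.
    ok, _ = validate(artifact)
    if ok:
        return artifact
    raise RuntimeError("validation did not converge; escalating to the user")
```

The same shape applies to the recursion-depth limit: a counter threaded through agent-creation calls, failing loudly at depth 2 instead of recursing further.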

Success Metrics Achieved

  • Reliability: 100% of created files pass validation within 3 iterations
  • Quality: All agents have complete three-tier boundaries
  • Workflow: Clear approval checkpoints prevent unwanted changes
  • Reusability: Agents used across multiple orchestrators

Full Case Study

For complete specifications, implementation roadmap, and detailed agent designs, see:

📖 Prompt Creation Multi-Agent Flow - Implementation Plan [📎 Internal]

This companion document includes:

  • Complete YAML specifications for all 4 orchestrators
  • Detailed agent specifications with boundaries
  • 2-week staged implementation roadmap
  • Success criteria and validation checklists

⚠️ Common Mistakes in Multi-Agent Orchestration

Orchestrating multiple agents introduces complexity beyond single-agent workflows. These mistakes can cause coordination failures, context loss, or frustrating user experiences.

Mistake 1: Unclear Orchestration Responsibility

Problem: Confusion about whether the prompt or the agent controls workflow progression.

❌ Bad example:

```markdown
## orchestrator.prompt.md
---
agent: planner
---

Plan the implementation, then hand off to @developer.
```

```markdown
## planner.agent.md
---
handoffs:
  - label: "Implement"
    agent: developer
---

Create a plan. Then hand off to the developer agent.
```

Problem: Both prompt AND agent try to control handoff → duplicate instructions, unclear flow.

✅ Solution: Orchestrator prompt controls flow, agents execute:

```markdown
## orchestrator.prompt.md
---
name: feature-workflow
agent: planner
---

## Multi-Phase Feature Implementation

### Phase 1: Planning (@planner)

Create detailed implementation plan.

**When plan is ready:** Hand off to @developer with:
"Implement the plan above, following all specified requirements."

### Phase 2: Implementation (@developer)

[Developer implements]

**When code is ready:** Hand off to @test-specialist with:
"Generate comprehensive tests for this implementation."

### Phase 3: Testing (@test-specialist)

[Tester creates tests]

**Final step:** Hand off to @code-reviewer for merge readiness assessment.
```

```markdown
## planner.agent.md
---
handoffs:
  - label: "Start Implementation"
    agent: developer
    send: false
---

Create implementation plans. The orchestrator prompt controls when to proceed.
```

Mistake 2: Context Loss Between Handoffs

Problem: Critical information from earlier phases gets lost when transitioning between agents.

❌ Bad example:

```yaml
handoffs:
  - label: "Test"
    agent: test-specialist
    prompt: "Write tests"
```

Problem: Test specialist doesn’t know:

  • What was implemented
  • What requirements it must satisfy
  • What edge cases to cover

✅ Solution: Explicit context carryover in handoff prompts:

```yaml
handoffs:
  - label: "Generate Tests"
    agent: test-specialist
    prompt: |
      Create comprehensive tests for the implementation above.

      **Requirements to validate:**
      ${requirements}

      **Edge cases identified during planning:**
      ${edgeCases}

      **Expected behavior:**
      ${expectedBehavior}

      Generate tests that verify all requirements and edge cases.
```
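The `${...}` placeholders stand for values the orchestrator carries forward from earlier phases. The carryover itself is ordinary templating; here is a sketch using Python's `string.Template` (the variable names mirror the example above and are illustrative):

```python
from string import Template

HANDOFF_TEMPLATE = Template("""\
Create comprehensive tests for the implementation above.

**Requirements to validate:**
${requirements}

**Edge cases identified during planning:**
${edgeCases}

**Expected behavior:**
${expectedBehavior}
""")

def build_handoff_prompt(context: dict) -> str:
    # substitute() raises KeyError on missing keys, so lost context
    # fails loudly instead of silently vanishing from the handoff.
    return HANDOFF_TEMPLATE.substitute(context)
```

The fail-loud behavior is the point: a handoff with missing context should be an error, not a quietly degraded prompt.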

Mistake 3: Too Many Sequential Handoffs

Problem: Creating workflows with 6+ sequential handoffs creates slow, fragile chains.

❌ Bad example:

```
Phase 1: Requirements Analyst
Phase 2: Technical Architect
Phase 3: Database Designer
Phase 4: API Designer
Phase 5: Frontend Developer
Phase 6: Test Engineer
Phase 7: Security Auditor
Phase 8: Performance Optimizer
Phase 9: Documentation Writer
```

Problems:

  • 9 context switches
  • High chance of failure at any step
  • Context dilution
  • Slow execution

✅ Solution: Consolidate related phases, parallelize when possible:

```
Phase 1: Planning Agent
  - Requirements analysis
  - Technical architecture
  - Database and API design (combined)

Phase 2: Implementation Agent
  - Frontend and backend together
  - Security patterns included

Phase 3: Quality Agent
  - Testing + performance review (combined)
  - Documentation generation
```
Reduced to 3 phases, each handling related concerns.

Mistake 4: No Failure Recovery Strategy

Problem: Workflow has no path forward when an agent can’t complete its task.

❌ Bad example:

```yaml
handoffs:
  - label: "Deploy"
    agent: deployment-agent
    send: true
```

What happens if deployment fails? No fallback, no alternative path.

✅ Solution: Include fallback handoffs or user intervention points:

```markdown
### Phase 4: Deployment (@deployment-agent)

Deploy to staging environment.

**On success:** Hand off to @smoke-tester for validation.

**On failure:** Hand off to @troubleshooter with:
"Deployment failed with error: [error details]. Diagnose and suggest fixes."

**If troubleshooting doesn't resolve:** Return control to user for manual intervention.
```

```markdown
## deployment-agent.agent.md
---
handoffs:
  - label: "Validate Deployment"
    agent: smoke-tester
    prompt: "Run smoke tests on staging deployment"
  - label: "Troubleshoot Failure"
    agent: troubleshooter
    prompt: "Diagnose deployment failure: ${error}"
---
```
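The branching in this phase reduces to choosing the next handoff from the outcome. A sketch of that decision, assuming hypothetical `deploy` and `troubleshoot` callables (the real workflow expresses this as handoff labels, not code):

```python
def next_handoff(deploy, troubleshoot):
    """Return the next agent to hand off to, based on the deployment outcome."""
    try:
        deploy()
        return "smoke-tester"           # success: validate the deployment
    except Exception as err:
        if troubleshoot(str(err)):      # troubleshooter proposed a fix
            return "deployment-agent"   # retry with the fix applied
        return "user"                   # unresolved: manual intervention
```

Every outcome maps to a defined next step, which is exactly what the bad example lacks.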

Mistake 5: Mixing Orchestration Levels

Problem: Orchestrators that call other orchestrators, producing confusing nested workflows.

❌ Bad example:

```markdown
## master-orchestrator.prompt.md
agent: planning-orchestrator  # Calls another orchestrator

## planning-orchestrator.agent.md
handoffs:
  - agent: implementation-orchestrator  # Calls yet another orchestrator

## implementation-orchestrator.agent.md
handoffs:
  - agent: developer  # Finally, a real agent
```

Result: 3 levels of orchestration before actual work starts.

✅ Solution: One orchestrator, specialized agents:

```markdown
## feature-workflow.prompt.md (SINGLE orchestrator)
agent: planner

Phase 1: @planner - Create plan
Phase 2: @developer - Implement
Phase 3: @tester - Test
Phase 4: @reviewer - Review
```

Rule: One orchestrator (prompt file) → Multiple execution agents (agent files)

Mistake 6: Hardcoded Agent Names

Problem: Orchestrator references specific agent names that might not exist in all projects.

❌ Bad example:

```
Hand off to @react-specialist for component implementation.
```

Problem: Project might use Vue, or might not have @react-specialist agent defined.

✅ Solution: Document required agents, provide defaults:

```markdown
---
name: component-workflow
description: "Multi-phase component implementation"
---

## Component Implementation Workflow

**Required agents:**
- `@designer` or `@planner` - Creates component specification
- `@developer` - Implements component (any framework)
- `@test-specialist` - Generates tests

**If custom agents don't exist:** Falls back to `@agent` mode.

### Phase 1: Design (@designer)

Create component specification...

**Handoff:** @developer (or default @agent if @developer doesn't exist)
```
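The fallback rule amounts to a name lookup with a default. A sketch, assuming a hypothetical set of agents defined in the project (the names are illustrative):

```python
DEFAULT_AGENT = "agent"  # built-in mode used when no custom agent exists

def resolve_agent(requested: str, defined_agents: set) -> str:
    """Use the requested agent if the project defines it, else fall back."""
    return requested if requested in defined_agents else DEFAULT_AGENT
```

A workflow that resolves names this way degrades gracefully instead of referencing an agent that was never defined.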

Mistake 7: No User Checkpoint Before Destructive Operations

Problem: Orchestrator auto-progresses through destructive or irreversible steps.

❌ Bad example:

```
Phase 3: @deployment-agent
  send: true  # Automatically deploys without user review
```

✅ Solution: Explicit checkpoints before high-risk operations:

```markdown
### Phase 3: Deployment Preparation (@deployment-agent)

Prepare deployment artifacts and configuration.

**USER CHECKPOINT:**

⚠️ **Review the deployment plan above before proceeding.**

Verify:
- [ ] Target environment is correct
- [ ] Database migrations are safe
- [ ] Rollback plan is clear

**When ready to deploy:** Click "Deploy to Production" handoff below.
```

```yaml
handoffs:
  - label: "Deploy to Production"
    agent: deployment-agent
    prompt: "Execute deployment with user approval"
    send: false  # User must click to proceed
```
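`send: false` is effectively a gate: the handoff is prepared but runs only on an explicit user action. A sketch of that contract (the function and flag names are illustrative, not part of any real API):

```python
def execute_handoff(action, user_confirmed: bool, send: bool = False):
    """Auto-run only when send=True; otherwise wait for explicit approval."""
    if send or user_confirmed:
        return action()
    return "awaiting user approval"  # handoff is shown, nothing executes
```

Defaulting `send` to `False` makes auto-progression opt-in, which is the safer default for destructive operations.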

Mistake 8: Poor Handoff Prompt Quality

Problem: Handoff prompts are too generic, providing insufficient context for the next agent.

❌ Bad examples:

```yaml
prompt: "Continue"            # What should the agent continue?
prompt: "Do your thing"       # What thing?
prompt: "Next phase"          # What is the next phase?
```

✅ Solution: Specific, actionable handoff prompts:

```yaml
handoffs:
  - label: "Generate Tests"
    prompt: |
      Create comprehensive test suite for the ${componentName} component implemented above.

      **Test coverage requirements:**
      - Unit tests for all public methods
      - Integration tests for component interactions
      - Edge cases: null inputs, empty arrays, error conditions

      **Testing framework:** Jest with React Testing Library

      **Success criteria:** 80%+ code coverage, all edge cases handled
```

Mistake 9: Ignoring Agent Execution Context

Problem: Not considering where agents run (local vs. background vs. cloud).

❌ Bad example (assumes all agents run locally):

```
Phase 1: @planner (local)
Phase 2: @developer (background)   # Has isolated work tree
Phase 3: @reviewer (local)         # Can't see background changes!
```

Problem: Background agents work in isolated work trees; local agents can’t see their changes until merged.

✅ Solution: Match workflow to execution context:

```
## Option 1: All local (synchronous, visible)
Phase 1: @planner (local)
Phase 2: @developer (local)
Phase 3: @reviewer (local)

## Option 2: Background with merge points
Phase 1: @planner (local)
Phase 2: @developer (background - creates PR)
  → USER reviews PR
Phase 3: @reviewer (local - after PR merged)

## Option 3: All cloud (GitHub PR workflow)
Phase 1: @pr-analyzer (cloud)
Phase 2: @security-scanner (cloud)
Phase 3: @approval-agent (cloud)
```

Key Takeaways

DO:

  • Let orchestrator prompt control workflow; agents execute tasks
  • Include explicit context in handoff prompts
  • Consolidate related phases (aim for 3-5, not 9+)
  • Provide fallback paths for failures
  • Use single orchestrator layer, not nested orchestrators
  • Document required agents with fallback options
  • Add user checkpoints before destructive operations
  • Write specific, actionable handoff prompts
  • Match workflow design to agent execution contexts

DON’T:

  • Split control between prompt and agent
  • Lose context between handoffs
  • Create excessively long sequential chains
  • Omit failure recovery strategies
  • Nest orchestrators multiple levels deep
  • Hardcode agent names without fallbacks
  • Auto-progress through high-risk operations
  • Use vague handoff prompts like “continue”
  • Ignore execution context differences

By avoiding these orchestration mistakes, you’ll build robust multi-agent workflows that handle complexity gracefully, recover from failures, and provide clear user control points.