Appendix: Token optimization implementation patterns and pitfalls

Tags: tech, github-copilot, prompt-engineering, optimization, token-efficiency

Summary: Implementation patterns (validation pipelines, semantic caching, progressive summarization) and common pitfalls (over-caching, key collisions, cache write costs, content ordering) for token optimization in multi-agent workflows.

Author: Dario Airoldi

Published: March 7, 2026


Parent article: How to optimize token consumption during prompt orchestrations


🔧 Implementation patterns

Pattern 1: Validation pipeline with caching

```markdown
# validation-pipeline.prompt.md
---
name: validation-pipeline
description: "Multi-validation with deterministic cache checks"
tools: ['check_validation_cache', 'run_grammar_check', 'run_readability_check']
---

## Process

### Step 1: Batch Cache Check (DETERMINISTIC)

For each validation type (grammar, readability, structure, fact-check):
1. Call `check_validation_cache(file, type, days=7)`
2. Record which validations need running

### Step 2: Run Only Missing Validations (AI)

For each validation NOT in cache:
1. Run appropriate validation prompt
2. Store result in metadata cache

### Step 3: Aggregate Results

Combine cached + fresh results into unified report.
```
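The deterministic cache check in Step 1 can be sketched in Python. `check_validation_cache` and `store_validation_result` are hypothetical helpers backed by a local JSON metadata file; a real tool would sit behind whatever tool interface the prompt declares.

```python
import hashlib
import json
import time
from pathlib import Path

CACHE_FILE = Path("validation-cache.json")  # hypothetical metadata store
MAX_AGE_DAYS = 7

def _load_cache() -> dict:
    if CACHE_FILE.exists():
        return json.loads(CACHE_FILE.read_text())
    return {}

def check_validation_cache(file: str, validation_type: str, days: int = MAX_AGE_DAYS):
    """Return the cached result only if the file is unchanged and the entry is fresh."""
    content_hash = hashlib.sha256(Path(file).read_bytes()).hexdigest()
    entry = _load_cache().get(f"{file}:{validation_type}")
    if entry and entry["hash"] == content_hash:
        age_days = (time.time() - entry["timestamp"]) / 86400
        if age_days <= days:
            return entry["result"]   # cache hit: skip the AI validation
    return None                      # cache miss: validation must run

def store_validation_result(file: str, validation_type: str, result: dict) -> None:
    cache = _load_cache()
    cache[f"{file}:{validation_type}"] = {
        "hash": hashlib.sha256(Path(file).read_bytes()).hexdigest(),
        "timestamp": time.time(),
        "result": result,
    }
    CACHE_FILE.write_text(json.dumps(cache))
```

Keying on a content hash (not just the filename) means an edited file automatically misses the cache, which is exactly the behavior Pitfall 1 below demands.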

Pattern 2: Research with semantic caching

```python
# Research pattern with semantic cache
async def research_topic(topic: str, cache: SemanticCache):
    # Check semantic cache first
    cached = await cache.get_similar(topic)
    if cached and cached.similarity > 0.90:
        return cached.response

    # Cache miss - perform research
    results = await perform_research(topic)

    # Store for future similar queries
    await cache.store(topic, results)

    return results
```

Pattern 3: Progressive summarization handoff

```markdown
# builder.agent.md
---
name: builder
handoffs:
  - label: "Validate Result"
    agent: validator
    send: false    # Don't send full context
    prompt: |
      **Summary from Builder:**
      {{PHASE_SUMMARY}}

      **Artifact location:** {{OUTPUT_FILE}}

      Validate the created artifact.
---

## Phase Completion Instructions

Before any handoff, produce a PHASE_SUMMARY (max 200 tokens):

1. Decisions made (bullet list)
2. Artifacts created (file paths)
3. Key constraints applied
4. Specific validation needs

Store full details in output file for reference if needed.
```
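The completion steps above can be sketched as a small helper that assembles the handoff payload and enforces the token budget. `build_handoff_prompt` and the `len(text) // 4` token estimate are illustrative assumptions; a real agent would count tokens with the model's own tokenizer.

```python
MAX_SUMMARY_TOKENS = 200

def build_handoff_prompt(decisions, artifacts, constraints, validation_needs, output_file):
    """Assemble the compact handoff payload described in the agent file."""
    summary_lines = (
        ["Decisions made:"] + [f"- {d}" for d in decisions]
        + ["Artifacts created:"] + [f"- {a}" for a in artifacts]
        + ["Key constraints:"] + [f"- {c}" for c in constraints]
        + ["Validation needs:"] + [f"- {v}" for v in validation_needs]
    )
    summary = "\n".join(summary_lines)
    # Rough heuristic: ~4 characters per token for English prose
    estimated_tokens = len(summary) // 4
    if estimated_tokens > MAX_SUMMARY_TOKENS:
        raise ValueError(
            f"Summary too long: ~{estimated_tokens} tokens (max {MAX_SUMMARY_TOKENS})"
        )
    return (
        f"**Summary from Builder:**\n{summary}\n\n"
        f"**Artifact location:** {output_file}\n\n"
        "Validate the created artifact."
    )
```

Failing loudly on an oversized summary keeps the budget honest; the full details live in the output file, exactly as the agent instructions require.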

⚠️ Common pitfalls

Pitfall 1: Over-caching dynamic content

❌ Wrong: Caching responses that depend on current file state

```python
# DON'T cache file-dependent analyses
cache.store(
    "analyze security of auth.py",  # Query seems cacheable...
    analysis_result,  # ...but the result depends on file content!
)
```

✅ Right: Include a content hash in the cache key

```python
import hashlib

# md5 is fine here: the hash is a content fingerprint, not a security control
content_hash = hashlib.md5(file_content.encode()).hexdigest()
cache.store(
    f"analyze security of auth.py:{content_hash}",
    analysis_result,
)
```

Pitfall 2: Cache key collisions

❌ Wrong: Overly broad cache keys

```python
cache.store("validate article", result)  # Which article?
```

✅ Right: Include all relevant context in key

```python
cache.store(f"validate:{file_path}:{validation_type}:{content_hash}", result)
```
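A small helper makes this discipline hard to forget: every input that can change the result goes into the key. `make_cache_key` is a hypothetical utility, not part of any SDK.

```python
import hashlib

def make_cache_key(operation: str, file_path: str, content: str, **params) -> str:
    """Build a collision-resistant cache key.

    Two different validations, or two versions of the same file, can never
    share an entry because every result-affecting input is part of the key.
    """
    content_hash = hashlib.sha256(content.encode()).hexdigest()[:16]
    # Sort params so keyword order never changes the key
    param_part = ":".join(f"{k}={params[k]}" for k in sorted(params))
    return f"{operation}:{file_path}:{content_hash}:{param_part}"
```

Sorting the extra parameters keeps the key deterministic regardless of call-site keyword order.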

Pitfall 3: Ignoring cache write costs

For Anthropic, cache writes cost 25% more than regular input tokens.

❌ Wrong: Caching tiny prefixes that are rarely reused

✅ Right: Only cache prefixes that will be reused; a 3+ reuse target leaves a comfortable margin

Break-even calculation (Anthropic):
- Cache write: 1.25× base cost
- Cache read: 0.1× base cost

A single cache hit already offsets the write premium:
  1.25 (write) + 0.1 (read) = 1.35
  vs. 1.0 + 1.0 = 2.0 without caching

Savings start at the 2nd use; by the 3rd use the cached cost is 1.45 vs. 3.0 without caching.
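The arithmetic above is easy to parameterize. A throwaway sketch, with the multipliers defaulting to the Anthropic rates quoted above (adjust for other providers):

```python
def caching_cost(num_uses: int, write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Total input-cost multiplier for a prefix sent num_uses times with
    prompt caching: one cache write, then cache reads on every later use."""
    return write_mult + read_mult * (num_uses - 1)

def no_cache_cost(num_uses: int) -> float:
    """Same prefix sent num_uses times at full price."""
    return 1.0 * num_uses
```

Evaluating both at 1, 2, and 3 uses reproduces the break-even table: 1.25 vs. 1.0 (caching loses), 1.35 vs. 2.0, 1.45 vs. 3.0.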

Pitfall 4: Placing dynamic content before static

❌ Wrong: User input first

```markdown
## User Request: {{input}}

## Instructions (static)
[These won't be cached because they come after dynamic content]
```

✅ Right: Static first, dynamic last

```markdown
## Instructions (static - cached)
[1,000+ tokens of stable content]

## User Request: {{input}}
```
## User Request: {{input}}