Appendix: Token optimization implementation patterns and pitfalls
tech
github-copilot
prompt-engineering
optimization
token-efficiency
Appendix covering implementation patterns (validation pipelines, semantic caching, progressive summarization) and common pitfalls (over-caching, key collisions, cache write costs, content ordering) for token optimization in multi-agent workflows
Parent article: How to optimize token consumption during prompt orchestrations
🔧 Implementation patterns
Pattern 1: Validation pipeline with caching
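The `check_validation_cache` tool used in this pattern is deterministic code, not an AI call. A minimal sketch of what such a tool might do — the JSON metadata store, its filename, and the entry fields here are assumptions for illustration, not part of the pattern itself:

```python
import json
import time
from pathlib import Path

CACHE_FILE = Path(".validation-cache.json")  # hypothetical metadata store

def check_validation_cache(file: str, type: str, days: int = 7):
    """Return the cached result for (file, type) if fresh, else None.

    `type` mirrors the parameter name in the prompt's tool signature.
    """
    if not CACHE_FILE.exists():
        return None
    cache = json.loads(CACHE_FILE.read_text())
    entry = cache.get(f"{file}:{type}")
    if entry is None:
        return None
    # Treat entries older than `days` as stale
    if time.time() - entry["timestamp"] > days * 86400:
        return None
    return entry["result"]
```

Because the freshness check is plain code, repeated orchestration runs skip the AI validation entirely when a recent result exists.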
```markdown
# validation-pipeline.prompt.md
---
name: validation-pipeline
description: "Multi-validation with deterministic cache checks"
tools: ['check_validation_cache', 'run_grammar_check', 'run_readability_check']
---

## Process

### Step 1: Batch Cache Check (DETERMINISTIC)

For each validation type (grammar, readability, structure, fact-check):

1. Call `check_validation_cache(file, type, days=7)`
2. Record which validations need running

### Step 2: Run Only Missing Validations (AI)

For each validation NOT in cache:

1. Run appropriate validation prompt
2. Store result in metadata cache

### Step 3: Aggregate Results

Combine cached + fresh results into unified report.
```

Pattern 2: Research with semantic caching
```python
# Research pattern with semantic cache
async def research_topic(topic: str, cache: SemanticCache):
    # Check semantic cache first
    cached = await cache.get_similar(topic)
    if cached and cached.similarity > 0.90:
        return cached.response

    # Cache miss - perform research
    results = await perform_research(topic)

    # Store for future similar queries
    await cache.store(topic, results)
    return results
```

Pattern 3: Progressive summarization handoff
```markdown
# builder.agent.md
---
name: builder
handoffs:
  - label: "Validate Result"
    agent: validator
    send: false  # Don't send full context
    prompt: |
      **Summary from Builder:**
      {{PHASE_SUMMARY}}

      **Artifact location:** {{OUTPUT_FILE}}

      Validate the created artifact.
---

## Phase Completion Instructions

Before any handoff, produce a PHASE_SUMMARY (max 200 tokens):

1. Decisions made (bullet list)
2. Artifacts created (file paths)
3. Key constraints applied
4. Specific validation needs

Store full details in the output file for reference if needed.
```

⚠️ Common pitfalls
Pitfall 1: Over-caching dynamic content
❌ Wrong: Caching responses that depend on current file state

```python
# DON'T cache file-dependent analyses
cache.store(
    "analyze security of auth.py",  # Query seems cacheable...
    analysis_result,                # ...but the result depends on file content!
)
```

✅ Right: Include content hash in cache key
```python
import hashlib

# md5 is fine here: we need a cheap fingerprint, not cryptographic strength
content_hash = hashlib.md5(file_content.encode()).hexdigest()
cache.store(
    f"analyze security of auth.py:{content_hash}",
    analysis_result,
)
```

Pitfall 2: Cache key collisions
❌ Wrong: Overly broad cache keys

```python
cache.store("validate article", result)  # Which article?
```

✅ Right: Include all relevant context in the key

```python
cache.store(f"validate:{file_path}:{validation_type}:{content_hash}", result)
```

Pitfall 3: Ignoring cache write costs
For Anthropic, cache writes cost 25% more than regular input tokens.
❌ Wrong: Caching tiny prefixes that are rarely reused

✅ Right: Only cache prefixes that will be reused 3+ times
Break-even calculation (Anthropic):

- Cache write: 1.25× base cost
- Cache read: 0.1× base cost

A single cache hit is already enough to offset the write premium:

1.25 (write) + 0.1 (read) = 1.35 vs. 1.0 + 1.0 = 2.0 without caching
1.25 (write) + 0.1 (read) + 0.1 (read) = 1.45 vs. 3.0 without caching

Savings start at the 2nd use, and grow with every hit after that.
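The arithmetic above generalizes to a one-function cost model. A sketch using Anthropic's published multipliers (it ignores cache TTL expiry and partial-prefix hits, so treat it as a lower bound on real-world cost):

```python
def cached_cost(uses: int, write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Relative cost of a cached prefix over `uses` calls: one write, then reads.

    The uncached baseline is simply 1.0 * uses.
    """
    if uses <= 0:
        return 0.0
    return write_mult + (uses - 1) * read_mult

# First use costs 25% extra; every use after that costs 10% of baseline
for uses in (1, 2, 3):
    print(uses, cached_cost(uses), float(uses))
```

Running this shows the crossover: the cached path is more expensive at one use and cheaper from the second use onward.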
Pitfall 4: Placing dynamic content before static
❌ Wrong: User input first

```markdown
## User Request: {{input}}

## Instructions (static)
[These won't be cached because they come after dynamic content]
```

✅ Right: Static first, dynamic last

```markdown
## Instructions (static - cached)
[1,000+ tokens of stable content]

## User Request: {{input}}
```
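With Anthropic's prompt caching, this ordering maps directly onto the request payload: the static system block carries the `cache_control` marker and the dynamic user input comes last. A sketch of the payload construction — the field layout follows Anthropic's prompt-caching docs, but the model name is illustrative and details should be checked against the current API reference:

```python
def build_request(static_instructions: str, user_input: str) -> dict:
    """Assemble a request dict with a cacheable static prefix."""
    return {
        "model": "claude-sonnet-4-5",  # illustrative model name
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": static_instructions,  # stable 1,000+ token prefix
                "cache_control": {"type": "ephemeral"},  # cache up to here
            }
        ],
        "messages": [
            # Dynamic content last, so it never invalidates the cached prefix
            {"role": "user", "content": user_input}
        ],
    }
```

Because only content up to the `cache_control` marker is cached, any dynamic text placed before it would change the prefix on every call and force a fresh cache write each time.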