How Copilot assembles and processes prompts

tech
prompt-engineering
github-copilot
concepts
Understand the multi-layered prompt assembly architecture that transforms your chat messages into model requests — system prompt layers, user prompt construction, context window growth, and context rot.
Author

Dario Airoldi

Published

March 1, 2026

Every message you type in GitHub Copilot Chat triggers a multi-layered construction process before the model sees a single token. Understanding how this assembly works — and where each customization mechanism lands — is the single most important concept in this series. It’s the difference between guessing which file type to use and knowing exactly why instructions persist across requests while prompt files don’t.

This article explains the prompt assembly architecture from the ground up. You’ll learn how the system prompt is built in layers, how the user prompt is constructed from your message plus environment context, how the context window grows over a conversation, and why that growth eventually degrades the model’s accuracy.

🎯 Why assembly matters

When you type a message in Copilot Chat, you’re not sending that message directly to the model. VS Code intercepts your text and assembles a much larger request that includes your project rules, your agent’s persona, environment details, workspace structure, and conversation history. The model receives all of this as a single input.

This means two things:

  1. Where your content lands determines how the model treats it. Content in the system prompt is treated as the model’s built-in rules. Content in the user prompt is treated as user-provided context. The distinction affects how strongly the model follows your instructions.
  2. Every customization file you create occupies a specific slot in the assembly. Instruction files land in the system prompt. Prompt files land in the user prompt. Agents override the system prompt’s identity layer. If you put content in the wrong file type, it lands in the wrong slot — and the model treats it differently than you intended.

🏗️ The system prompt: identity and rules

The system prompt is assembled first. It defines who the model is, what rules it follows, and what capabilities it has. VS Code builds it in layers, from most general to most specific:

┌─────────────────────────────────────────────────────────────┐
│                    SYSTEM PROMPT                            │
├─────────────────────────────────────────────────────────────┤
│  1. Core identity and global rules                          │
│     "You are an expert AI programming assistant..."         │
│                                                             │
│  2. General instructions                                    │
│     Model-specific behavioral rules and quirks              │
│                                                             │
│  3. Tool use instructions                                   │
│     How to call tools, format parameters, handle results    │
│                                                             │
│  4. Output format instructions                              │
│     Markdown formatting, code block rules, link styles      │
│                                                             │
│  5. Custom instructions (.instructions.md files)            │
│     Your project-specific guidance (auto-injected)          │
│     ⚠️ copilot-instructions.md is always injected LAST      │
│                                                             │
│  6. Custom agent definition (.agent.md body)                │
│     Agent persona, workflow, and constraints                │
│     Only present when a custom agent is active              │
└─────────────────────────────────────────────────────────────┘

Layers you don’t control (1–4)

The first four layers are built-in. VS Code generates them automatically for every request. They establish the model's base identity, teach it how to call tools, and define output formatting rules. You can't modify these layers directly, but knowing that they exist explains certain model behaviors:

  • Layer 1 sets the model’s persona as a programming assistant. This is why Copilot defaults to code-related answers even when you ask general questions.
  • Layer 2 includes model-specific tweaks. Different models get slightly different behavioral instructions to account for their strengths and quirks.
  • Layer 3 teaches the model how to use tools — the JSON schema for each tool, how to format parameters, what to do with results. This is why tool calls “just work” without you having to explain the mechanics.
  • Layer 4 defines formatting rules — Markdown conventions, code block syntax highlighting, link formats. This is why Copilot’s output is consistently formatted.
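
The tool schemas injected in layer 3 look roughly like standard JSON function declarations. A hypothetical sketch for the read_file tool mentioned later in this article — the field names and structure are illustrative, not Copilot's actual wire format:

```json
{
  "name": "read_file",
  "description": "Read the contents of a file in the workspace.",
  "parameters": {
    "type": "object",
    "properties": {
      "filePath": { "type": "string", "description": "Absolute path to the file" },
      "startLine": { "type": "integer", "description": "First line to read (1-based)" },
      "endLine": { "type": "integer", "description": "Last line to read (inclusive)" }
    },
    "required": ["filePath"]
  }
}
```

Every tool available in the session contributes a schema like this to the system prompt, which is part of why tool-heavy configurations carry a baseline token cost before you type anything.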

Layers you control (5–6)

Layer 5 is where your custom instructions go. If you have multiple .instructions.md files, they’re injected based on their applyTo patterns — only files whose patterns match the current editing context are included. The repository-wide copilot-instructions.md file is always appended last, giving it the final word on project conventions.
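
For example, a minimal scoped instruction file might look like this (the glob pattern and rule text are illustrative; the applyTo frontmatter field follows VS Code's instruction-file convention):

```markdown
---
applyTo: "**/*.ts,**/*.tsx"
---

Use strict TypeScript. Prefer named exports and avoid `any`;
when a type is unknown, use `unknown` and narrow explicitly.
```

Because the pattern matches only TypeScript files, this content is injected into layer 5 only when the current editing context includes a matching file.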

Layer 6 is where custom agents inject their persona and workflow. This only happens when a custom agent is active (selected in the agent picker). The agent body acts as a full identity override — it doesn’t just provide information, it tells the model who it is and how it should behave.

Why this layering matters

The system prompt persists across the entire conversation. Every message you send includes the full system prompt. This is why instruction files “feel” persistent — they’re re-injected with every request. It’s also why overloading the system prompt with too many instructions can degrade performance: the model has to process all of it before it even reads your message.


📝 The user prompt: your message in context

The user prompt is assembled separately for each message. It contains everything specific to the current request:

┌─────────────────────────────────────────────────────────────┐
│                     USER PROMPT                             │
├─────────────────────────────────────────────────────────────┤
│  1. Prompt file contents (.prompt.md body)                  │
│     Only present when you invoke a prompt via /command      │
│                                                             │
│  2. Environment info                                        │
│     OS, IDE version, available extensions                   │
│                                                             │
│  3. Workspace info                                          │
│     Project structure, folder layout (text format)          │
│                                                             │
│  4. Context info                                            │
│     Current date/time, open terminals, attached files       │
│                                                             │
│  5. Your message                                            │
│     The actual text you type in the chat input              │
└─────────────────────────────────────────────────────────────┘

The critical distinction: system vs. user

Prompt files inject into the user prompt, NOT the system prompt. This is the most commonly misunderstood aspect of prompt engineering for Copilot. The model sees prompt file content as “the user is asking me to follow these instructions” rather than “these are my built-in rules.”

This distinction has practical consequences:

| Behavior | System prompt (instructions) | User prompt (prompt files) |
| --- | --- | --- |
| Persistence | Every request in the session | Only the request where invoked |
| Authority | Model treats as built-in rules | Model treats as user suggestions |
| Override risk | Hard for users to override | Can be overridden by newer messages |
| Token cost | Paid on every request | Paid once per invocation |

Auto-injected context

VS Code automatically populates the environment, workspace, and context sections. This is why the model knows your operating system, can reference your project structure, and understands what file you have open — even though you didn’t mention any of this in your message.

Attached files (via #file:, drag-and-drop, or @workspace references) appear in the context info section. Each attachment adds tokens to the user prompt, so attaching large files increases the cost of that specific request.
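
A rough way to see this cost is the common ~4 characters per token heuristic (an approximation — actual tokenizer counts vary by model and content):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 chars/token heuristic."""
    return max(1, len(text) // 4)

# A 2,000-line source file averaging ~40 characters per line:
attached_file = "x" * (2_000 * 40)
print(estimate_tokens(attached_file))  # 20000 — one attachment, ~20K tokens
```

By this estimate, attaching a couple of large files can cost more tokens than the entire system prompt.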


📊 The context window: growth and decay

Once the model responds, its output becomes part of the context window — the running conversation history that both you and the model can see:

┌─────────────────────────────────────────────────────────────┐
│                   CONTEXT WINDOW                            │
├─────────────────────────────────────────────────────────────┤
│  System prompt (persists across the session)                │
│  User message #1                                            │
│  Assistant response #1 (+ tool call results)                │
│  User message #2                                            │
│  Assistant response #2 (+ tool call results)                │
│  ...                                                        │
│  ⚠️ As this grows, earlier content loses influence          │
└─────────────────────────────────────────────────────────────┘

How the window grows

Every exchange adds content to the context window:

  • Your message — the user prompt (including any prompt file body, environment info, and attached files)
  • The model’s response — the generated text
  • Tool call results — when the model calls tools (reading files, running searches, executing commands), the results are appended to the context

Tool calls are particularly expensive. A single read_file call might add hundreds of lines to the context. A grep_search might return dozens of matches. An agent that makes ten tool calls in a single response can easily consume thousands of tokens of context window space.
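
The compounding effect is easy to model. A toy simulation using rough figures consistent with this article (an ~8K-token system prompt, a few hundred tokens per message, ~800 tokens per tool result) — the numbers are assumptions for illustration, not measured Copilot behavior:

```python
def session_tokens(exchanges: int,
                   system_prompt: int = 8_000,
                   user_msg: int = 300,
                   response: int = 1_000,
                   tool_calls: int = 10,
                   tokens_per_tool_result: int = 800) -> int:
    """Estimated context-window usage after N exchanges.
    The system prompt is counted once: it persists on every request
    but occupies the same slot rather than accumulating."""
    per_exchange = user_msg + response + tool_calls * tokens_per_tool_result
    return system_prompt + exchanges * per_exchange

print(session_tokens(5))   # 54500 — past 50K in five tool-heavy exchanges
print(session_tokens(20))  # 194000 — approaching a 200K window
```

With ten tool calls per response, each exchange adds roughly 9K tokens, so even a 200K window fills within a long agent session.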

Why context window size matters

Every model has a maximum context window — the total number of tokens it can process in a single request. As of early 2026:

| Model | Context window |
| --- | --- |
| GPT-4o | 128K tokens |
| Claude Sonnet 4 / Opus 4.6 | 200K tokens |
| o3 / o4-mini | 200K tokens |
| Gemini 2.0 Flash | 1M+ tokens |
| GPT-5 | 1M+ tokens |

These numbers seem enormous, but a complex agent session with many tool calls can consume 50K–100K tokens in just a few exchanges. The system prompt alone might be 5K–10K tokens when you have multiple instruction files, an agent definition, and tool schemas.


⚠️ Context rot: the silent accuracy killer

As the context window fills, a phenomenon called context rot degrades the model’s accuracy. Research by Liu et al. (2023, “Lost in the Middle”) demonstrated that language models pay disproportionate attention to content at the beginning and end of their context window, while under-weighting content in the middle.

What context rot looks like in practice

  • The model “forgets” instructions you gave three exchanges ago
  • Tool call results from early in the conversation stop influencing decisions
  • The model starts contradicting its own earlier responses
  • Code generation quality drops as the conversation lengthens

Why it happens

Context rot isn’t a bug — it’s a fundamental property of how transformer attention mechanisms work. The attention pattern looks roughly like a U-curve:

Attention
  ▲
  │   ████                                          ████
  │   ████                                          ████
  │   ████  ███                                ███  ████
  │   ████  ███  ██                        ██  ███  ████
  │   ████  ███  ██  █    ·  ·  ·       █  ██  ███  ████
  └───────────────────────────────────────────────────► Position
      Start            Middle               End

Content at the start (your system prompt, early instructions) and content at the end (your most recent message) get the most attention. Content in the middle — like that detailed code review from five exchanges ago — gradually loses influence.

Mitigation strategies

  1. Start new sessions frequently. Don’t try to do everything in one conversation. When you notice quality dropping, start fresh.
  2. Front-load important instructions. Put critical rules in instruction files (system prompt position) rather than in chat messages (which drift toward the middle over time).
  3. Use subagents for isolated tasks. Each subagent gets a fresh context window. The main agent receives only a summary, keeping its context lean. See Understanding agents, invocation, handoffs, and subagents for the conceptual foundation and How to design subagent orchestrations for practical implementation.
  4. Keep prompt files concise. Every token in a prompt file is a token that pushes other content toward the under-weighted middle.
  5. Monitor token usage. VS Code’s output channel shows token counts per request. Watch for sessions that consistently approach the context window limit.
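
Several of these strategies amount to a "keep the ends" trimming policy that mirrors the U-curve: preserve the system prompt and the most recent turns, and drop (or summarize) the under-weighted middle. A minimal sketch, assuming a simple role/content message format and the ~4 chars/token heuristic — this is not how Copilot actually manages its window:

```python
def trim_history(messages, budget, estimate=lambda m: len(m["content"]) // 4):
    """Keep the system prompt plus as many of the most recent
    messages as fit within `budget` tokens; drop the middle."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    remaining = budget - sum(estimate(m) for m in system)
    kept = []
    for m in reversed(rest):  # walk from newest to oldest
        cost = estimate(m)
        if cost > remaining:
            break             # everything older than this is dropped
        kept.append(m)
        remaining -= cost
    return system + list(reversed(kept))
```

A production version would summarize the dropped middle rather than discard it outright — which is essentially what the subagent pattern achieves by returning only a summary to the main agent.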

For a deeper treatment of context management techniques, see How to manage information flow during prompt orchestrations. For token optimization strategies, see How to optimize token consumption during prompt orchestrations.


✅ Choosing the right mechanism

Understanding the assembly architecture helps you choose the right customization mechanism for any situation. The question isn’t “which file type should I create?” — it’s “where in the assembly do I need this content to land?”

| If you want to… | Use | Assembly position | Why |
| --- | --- | --- | --- |
| Set persistent project rules | Custom instructions (.instructions.md) | System prompt, layer 5 | Injected into every request automatically |
| Run a reusable workflow | Prompt file (.prompt.md) | User prompt, on demand | Injected only when you invoke the /command |
| Give the model a new identity | Custom agent (.agent.md) | System prompt, layer 6 | Overrides the identity layer completely |
| Route to a specific model | Prompt file (model: field) or agent (model: field) | Request metadata | Model routing without changing behavior rules |
| Add portable, cross-platform capabilities | Skill (SKILL.md) | System prompt (on match) | Auto-loaded when the prompt matches the skill's description |
| Enforce policies deterministically | Hook (.github/hooks/*.json) | Outside the prompt entirely | Runs your code, not the model's interpretation |
| Extend Copilot with external tools | MCP server (mcp.json) | Tool schemas in system prompt | Adds new tools the model can call at runtime |
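
As a concrete example of the "reusable workflow" case, a minimal .prompt.md sketch — the description and body are illustrative, and the frontmatter fields follow VS Code's prompt-file convention:

```markdown
---
description: "Generate a conventional-commit message for staged changes"
model: GPT-4o
---

Summarize the staged diff and propose a commit message in
Conventional Commits format (type(scope): subject, max 72 characters).
```

This body lands in the user prompt only when you invoke the file via its /command, and the model: field routes that single request to the named model.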

The decision flowchart

Does the content need to persist across every request?
├─ YES → Is it project-wide or file-specific?
│        ├─ Project-wide → copilot-instructions.md
│        └─ File-specific → .instructions.md with applyTo
├─ NO → Is it a reusable workflow?
│       ├─ YES → Does it need a specific persona?
│       │        ├─ YES → .agent.md
│       │        └─ NO → .prompt.md
│       └─ NO → Is it deterministic enforcement?
│               ├─ YES → Hook (.json)
│               └─ NO → Is it an external integration?
│                       ├─ YES → MCP server
│                       └─ NO → Inline chat message
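
The flowchart translates directly into code. A sketch that encodes the same decisions (purely illustrative — the yes/no answers are reduced to boolean flags for brevity):

```python
def pick_mechanism(persistent: bool, project_wide: bool = False,
                   reusable_workflow: bool = False, needs_persona: bool = False,
                   deterministic: bool = False, external: bool = False) -> str:
    """Encode the decision flowchart above as nested conditionals."""
    if persistent:
        # Persists across every request -> system prompt, layer 5
        return "copilot-instructions.md" if project_wide else ".instructions.md with applyTo"
    if reusable_workflow:
        # One-off workflow -> user prompt, unless it needs an identity override
        return ".agent.md" if needs_persona else ".prompt.md"
    if deterministic:
        return "Hook (.json)"       # enforcement outside the prompt entirely
    if external:
        return "MCP server"         # new tools via schemas in the system prompt
    return "Inline chat message"

print(pick_mechanism(persistent=False, reusable_workflow=True))  # .prompt.md
```

Walking the same questions in the same order as the flowchart makes the underlying rule explicit: persistence decides system vs. user prompt first, and everything else is a refinement.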

🎯 Conclusion

The prompt assembly architecture is the conceptual backbone of everything in this series. The system prompt carries your persistent rules and agent identity. The user prompt carries your one-time workflows and attached context. The context window accumulates everything — and as it grows, earlier content loses influence through context rot.

Every customization mechanism you’ll learn about in subsequent articles — prompt files, instruction files, agents, skills, hooks, MCP servers — maps to a specific position in this assembly. When you understand where each mechanism lands and why, choosing the right tool for each situation becomes straightforward.

Key takeaways

  • The system prompt is built in 6 layers: 4 built-in layers you don’t control, plus your custom instructions (layer 5) and agent definition (layer 6)
  • The user prompt contains your message, prompt file content, and auto-injected environment/workspace context
  • Prompt files inject into the user prompt, not the system prompt — they’re treated as user suggestions, not built-in rules
  • The context window grows with every exchange, consuming tokens from tool calls, responses, and attached files
  • Context rot degrades accuracy as the window fills — mitigate by starting new sessions, using subagents, and front-loading critical instructions

📚 References

VS Code Copilot Customization Overview [📘 Official] Microsoft’s comprehensive guide to customizing GitHub Copilot in VS Code. Covers custom agents, instructions, prompt files, and MCP configuration. The authoritative source for understanding how customization files are loaded and assembled.

Lost in the Middle: How Language Models Use Long Contexts [📗 Verified Community] Academic research (Liu et al., 2023, TACL) documenting the U-shaped attention pattern in transformer models. Demonstrates that models under-weight middle content in long contexts. Foundational research for understanding context rot and why instruction placement matters.

GitHub Copilot Documentation — Repository Instructions [📘 Official] Official GitHub documentation for customizing Copilot with repository-level instructions. Covers prompt files, instruction files, and agent configuration. Essential for understanding the specification behind the assembly architecture.

VS Code v1.107 Release Notes [📘 Official] Late-2025 release introducing Agent HQ, background agents, MCP 1.0, and the Language Models Editor. Relevant context for understanding how the assembly architecture extends to background and cloud execution contexts.