Appendix 01: OpenAI Models Prompting Guide Analysis
This appendix provides a comprehensive analysis of OpenAI's official prompting documentation, extracting key techniques, patterns, and recommendations for GPT-4o, GPT-5, and reasoning models (o3, o4-mini).
Guide Version: This analysis is based on OpenAI documentation as of 2026-01-20. Official guides may have been updated since this analysis. Always verify with the official documentation.
Table of Contents
- Model Overview
- Message Roles and Authority Chain
- Prompt Structure Best Practices
- Few-Shot Learning Techniques
- Formatting with Markdown and XML
- Reasoning Models (o3, o4-mini)
- GPT-5 Specific Optimizations
- Cost Optimization: Prompt Caching
- Practical Examples
- Common Pitfalls
- References
Model Overview
Model Categories
OpenAI provides two primary model families with distinct characteristics:
| Family | Models | Strengths | Weaknesses |
|---|---|---|---|
| GPT Models | GPT-4o, GPT-4.1, GPT-5, GPT-5.2 | Fast, cost-efficient, highly steerable | Require explicit instructions |
| Reasoning Models | o3, o4-mini, o1 (legacy) | Complex reasoning, planning, accuracy | Slower, more expensive, different prompting |
Model Selection Guide
From OpenAI's official guidance:
| Priority | Recommended Model |
|---|---|
| Speed and cost | GPT models (GPT-4o, GPT-4o mini) |
| Well-defined tasks | GPT models |
| Accuracy and reliability | o-series models (o3, o4-mini) |
| Complex problem-solving | o-series models |
Key Insight: "Most AI workflows will use a combination of both models: o-series for agentic planning and decision-making, GPT series for task execution."
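That split can be sketched as a simple routing helper. The model names and task labels below are illustrative choices, not an official mapping:

```python
# Illustrative router: planning-heavy work goes to a reasoning model,
# well-defined execution work to a GPT model. Names are examples only.
REASONING_TASKS = {"planning", "complex-reasoning", "evaluation"}

def pick_model(task_type: str, latency_sensitive: bool = False) -> str:
    """Return a model name based on the task profile."""
    if latency_sensitive:
        return "gpt-4o-mini"   # speed and cost come first
    if task_type in REASONING_TASKS:
        return "o4-mini"       # accuracy and multi-step planning
    return "gpt-4o"            # fast execution of well-defined tasks
```

For example, `pick_model("planning")` routes to the reasoning model, while a latency-sensitive request always falls back to the small GPT model.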
Default Recommendation
When in doubt, gpt-4.1 offers a solid combination of intelligence, speed, and cost effectiveness.
Prompt Structure Best Practices
Recommended Sections (in order)
OpenAI recommends structuring developer messages with these sections:
- Identity: Purpose, communication style, high-level goals
- Instructions: Rules, guidance, what to do/not do
- Examples: Sample inputs with desired outputs
- Context: Additional information for the specific request
Template Structure
```
# Identity
You are a [role] specializing in [domain].
Your communication style is [style].

# Instructions
* [Rule 1]
* [Rule 2]
* [What to do]
* [What not to do]

# Examples
<user_query>
[Example input 1]
</user_query>
<assistant_response>
[Expected output 1]
</assistant_response>

# Context
[Request-specific information - place at END for caching benefits]
```

Key Principles
- Be explicit: State requirements clearly
- Use consistent structure: Same format across all examples
- Order matters: Place static content first, dynamic content last
- Separate concerns: Use clear delimiters between sections
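These principles are mechanical enough to enforce in code. The helper below is an illustrative sketch that assembles the four template sections in order, with the dynamic context last:

```python
def build_developer_message(identity: str, instructions: list[str],
                            examples: str, context: str) -> str:
    """Assemble a developer message: static sections first, dynamic context last."""
    parts = [
        "# Identity\n" + identity,
        "# Instructions\n" + "\n".join(f"* {rule}" for rule in instructions),
        "# Examples\n" + examples,
        "# Context\n" + context,  # dynamic content goes last, for caching
    ]
    return "\n\n".join(parts)
```

Because every request flows through one builder, the structure and delimiters stay identical across calls.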
Few-Shot Learning Techniques
Few-shot learning lets you steer models toward new tasks by including input/output examples in the prompt.
Best Practices
1. Diverse Examples
Show a range of possible inputs with expected outputs:
```
# Examples
<product_review id="example-1">
I absolutely love these headphones — sound quality is amazing!
</product_review>
<assistant_response id="example-1">
Positive
</assistant_response>

<product_review id="example-2">
Battery life is okay, but the ear pads feel cheap.
</product_review>
<assistant_response id="example-2">
Neutral
</assistant_response>

<product_review id="example-3">
Terrible customer service, I'll never buy from them again.
</product_review>
<assistant_response id="example-3">
Negative
</assistant_response>
```

2. Consistent Formatting
Ensure all examples follow the same structure:
- Same delimiters (XML tags, headers)
- Same whitespace patterns
- Same output format
3. Example Quantity
- Start with 2-5 examples
- Add more if the model isn't generalizing correctly
- Too many examples can cause overfitting
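However many examples you include, rendering them all from a single template guarantees identical delimiters and whitespace. This helper is an illustrative sketch, not part of any OpenAI SDK:

```python
def render_examples(pairs: list[tuple[str, str]]) -> str:
    """Render (input, output) pairs with identical delimiters and whitespace."""
    blocks = []
    for i, (query, answer) in enumerate(pairs, start=1):
        blocks.append(
            f'<user_query id="example-{i}">\n{query}\n</user_query>\n'
            f'<assistant_response id="example-{i}">\n{answer}\n</assistant_response>'
        )
    return "\n\n".join(blocks)
```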
4. ID Attributes for Clarity
Use id attributes to link inputs with outputs:
```
<user_query id="example-1">
How do I declare a string variable?
</user_query>
<assistant_response id="example-1">
var first_name = "Anna";
</assistant_response>
```

Formatting with Markdown and XML
OpenAI recommends using Markdown and XML tags to help models understand prompt structure.
When to Use Each
| Format | Best For |
|---|---|
| Markdown headers | Section organization, hierarchy |
| Markdown lists | Instructions, enumerated items |
| XML tags | Data boundaries, input/output pairs |
| XML attributes | Metadata, IDs, categorization |
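When XML tags mark data boundaries around untrusted user text, escaping markup inside the payload keeps the boundary intact (a fake closing tag in the input cannot break out of the wrapper). Python's standard library covers the escaping; the wrapper function itself is an illustrative sketch:

```python
from xml.sax.saxutils import escape, quoteattr

def wrap_document(text: str, source: str) -> str:
    """Wrap untrusted text in a <user_document> tag, escaping any markup inside."""
    # quoteattr returns the value already surrounded by quotes
    return (f"<user_document source={quoteattr(source)}>\n"
            f"{escape(text)}\n</user_document>")
```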
Combined Example
```
# Identity
You are a coding assistant that enforces snake_case in JavaScript.

# Instructions
* Use snake_case for all variable names
* Use `var` for browser compatibility
* Do not include Markdown formatting in responses

# Examples
<user_query>
How do I declare a string variable for a first name?
</user_query>
<assistant_response>
var first_name = "Anna";
</assistant_response>
```

XML for Data Boundaries
When including user-provided content, wrap it in XML to prevent prompt injection:
```
# Context
<user_document source="uploaded_file.txt">
[User's potentially untrusted content]
</user_document>

Now analyze the document above for security vulnerabilities.
```

Reasoning Models (o3, o4-mini)
Reasoning models are fundamentally different from GPT models and require different prompting strategies.
How Reasoning Works
Reasoning models generate internal chain of thought (reasoning tokens) before producing visible output:
```
Input Tokens → [Internal Reasoning Tokens] → Output Tokens
                 (not visible, but billed)
```
Important: Reasoning tokens are billed as output tokens but not visible in responses.
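The arithmetic is easy to check against a response's usage data. The field layout below follows the documented `output_tokens_details.reasoning_tokens` shape of the Responses API, but verify it against your SDK version:

```python
# Sketch: estimating the visible share of billed output tokens.
# The usage layout is an assumption based on OpenAI's documented shape.
usage = {"output_tokens": 1200, "output_tokens_details": {"reasoning_tokens": 900}}

reasoning = usage["output_tokens_details"]["reasoning_tokens"]
visible = usage["output_tokens"] - reasoning  # tokens that appear in the reply
# Here: 1200 output tokens are billed, but only 300 are visible.
```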
When to Use Reasoning Models
✅ Strong Use Cases
| Use Case | Why Reasoning Excels |
|---|---|
| Ambiguous tasks | Handles gaps in instructions, asks clarifying questions |
| Needle in haystack | Finds relevant info in large unstructured data |
| Document relationships | Draws parallels across multiple documents |
| Agentic planning | Creates detailed multi-step solutions |
| Visual reasoning | Grasps complex charts, tables, architectural drawings |
| Code review | Detects subtle issues human reviewers might miss |
| Evaluation/grading | Nuanced judgment of other model outputs |
❌ Avoid For
- Simple, well-defined tasks (use GPT instead)
- Latency-sensitive applications
- High-volume, low-complexity requests
Prompting Differences
| Aspect | GPT Models | Reasoning Models |
|---|---|---|
| Instruction style | Detailed, step-by-step | High-level goals |
| "Think step by step" | Helpful | Unnecessary/harmful |
| Few-shot examples | Often required | Try zero-shot first |
| Specificity | Be explicit about how | Be specific about what |
Key Prompting Rules
From OpenAI's official guidance:
- Developer messages replace system messages: use the `developer` role for top-level guidance
- Keep prompts simple and direct: models excel with brief, clear instructions
- Avoid chain-of-thought prompts: models reason internally
- Use delimiters for clarity: Markdown and XML tags help interpretation
- Try zero-shot first: add few-shot examples only if needed
- Provide specific guidelines: explicit constraints produce better results
- Be specific about success criteria: describe what a good output looks like
- Markdown formatting: include `Formatting re-enabled` to enable Markdown output
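Taken together, a developer message for a reasoning model stays short and goal-oriented. The exact wording below is illustrative:

```python
# Sketch of a reasoning-model developer message following the rules above.
developer_message = "\n".join([
    "Formatting re-enabled",  # opt in to Markdown output
    "# Goal",
    "Review the attached module for concurrency bugs.",
    "# Success criteria",
    "List each bug with file, line, and a one-sentence fix.",
])
# Note what is absent: no "think step by step", no worked examples.
```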
Reasoning Effort Parameter
Control reasoning depth with the effort parameter:
```python
response = client.responses.create(
    model="o4-mini",
    reasoning={"effort": "medium"},  # low, medium, or high
    input=[{"role": "user", "content": prompt}]
)
```

| Effort | Token Usage | Speed | Best For |
|---|---|---|---|
| `low` | Minimal | Fastest | Simple decisions |
| `medium` | Balanced | Moderate | Most tasks (default) |
| `high` | Maximum | Slowest | Complex problems |
Context Window Management
Reserve sufficient space for reasoning tokens:
```python
response = client.responses.create(
    model="o4-mini",
    reasoning={"effort": "medium"},
    max_output_tokens=30000,  # Reserve at least 25K for reasoning + output
    input=[{"role": "user", "content": prompt}]
)
```

Recommendation: Start with at least 25,000 tokens reserved for reasoning and outputs.
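A small helper can keep that reservation rule explicit. The 25,000-token floor comes from the recommendation above; the context-window figure in the usage example is a placeholder for your model's actual limit:

```python
def reasoning_output_budget(context_window: int, prompt_tokens: int,
                            floor: int = 25_000) -> int:
    """max_output_tokens to request: all remaining space, at least `floor`."""
    available = context_window - prompt_tokens
    if available < floor:
        raise ValueError("Prompt too large to leave room for reasoning")
    return available
```

For example, with a 200,000-token window and a 10,000-token prompt, the helper allows 190,000 output tokens; a prompt that leaves less than the floor raises instead of silently truncating reasoning.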
Handling Incomplete Responses
Check for truncation due to token limits:
```python
if response.status == "incomplete" and response.incomplete_details.reason == "max_output_tokens":
    print("Ran out of tokens during reasoning")
```

GPT-5 Specific Optimizations
GPT-5 is highly steerable and responsive to well-specified prompts.
Key Characteristics
- Very precise instruction following
- Benefits from explicit developer message structure
- Responds well to detailed examples
- Supports vision capabilities
Prompting Best Practices
For Coding Tasks
```
# Identity
You are a senior Python developer specializing in async programming.

# Instructions
* Use Python 3.11+ syntax
* Include type hints for all function parameters
* Add docstrings to all public functions
* Handle exceptions explicitly

# Constraints
* No external dependencies beyond stdlib
* Maximum function length: 50 lines

# Output Format
Return only the code, no explanations.
```

For Front-End Engineering
```
# Identity
You are a React component specialist.

# Instructions
* Use functional components with hooks
* Follow accessibility best practices (ARIA)
* Include PropTypes or TypeScript interfaces
* Component should be self-contained

# Examples
[Include 2-3 component examples]
```

For Agentic Tasks
```
# Identity
You are a task orchestration agent.

# Instructions
* Break down complex tasks into steps
* Identify dependencies between steps
* Assign appropriate tools to each step
* Verify completion before proceeding

# Available Tools
1. search_codebase - Find relevant code
2. edit_file - Modify files
3. run_tests - Execute test suite
```

Cost Optimization: Prompt Caching
OpenAI's prompt caching reduces costs and latency for repeated content.
How It Works
Content at the beginning of prompts is cached and reused across requests:
```
┌──────────────────────────────────────┐
│ Static content (cached)              │ ← Same across requests
├──────────────────────────────────────┤
│ Dynamic content (not cached)         │ ← Changes per request
└──────────────────────────────────────┘
```
Optimization Strategy
- Order parameters consistently: Same JSON key order in API requests
- Static content first: Instructions, examples at the beginning
- Dynamic content last: User inputs, context at the end
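All three rules can be enforced in one place when requests are built programmatically. The sketch below uses `sort_keys` for a stable key order and a fixed static prefix; the section contents and helper name are illustrative:

```python
import json

# Static portion: identical bytes on every request, so it can be cached.
STATIC_PREFIX = (
    "# Identity\nYou are a support assistant.\n\n"
    "# Instructions\n* Answer in one paragraph.\n"
)

def build_request(user_input: str) -> str:
    """Serialize with sorted keys so byte-identical prefixes can be cached."""
    payload = {
        "model": "gpt-4o",
        "input": STATIC_PREFIX + "\n# Context\n" + user_input,  # dynamic last
    }
    return json.dumps(payload, sort_keys=True)
```

Two different user inputs now produce requests that agree byte-for-byte over the entire static prefix, which is exactly what prefix caching needs.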
Example Structure
```
# Identity
[Static - cached]

# Instructions
[Static - cached]

# Examples
[Static - cached]

# Context
[Dynamic - changes per request - placed LAST]
```

Practical Examples
Example 1: Code Refactoring with o3
```python
from openai import OpenAI

client = OpenAI()

prompt = """
Instructions:
- Given the React component below, change it so that nonfiction books have red text.
- Return only the code in your reply
- Do not include any additional formatting, such as markdown code blocks
- Use four space tabs, no lines exceeding 80 columns

const books = [
    { title: 'Dune', category: 'fiction', id: 1 },
    { title: 'Frankenstein', category: 'fiction', id: 2 },
    { title: 'Moneyball', category: 'nonfiction', id: 3 },
];

export default function BookList() {
    const listItems = books.map(book =>
        <li>
            {book.title}
        </li>
    );

    return (
        <ul>{listItems}</ul>
    );
}
"""

response = client.responses.create(
    model="o3",
    reasoning={"effort": "medium"},
    input=[{"role": "user", "content": prompt}],
)

print(response.output_text)
```

Example 2: Sentiment Classification with GPT-4o
```python
developer_message = """
# Identity
You are a sentiment classifier for product reviews.

# Instructions
* Output only: Positive, Negative, or Neutral
* No additional formatting or explanation
* Consider context and nuance

# Examples
<review id="1">
I absolutely love these headphones — sound quality is amazing!
</review>
<classification id="1">
Positive
</classification>

<review id="2">
Battery life is okay, but the ear pads feel cheap.
</review>
<classification id="2">
Neutral
</classification>

<review id="3">
Terrible customer service, I'll never buy from them again.
</review>
<classification id="3">
Negative
</classification>
"""

# user_review would normally come from your application.
response = client.responses.create(
    model="gpt-4o",
    input=[
        {"role": "developer", "content": developer_message},
        {"role": "user", "content": f"<review>{user_review}</review>"}
    ]
)
```

Example 3: Multi-Model Architecture
```python
# Step 1: Planning with o3
planning_response = client.responses.create(
    model="o3",
    reasoning={"effort": "high"},
    input=[{
        "role": "developer",
        "content": "You are a task planner. Break down complex requests into actionable steps."
    }, {
        "role": "user",
        "content": "Refactor this codebase to use dependency injection."
    }]
)
plan = planning_response.output_text

# Step 2: Execution with GPT-4o (faster, cheaper)
# parse_plan is your own helper that splits the plan text into steps.
for step in parse_plan(plan):
    execution_response = client.responses.create(
        model="gpt-4o",
        input=[{
            "role": "developer",
            "content": f"Execute this specific task: {step}"
        }, {
            "role": "user",
            "content": relevant_code
        }]
    )
```

Common Pitfalls
Pitfall 1: Using CoT with Reasoning Models
❌ Wrong:
```
Think step by step about how to solve this problem.
First, identify the key variables...
```

✅ Correct:

```
Solve this optimization problem. Ensure the solution minimizes cost while meeting all constraints.
```

Pitfall 2: Inconsistent Example Formatting
❌ Wrong:
```
Example 1: Input: "hello" -> Output: greeting

Example 2:
Input: goodbye
Result: farewell
```

✅ Correct:
```
<example id="1">
<input>hello</input>
<output>greeting</output>
</example>
<example id="2">
<input>goodbye</input>
<output>farewell</output>
</example>
```

Pitfall 3: Dynamic Content Before Static
❌ Wrong:
```
# User Request
{{dynamic_content}}

# Instructions
[Static rules that could be cached]
```

✅ Correct:
```
# Instructions
[Static rules - cached]

# User Request
{{dynamic_content}}
```

Pitfall 4: Insufficient Token Reservation for Reasoning
❌ Wrong:
```python
response = client.responses.create(
    model="o3",
    max_output_tokens=1000,  # Too small for reasoning
    ...
)
```

✅ Correct:
```python
response = client.responses.create(
    model="o3",
    max_output_tokens=30000,  # Adequate for reasoning + output
    ...
)
```

References
Official Documentation
- OpenAI Prompt Engineering Guide [Official]: Primary source for GPT model prompting techniques.
- OpenAI Reasoning Models Guide [Official]: Technical documentation for o-series reasoning models.
- OpenAI Reasoning Best Practices [Official]: When and how to use reasoning models effectively.
- OpenAI Model Spec [Official]: Defines model behavior, chain of command, and message priorities.
- OpenAI Prompt Caching [Official]: Cost optimization through content caching.