Appendix 01: OpenAI Models Prompting Guide Analysis

Deep analysis of OpenAI’s official prompting guides for GPT-4o, GPT-5, and reasoning models (o3, o4-mini) with extracted techniques and examples

Author

Dario Airoldi

Published

January 20, 2026

Appendix 01: OpenAI Models Prompting Guide Analysis

This appendix provides a comprehensive analysis of OpenAI’s official prompting documentation, extracting key techniques, patterns, and recommendations for GPT-4o, GPT-5, and reasoning models (o3, o4-mini).

Guide Version: This analysis is based on OpenAI documentation as of 2026-01-20. Official guides may have been updated since this analysis. Always verify with the official documentation.

📊 Model Overview
🏗️ Message Roles and Authority Chain
📝 Prompt Structure Best Practices
🎯 Few-Shot Learning Techniques
📐 Formatting with Markdown and XML
🧠 Reasoning Models (o3, o4-mini)
⚡ GPT-5 Specific Optimizations
💰 Cost Optimization: Prompt Caching
🔧 Practical Examples
⚠️ Common Pitfalls
📚 References

📊 Model Overview

Model Categories

OpenAI provides two primary model families with distinct characteristics:

Family	Models	Strengths	Weaknesses
GPT Models	GPT-4o, GPT-4.1, GPT-5, GPT-5.2	Fast, cost-efficient, highly steerable	Benefit from explicit instructions
Reasoning Models	o3, o4-mini, o1 (legacy)	Complex reasoning, planning, accuracy	Slower, more expensive, different prompting

Model Selection Guide

From OpenAI’s official guidance:

Priority	Recommended Model
Speed and cost	GPT models (GPT-4o, GPT-4o mini)
Well-defined tasks	GPT models
Accuracy and reliability	o-series models (o3, o4-mini)
Complex problem-solving	o-series models

Key Insight: “Most AI workflows will use a combination of both models—o-series for agentic planning and decision-making, GPT series for task execution.”

Default Recommendation

When in doubt, gpt-4.1 offers a solid combination of intelligence, speed, and cost effectiveness.

🏗️ Message Roles and Authority Chain

OpenAI models use a chain of command for message priority, defined in the OpenAI Model Spec.

Role Hierarchy

Role	Purpose	Priority
`developer`	System rules and business logic (like a function definition)	Highest
`user`	End-user inputs and configuration (like function arguments)	Lower
`assistant`	Model-generated responses	—

Using the `instructions` Parameter

The instructions parameter provides high-level behavior guidance:

const response = await client.responses.create({
    model: "gpt-5",
    reasoning: { effort: "low" },
    instructions: "Talk like a pirate.",
    input: "Are semicolons optional in JavaScript?",
});

Note: The instructions parameter only applies to the current request. For multi-turn conversations, use developer messages.

Equivalent Message Structure

const response = await client.responses.create({
    model: "gpt-5",
    reasoning: { effort: "low" },
    input: [
        {
            role: "developer",
            content: "Talk like a pirate."
        },
        {
            role: "user",
            content: "Are semicolons optional in JavaScript?",
        },
    ],
});

Practical Analogy

Think of developer and user messages like a function and its arguments:

developer = Function definition (rules, logic)
user = Function arguments (inputs, configuration)

📝 Prompt Structure Best Practices

Recommended Sections (in order)

OpenAI recommends structuring developer messages with these sections:

Identity: Purpose, communication style, high-level goals
Instructions: Rules, guidance, what to do/not do
Examples: Sample inputs with desired outputs
Context: Additional information for the specific request

Template Structure

# Identity
You are a [role] specializing in [domain].
Your communication style is [style].

# Instructions
* [Rule 1]
* [Rule 2]
* [What to do]
* [What not to do]

# Examples
<user_query>
[Example input 1]
</user_query>
<assistant_response>
[Expected output 1]
</assistant_response>

# Context
[Request-specific information - place at END for caching benefits]

Key Principles

Be explicit: State requirements clearly
Use consistent structure: Same format across all examples
Order matters: Place static content first, dynamic content last
Separate concerns: Use clear delimiters between sections

🎯 Few-Shot Learning Techniques

Few-shot learning lets you steer models toward new tasks by including input/output examples in the prompt.

Best Practices

1. Diverse Examples

Show a range of possible inputs with expected outputs:

# Examples

<product_review id="example-1">
I absolutely love these headphones — sound quality is amazing!
</product_review>
<assistant_response id="example-1">
Positive
</assistant_response>

<product_review id="example-2">
Battery life is okay, but the ear pads feel cheap.
</product_review>
<assistant_response id="example-2">
Neutral
</assistant_response>

<product_review id="example-3">
Terrible customer service, I'll never buy from them again.
</product_review>
<assistant_response id="example-3">
Negative
</assistant_response>

2. Consistent Formatting

Ensure all examples follow the same structure:

Same delimiters (XML tags, headers)
Same whitespace patterns
Same output format

3. Example Quantity

Start with 2-5 examples
Add more if the model isn’t generalizing correctly
Too many examples can cause overfitting

4. ID Attributes for Clarity

Use id attributes to link inputs with outputs:

<user_query id="example-1">
How do I declare a string variable?
</user_query>
<assistant_response id="example-1">
var first_name = "Anna";
</assistant_response>

📐 Formatting with Markdown and XML

OpenAI recommends using Markdown and XML tags to help models understand prompt structure.

When to Use Each

Format	Best For
Markdown headers	Section organization, hierarchy
Markdown lists	Instructions, enumerated items
XML tags	Data boundaries, input/output pairs
XML attributes	Metadata, IDs, categorization

Combined Example

# Identity

You are coding assistant that enforces snake_case in JavaScript.

# Instructions

* Use snake_case for all variable names
* Use `var` for browser compatibility
* Do not include Markdown formatting in responses

# Examples

<user_query>
How do I declare a string variable for a first name?
</user_query>

<assistant_response>
var first_name = "Anna";
</assistant_response>

XML for Data Boundaries

When including user-provided content, wrap it in XML to prevent prompt injection:

# Context

<user_document source="uploaded_file.txt">
[User's potentially untrusted content]
</user_document>

Now analyze the document above for security vulnerabilities.

🧠 Reasoning Models (o3, o4-mini)

Reasoning models are fundamentally different from GPT models and require different prompting strategies.

How Reasoning Works

Reasoning models generate internal chain of thought (reasoning tokens) before producing visible output:

Input Tokens → [Internal Reasoning Tokens] → Output Tokens
                  (not visible, but billed)

Important: Reasoning tokens are billed as output tokens but not visible in responses.

When to Use Reasoning Models

✅ Strong Use Cases

Use Case	Why Reasoning Excels
Ambiguous tasks	Handles gaps in instructions, asks clarifying questions
Needle in haystack	Finds relevant info in large unstructured data
Document relationships	Draws parallels across multiple documents
Agentic planning	Creates detailed multi-step solutions
Visual reasoning	Grasps complex charts, tables, architectural drawings
Code review	Detects subtle issues human reviewers might miss
Evaluation/grading	Nuanced judgment of other model outputs

❌ Avoid For

Simple, well-defined tasks (use GPT instead)
Latency-sensitive applications
High-volume, low-complexity requests

Prompting Differences

Aspect	GPT Models	Reasoning Models
Instruction style	Detailed, step-by-step	High-level goals
“Think step by step”	Helpful	Unnecessary/harmful
Few-shot examples	Often required	Try zero-shot first
Specificity	Be explicit about how	Be specific about what

Key Prompting Rules

From OpenAI’s official guidance:

Developer messages replace system messages: Use developer role for top-level guidance
Keep prompts simple and direct: Models excel with brief, clear instructions
Avoid chain-of-thought prompts: Models reason internally
Use delimiters for clarity: Markdown, XML tags help interpretation
Try zero-shot first: Add few-shot only if needed
Provide specific guidelines: Explicit constraints produce better results
Be specific about success criteria: Describe what a good output looks like
Markdown formatting: Include Formatting re-enabled to enable markdown output

Reasoning Effort Parameter

Control reasoning depth with the effort parameter:

response = client.responses.create(
    model="o4-mini",
    reasoning={"effort": "medium"},  # low, medium, or high
    input=[{"role": "user", "content": prompt}]
)

Effort	Token Usage	Speed	Best For
`low`	Minimal	Fastest	Simple decisions
`medium`	Balanced	Moderate	Most tasks (default)
`high`	Maximum	Slowest	Complex problems

Context Window Management

Reserve sufficient space for reasoning tokens:

response = client.responses.create(
    model="o4-mini",
    reasoning={"effort": "medium"},
    max_output_tokens=30000,  # Reserve at least 25K for reasoning + output
    input=[{"role": "user", "content": prompt}]
)

Recommendation: Start with at least 25,000 tokens reserved for reasoning and outputs.

Handling Incomplete Responses

Check for truncation due to token limits:

if response.status == "incomplete" and response.incomplete_details.reason == "max_output_tokens":
    print("Ran out of tokens during reasoning")

⚡ GPT-5 Specific Optimizations

GPT-5 is highly steerable and responsive to well-specified prompts.

Key Characteristics

Very precise instruction following
Benefits from explicit developer message structure
Responds well to detailed examples
Supports vision capabilities

Prompting Best Practices

For Coding Tasks

# Identity
You are a senior Python developer specializing in async programming.

# Instructions
* Use Python 3.11+ syntax
* Include type hints for all function parameters
* Add docstrings to all public functions
* Handle exceptions explicitly

# Constraints
* No external dependencies beyond stdlib
* Maximum function length: 50 lines

# Output Format
Return only the code, no explanations.

For Front-End Engineering

# Identity
You are a React component specialist.

# Instructions
* Use functional components with hooks
* Follow accessibility best practices (ARIA)
* Include PropTypes or TypeScript interfaces
* Component should be self-contained

# Examples
[Include 2-3 component examples]

For Agentic Tasks

# Identity
You are a task orchestration agent.

# Instructions
* Break down complex tasks into steps
* Identify dependencies between steps
* Assign appropriate tools to each step
* Verify completion before proceeding

# Available Tools
1. search_codebase - Find relevant code
2. edit_file - Modify files
3. run_tests - Execute test suite

💰 Cost Optimization: Prompt Caching

OpenAI’s prompt caching reduces costs and latency for repeated content.

How It Works

Content at the beginning of prompts is cached and reused across requests:

┌─────────────────────────────────────┐
│ Static content (cached)             │ ← Same across requests
├─────────────────────────────────────┤
│ Dynamic content (not cached)        │ ← Changes per request
└─────────────────────────────────────┘

Optimization Strategy

Order parameters consistently: Same JSON key order in API requests
Static content first: Instructions, examples at the beginning
Dynamic content last: User inputs, context at the end

Example Structure

# Identity
[Static - cached]

# Instructions  
[Static - cached]

# Examples
[Static - cached]

# Context
[Dynamic - changes per request - placed LAST]

🔧 Practical Examples

Example 1: Code Refactoring with o3

from openai import OpenAI

client = OpenAI()

prompt = """
Instructions:
- Given the React component below, change it so that nonfiction books have red text.
- Return only the code in your reply
- Do not include any additional formatting, such as markdown code blocks
- Use four space tabs, no lines exceeding 80 columns

const books = [
  { title: 'Dune', category: 'fiction', id: 1 },
  { title: 'Frankenstein', category: 'fiction', id: 2 },
  { title: 'Moneyball', category: 'nonfiction', id: 3 },
];

export default function BookList() {
  const listItems = books.map(book =>
    <li>
      {book.title}
    </li>
  );

  return (
    <ul>{listItems}</ul>
  );
}
"""

response = client.responses.create(
    model="o3",
    reasoning={"effort": "medium"},
    input=[{"role": "user", "content": prompt}],
)

print(response.output_text)

Example 2: Sentiment Classification with GPT-4o

developer_message = """
# Identity
You are a sentiment classifier for product reviews.

# Instructions
* Output only: Positive, Negative, or Neutral
* No additional formatting or explanation
* Consider context and nuance

# Examples

<review id="1">
I absolutely love this headphones — sound quality is amazing!
</review>
<classification id="1">
Positive
</classification>

<review id="2">
Battery life is okay, but the ear pads feel cheap.
</review>
<classification id="2">
Neutral
</classification>

<review id="3">
Terrible customer service, I'll never buy from them again.
</review>
<classification id="3">
Negative
</classification>
"""

response = client.responses.create(
    model="gpt-4o",
    input=[
        {"role": "developer", "content": developer_message},
        {"role": "user", "content": f"<review>{user_review}</review>"}
    ]
)

Example 3: Multi-Model Architecture

# Step 1: Planning with o3
planning_response = client.responses.create(
    model="o3",
    reasoning={"effort": "high"},
    input=[{
        "role": "developer",
        "content": "You are a task planner. Break down complex requests into actionable steps."
    }, {
        "role": "user", 
        "content": "Refactor this codebase to use dependency injection."
    }]
)

plan = planning_response.output_text

# Step 2: Execution with GPT-4o (faster, cheaper)
for step in parse_plan(plan):
    execution_response = client.responses.create(
        model="gpt-4o",
        input=[{
            "role": "developer",
            "content": f"Execute this specific task: {step}"
        }, {
            "role": "user",
            "content": relevant_code
        }]
    )

⚠️ Common Pitfalls

Pitfall 1: Using CoT with Reasoning Models

❌ Wrong:

Think step by step about how to solve this problem.
First, identify the key variables...

✅ Correct:

Solve this optimization problem. Ensure the solution minimizes cost while meeting all constraints.

Pitfall 2: Inconsistent Example Formatting

❌ Wrong:

Example 1: Input: "hello" -> Output: greeting
Example 2:
  Input: goodbye
  Result: farewell

✅ Correct:

<example id="1">
<input>hello</input>
<output>greeting</output>
</example>

<example id="2">
<input>goodbye</input>
<output>farewell</output>
</example>

Pitfall 3: Dynamic Content Before Static

❌ Wrong:

# User Request
{{dynamic_content}}

# Instructions
[Static rules that could be cached]

✅ Correct:

# Instructions
[Static rules - cached]

# User Request
{{dynamic_content}}

Pitfall 4: Insufficient Token Reservation for Reasoning

❌ Wrong:

response = client.responses.create(
    model="o3",
    max_output_tokens=1000,  # Too small for reasoning
    ...
)

✅ Correct:

response = client.responses.create(
    model="o3",
    max_output_tokens=30000,  # Adequate for reasoning + output
    ...
)

📚 References

Official Documentation

📘 OpenAI Prompt Engineering Guide [📘 Official] Primary source for GPT model prompting techniques.
📘 OpenAI Reasoning Models Guide [📘 Official] Technical documentation for o-series reasoning models.
📘 OpenAI Reasoning Best Practices [📘 Official] When and how to use reasoning models effectively.
📘 OpenAI Model Spec [📘 Official] Defines model behavior, chain of command, and message priorities.
📘 OpenAI Prompt Caching [📘 Official] Cost optimization through content caching.

Appendix 01: OpenAI Models Prompting Guide Analysis

Table of Contents

📊 Model Overview

Model Categories

Model Selection Guide

Default Recommendation

🏗️ Message Roles and Authority Chain

Role Hierarchy

Using the instructions Parameter

Equivalent Message Structure

Practical Analogy

📝 Prompt Structure Best Practices

Recommended Sections (in order)

Template Structure

Key Principles

🎯 Few-Shot Learning Techniques

Best Practices

1. Diverse Examples

2. Consistent Formatting

3. Example Quantity

4. ID Attributes for Clarity

📐 Formatting with Markdown and XML

When to Use Each

Combined Example

XML for Data Boundaries

🧠 Reasoning Models (o3, o4-mini)

How Reasoning Works

When to Use Reasoning Models

✅ Strong Use Cases

❌ Avoid For

Prompting Differences

Key Prompting Rules

Reasoning Effort Parameter

Context Window Management

Handling Incomplete Responses

⚡ GPT-5 Specific Optimizations

Key Characteristics

Prompting Best Practices

For Coding Tasks

For Front-End Engineering

For Agentic Tasks

💰 Cost Optimization: Prompt Caching

How It Works

Optimization Strategy

Example Structure

🔧 Practical Examples

Example 1: Code Refactoring with o3

Example 2: Sentiment Classification with GPT-4o

Example 3: Multi-Model Architecture

⚠️ Common Pitfalls

Pitfall 1: Using CoT with Reasoning Models

Pitfall 2: Inconsistent Example Formatting

Pitfall 3: Dynamic Content Before Static

Pitfall 4: Insufficient Token Reservation for Reasoning

📚 References

Official Documentation

Related Resources

Using the `instructions` Parameter