Appendix 01: OpenAI Models Prompting Guide Analysis

Deep analysis of OpenAI’s official prompting guides for GPT-4o, GPT-5, and reasoning models (o3, o4-mini) with extracted techniques and examples
Author

Dario Airoldi

Published

January 20, 2026

Appendix 01: OpenAI Models Prompting Guide Analysis

This appendix provides a comprehensive analysis of OpenAI’s official prompting documentation, extracting key techniques, patterns, and recommendations for GPT-4o, GPT-5, and reasoning models (o3, o4-mini).

Guide Version: This analysis is based on OpenAI documentation as of 2026-01-20. Official guides may have been updated since this analysis. Always verify with the official documentation.

Table of Contents

πŸ“Š Model Overview

Model Categories

OpenAI provides two primary model families with distinct characteristics:

Family Models Strengths Weaknesses
GPT Models GPT-4o, GPT-4.1, GPT-5, GPT-5.2 Fast, cost-efficient, highly steerable Benefit from explicit instructions
Reasoning Models o3, o4-mini, o1 (legacy) Complex reasoning, planning, accuracy Slower, more expensive, different prompting

Model Selection Guide

From OpenAI’s official guidance:

Priority Recommended Model
Speed and cost GPT models (GPT-4o, GPT-4o mini)
Well-defined tasks GPT models
Accuracy and reliability o-series models (o3, o4-mini)
Complex problem-solving o-series models

Key Insight: β€œMost AI workflows will use a combination of both modelsβ€”o-series for agentic planning and decision-making, GPT series for task execution.”

Default Recommendation

When in doubt, gpt-4.1 offers a solid combination of intelligence, speed, and cost effectiveness.

πŸ—οΈ Message Roles and Authority Chain

OpenAI models use a chain of command for message priority, defined in the OpenAI Model Spec.

Role Hierarchy

Role Purpose Priority
developer System rules and business logic (like a function definition) Highest
user End-user inputs and configuration (like function arguments) Lower
assistant Model-generated responses β€”

Using the instructions Parameter

The instructions parameter provides high-level behavior guidance:

const response = await client.responses.create({
    model: "gpt-5",
    reasoning: { effort: "low" },
    instructions: "Talk like a pirate.",
    input: "Are semicolons optional in JavaScript?",
});

Note: The instructions parameter only applies to the current request. For multi-turn conversations, use developer messages.

Equivalent Message Structure

const response = await client.responses.create({
    model: "gpt-5",
    reasoning: { effort: "low" },
    input: [
        {
            role: "developer",
            content: "Talk like a pirate."
        },
        {
            role: "user",
            content: "Are semicolons optional in JavaScript?",
        },
    ],
});

Practical Analogy

Think of developer and user messages like a function and its arguments:

  • developer = Function definition (rules, logic)
  • user = Function arguments (inputs, configuration)

πŸ“ Prompt Structure Best Practices

Template Structure

# Identity
You are a [role] specializing in [domain].
Your communication style is [style].

# Instructions
* [Rule 1]
* [Rule 2]
* [What to do]
* [What not to do]

# Examples
<user_query>
[Example input 1]
</user_query>
<assistant_response>
[Expected output 1]
</assistant_response>

# Context
[Request-specific information - place at END for caching benefits]

Key Principles

  1. Be explicit: State requirements clearly
  2. Use consistent structure: Same format across all examples
  3. Order matters: Place static content first, dynamic content last
  4. Separate concerns: Use clear delimiters between sections

🎯 Few-Shot Learning Techniques

Few-shot learning lets you steer models toward new tasks by including input/output examples in the prompt.

Best Practices

1. Diverse Examples

Show a range of possible inputs with expected outputs:

# Examples

<product_review id="example-1">
I absolutely love these headphones β€” sound quality is amazing!
</product_review>
<assistant_response id="example-1">
Positive
</assistant_response>

<product_review id="example-2">
Battery life is okay, but the ear pads feel cheap.
</product_review>
<assistant_response id="example-2">
Neutral
</assistant_response>

<product_review id="example-3">
Terrible customer service, I'll never buy from them again.
</product_review>
<assistant_response id="example-3">
Negative
</assistant_response>

2. Consistent Formatting

Ensure all examples follow the same structure:

  • Same delimiters (XML tags, headers)
  • Same whitespace patterns
  • Same output format

3. Example Quantity

  • Start with 2-5 examples
  • Add more if the model isn’t generalizing correctly
  • Too many examples can cause overfitting

4. ID Attributes for Clarity

Use id attributes to link inputs with outputs:

<user_query id="example-1">
How do I declare a string variable?
</user_query>
<assistant_response id="example-1">
var first_name = "Anna";
</assistant_response>

πŸ“ Formatting with Markdown and XML

OpenAI recommends using Markdown and XML tags to help models understand prompt structure.

When to Use Each

Format Best For
Markdown headers Section organization, hierarchy
Markdown lists Instructions, enumerated items
XML tags Data boundaries, input/output pairs
XML attributes Metadata, IDs, categorization

Combined Example

# Identity

You are coding assistant that enforces snake_case in JavaScript.

# Instructions

* Use snake_case for all variable names
* Use `var` for browser compatibility
* Do not include Markdown formatting in responses

# Examples

<user_query>
How do I declare a string variable for a first name?
</user_query>

<assistant_response>
var first_name = "Anna";
</assistant_response>

XML for Data Boundaries

When including user-provided content, wrap it in XML to prevent prompt injection:

# Context

<user_document source="uploaded_file.txt">
[User's potentially untrusted content]
</user_document>

Now analyze the document above for security vulnerabilities.

🧠 Reasoning Models (o3, o4-mini)

Reasoning models are fundamentally different from GPT models and require different prompting strategies.

How Reasoning Works

Reasoning models generate internal chain of thought (reasoning tokens) before producing visible output:

Input Tokens β†’ [Internal Reasoning Tokens] β†’ Output Tokens
                  (not visible, but billed)

Important: Reasoning tokens are billed as output tokens but not visible in responses.

When to Use Reasoning Models

βœ… Strong Use Cases

Use Case Why Reasoning Excels
Ambiguous tasks Handles gaps in instructions, asks clarifying questions
Needle in haystack Finds relevant info in large unstructured data
Document relationships Draws parallels across multiple documents
Agentic planning Creates detailed multi-step solutions
Visual reasoning Grasps complex charts, tables, architectural drawings
Code review Detects subtle issues human reviewers might miss
Evaluation/grading Nuanced judgment of other model outputs

❌ Avoid For

  • Simple, well-defined tasks (use GPT instead)
  • Latency-sensitive applications
  • High-volume, low-complexity requests

Prompting Differences

Aspect GPT Models Reasoning Models
Instruction style Detailed, step-by-step High-level goals
β€œThink step by step” Helpful Unnecessary/harmful
Few-shot examples Often required Try zero-shot first
Specificity Be explicit about how Be specific about what

Key Prompting Rules

From OpenAI’s official guidance:

  1. Developer messages replace system messages: Use developer role for top-level guidance
  2. Keep prompts simple and direct: Models excel with brief, clear instructions
  3. Avoid chain-of-thought prompts: Models reason internally
  4. Use delimiters for clarity: Markdown, XML tags help interpretation
  5. Try zero-shot first: Add few-shot only if needed
  6. Provide specific guidelines: Explicit constraints produce better results
  7. Be specific about success criteria: Describe what a good output looks like
  8. Markdown formatting: Include Formatting re-enabled to enable markdown output

Reasoning Effort Parameter

Control reasoning depth with the effort parameter:

response = client.responses.create(
    model="o4-mini",
    reasoning={"effort": "medium"},  # low, medium, or high
    input=[{"role": "user", "content": prompt}]
)
Effort Token Usage Speed Best For
low Minimal Fastest Simple decisions
medium Balanced Moderate Most tasks (default)
high Maximum Slowest Complex problems

Context Window Management

Reserve sufficient space for reasoning tokens:

response = client.responses.create(
    model="o4-mini",
    reasoning={"effort": "medium"},
    max_output_tokens=30000,  # Reserve at least 25K for reasoning + output
    input=[{"role": "user", "content": prompt}]
)

Recommendation: Start with at least 25,000 tokens reserved for reasoning and outputs.

Handling Incomplete Responses

Check for truncation due to token limits:

if response.status == "incomplete" and response.incomplete_details.reason == "max_output_tokens":
    print("Ran out of tokens during reasoning")

⚑ GPT-5 Specific Optimizations

GPT-5 is highly steerable and responsive to well-specified prompts.

Key Characteristics

  • Very precise instruction following
  • Benefits from explicit developer message structure
  • Responds well to detailed examples
  • Supports vision capabilities

Prompting Best Practices

For Coding Tasks

# Identity
You are a senior Python developer specializing in async programming.

# Instructions
* Use Python 3.11+ syntax
* Include type hints for all function parameters
* Add docstrings to all public functions
* Handle exceptions explicitly

# Constraints
* No external dependencies beyond stdlib
* Maximum function length: 50 lines

# Output Format
Return only the code, no explanations.

For Front-End Engineering

# Identity
You are a React component specialist.

# Instructions
* Use functional components with hooks
* Follow accessibility best practices (ARIA)
* Include PropTypes or TypeScript interfaces
* Component should be self-contained

# Examples
[Include 2-3 component examples]

For Agentic Tasks

# Identity
You are a task orchestration agent.

# Instructions
* Break down complex tasks into steps
* Identify dependencies between steps
* Assign appropriate tools to each step
* Verify completion before proceeding

# Available Tools
1. search_codebase - Find relevant code
2. edit_file - Modify files
3. run_tests - Execute test suite

πŸ’° Cost Optimization: Prompt Caching

OpenAI’s prompt caching reduces costs and latency for repeated content.

How It Works

Content at the beginning of prompts is cached and reused across requests:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Static content (cached)             β”‚ ← Same across requests
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Dynamic content (not cached)        β”‚ ← Changes per request
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Optimization Strategy

  1. Order parameters consistently: Same JSON key order in API requests
  2. Static content first: Instructions, examples at the beginning
  3. Dynamic content last: User inputs, context at the end

Example Structure

# Identity
[Static - cached]

# Instructions  
[Static - cached]

# Examples
[Static - cached]

# Context
[Dynamic - changes per request - placed LAST]

πŸ”§ Practical Examples

Example 1: Code Refactoring with o3

from openai import OpenAI

client = OpenAI()

prompt = """
Instructions:
- Given the React component below, change it so that nonfiction books have red text.
- Return only the code in your reply
- Do not include any additional formatting, such as markdown code blocks
- Use four space tabs, no lines exceeding 80 columns

const books = [
  { title: 'Dune', category: 'fiction', id: 1 },
  { title: 'Frankenstein', category: 'fiction', id: 2 },
  { title: 'Moneyball', category: 'nonfiction', id: 3 },
];

export default function BookList() {
  const listItems = books.map(book =>
    <li>
      {book.title}
    </li>
  );

  return (
    <ul>{listItems}</ul>
  );
}
"""

response = client.responses.create(
    model="o3",
    reasoning={"effort": "medium"},
    input=[{"role": "user", "content": prompt}],
)

print(response.output_text)

Example 2: Sentiment Classification with GPT-4o

developer_message = """
# Identity
You are a sentiment classifier for product reviews.

# Instructions
* Output only: Positive, Negative, or Neutral
* No additional formatting or explanation
* Consider context and nuance

# Examples

<review id="1">
I absolutely love this headphones β€” sound quality is amazing!
</review>
<classification id="1">
Positive
</classification>

<review id="2">
Battery life is okay, but the ear pads feel cheap.
</review>
<classification id="2">
Neutral
</classification>

<review id="3">
Terrible customer service, I'll never buy from them again.
</review>
<classification id="3">
Negative
</classification>
"""

response = client.responses.create(
    model="gpt-4o",
    input=[
        {"role": "developer", "content": developer_message},
        {"role": "user", "content": f"<review>{user_review}</review>"}
    ]
)

Example 3: Multi-Model Architecture

# Step 1: Planning with o3
planning_response = client.responses.create(
    model="o3",
    reasoning={"effort": "high"},
    input=[{
        "role": "developer",
        "content": "You are a task planner. Break down complex requests into actionable steps."
    }, {
        "role": "user", 
        "content": "Refactor this codebase to use dependency injection."
    }]
)

plan = planning_response.output_text

# Step 2: Execution with GPT-4o (faster, cheaper)
for step in parse_plan(plan):
    execution_response = client.responses.create(
        model="gpt-4o",
        input=[{
            "role": "developer",
            "content": f"Execute this specific task: {step}"
        }, {
            "role": "user",
            "content": relevant_code
        }]
    )

⚠️ Common Pitfalls

Pitfall 1: Using CoT with Reasoning Models

❌ Wrong:

Think step by step about how to solve this problem.
First, identify the key variables...

βœ… Correct:

Solve this optimization problem. Ensure the solution minimizes cost while meeting all constraints.

Pitfall 2: Inconsistent Example Formatting

❌ Wrong:

Example 1: Input: "hello" -> Output: greeting
Example 2:
  Input: goodbye
  Result: farewell

βœ… Correct:

<example id="1">
<input>hello</input>
<output>greeting</output>
</example>

<example id="2">
<input>goodbye</input>
<output>farewell</output>
</example>

Pitfall 3: Dynamic Content Before Static

❌ Wrong:

# User Request
{{dynamic_content}}

# Instructions
[Static rules that could be cached]

βœ… Correct:

# Instructions
[Static rules - cached]

# User Request
{{dynamic_content}}

Pitfall 4: Insufficient Token Reservation for Reasoning

❌ Wrong:

response = client.responses.create(
    model="o3",
    max_output_tokens=1000,  # Too small for reasoning
    ...
)

βœ… Correct:

response = client.responses.create(
    model="o3",
    max_output_tokens=30000,  # Adequate for reasoning + output
    ...
)

πŸ“š References

Official Documentation