🚨 Openai Released Its Reasoning Model's Best Practices And Model-Spec
Know how to prompt o1-series reasoning model
Read time: 9 min 12 seconds
📚 Browse past editions here.
( I write daily for my 112K+ AI-pro audience, with 4.5M+ weekly views. Noise-free, actionable, applied-AI developments only).
⚡In today’s Edition (14-Feb-2025):
Openai Released Its Reasoning Model's Best Practices And Model-Spec
🏆 Perplexity AI releases deep research feature, challenging Gemini’s Advanced Research
📡 Codeium Launches Windsurf Wave 3 Version Supporting Model Context Protocol
🗞️ Byte-Size Briefs:
DeepSeek confirms open-source model matches official deployment, shares best-practice settings.
Google's Gemini Advanced recalls past chats for seamless context-aware responses.
Nous Research releases DeepHermes-3, an 8B Llama-3.1 model with reasoning toggle.
🚨 Openai Released Its Reasoning Model's Best Practices And Model-Spec
🎯 The Brief
OpenAI has updated its Model Spec, introducing stricter hierarchical rules to improve AI factual accuracy and reduce bias. This 63-page document outlines how models prioritize platform-wide policies over developer and user instructions, ensuring compliance and predictability. The update refines how AI handles controversial topics, reduces sycophantic behavior, and introduces potential mature content handling through a "grown-up mode" test. The Model Spec is public domain (CC0), allowing other AI companies to adopt or modify the framework.
⚙️ The Details
→ The Model Spec defines how OpenAI's models behave across ChatGPT and API products. It aims to maximize usefulness, minimize harm, and maintain alignment while allowing customization within defined constraints.
Structuring AI decision-making
One of the most fascinating aspects of the Model Spec is how it structures decision-making through a layered chain of authority. The AI system adheres to a hierarchy that starts with platform-level rules – strict, non-negotiable directives aimed at preventing harm and maintaining compliance with laws. Beneath that are developer-imposed guidelines, offering flexibility for customisation, and finally, user-level instructions, allowing individuals to tailor AI behavior within predefined boundaries.
→ A strict hierarchy is introduced: Platform rules override Developer instructions, which override User inputs. This structure prevents conflicting priorities, ensures AI does not promote misinformation, and maintains legal compliance.
The below is an example of a User/Developer conflict where developer instructions take precedence over user input. Here, Developer sets constraints (AI acts as a tutor, guiding but not solving).
User tries to override the constraint. But Compliant AI response follows hierarchy, nudging the user towards the solution instead of outright solving it. Violation occurs if AI bypasses the developer's intent and provides a full answer.
This chain of command ensures AI behavior remains predictable, customizable, and aligned with ethical goals, reducing misuse, sycophancy, and policy violations.
→ Handling of controversial topics has been adjusted to prioritize factual accuracy over neutrality, preventing AI from passively agreeing with biased user inputs.
→ AI sycophancy is addressed, reducing the tendency of models to blindly reinforce user biases.
→ Restricted content policies remain in place, ensuring AI refuses to generate harmful, illegal, or unauthorized material, such as misinformation, extremist propaganda, private data leaks, and illegal activity guidance.
→ Customization options allow for adaptable AI behavior within limits, including potential support for mature content handling in a controlled, age-appropriate setting.
→ Public domain release (CC0) means companies can freely use, modify, or integrate these guidelines into their AI models.
→ The Model Spec will evolve, with OpenAI continuously refining it based on real-world feedback and new AI safety challenges.
OpenAI also released its Reasoning model best practices
And these are the main points to note
Use delimiters for clarity: Use delimiters like markdown, XML tags, and section titles to clearly indicate distinct parts of the input, helping the model interpret different sections appropriately.
Users should specifically avoid the phrase "think step by step" with o-models, since reasoning models already have "baked-in" chain-of-thought prompting - they are trained on thought chains.
The guide differentiates between reasoning models (e.g., o1, o3-mini) and GPT models (e.g., GPT-4o).
Reasoning models are built for complex, multi-step tasks—such as planning, detailed document analysis, and visual interpretation—while GPT models focus on speed and cost efficiency for well-defined tasks.
In practice, reasoning models excel at clarifying ambiguous prompts, extracting key details from extensive unstructured data, and performing multi-step planning or code review.
OpenAI also recommends starting without examples (Zero Shot Prompt) and adding them as needed (Few or Many Shot Prompt). Users should define clear success criteria and "encourage the model to continue reasoning and iterating" until it meets them. Also, include specific constraints such as budget limits ("propose a solution with a budget under $500").
According to OpenAI, o-models can process incomplete or inconsistent information while accurately interpreting user intent, even from partial instructions. They are particularly good at identifying relevant information in large amounts of unstructured data and recognizing patterns in complex documents such as contracts, financial reports, and insurance claims.
Basically, Prompt Reasoning Models to Overthink. If you criticize them, that gives them an outcome to evaluate against. If you set a really high bar for your expectations and communicate them clearly, the model will figure out how to reach that bar.
So negging reasoning models like R1 and o3-mini-high works amazing, they think really hard about what they did wrong and how to do better lol. Setting super high standards also really helps in this regard, becasue that'll also drive it to overthink.
"boomer prompts" new technical jargon just dropped by OpenAI
OpenAI famously said “avoid “boomer prompts” with o-series models. Instead, be simple and direct, with specific guidelines.”
Wrap each section of your prompt in XML tags to clearly indicate roles. For instance, to request a summary, you can write like below. This structure signals to the model which part is the command and which part is the content.
<instruction>Summarize the text below.</instruction> <text> The quick brown fox jumps over the lazy dog. </text>
But many experts in the field expressed confusion and disagreement about using boomer prompt.
So the most probable definition of Boomer prompts is that it uses emojis, casual language, and extra context. They add emotional cues that help models pick up tone and intent. The extra context guides models for creative tasks and customer interactions. It provides clear direction that a terse prompt might miss.
Direct prompts suit repetitive or technical tasks. But for nuanced outputs, extra signals can make the response richer and more human-like. The extra language helps the model understand subtle cues, improving output quality.
The key takeaway is that context-rich prompts have their place alongside minimalist ones. They provide essential information that boosts the model's performance in creative and interactive settings.
🏆 Perplexity AI releases deep research feature, challenging Gemini’s Advanced Research
🎯 The Brief
Perplexity AI has launched Deep Research, an advanced research tool that autonomously conducts in-depth analysis, competing with Gemini and ChatGPT. It performs dozens of searches, reads hundreds of sources, and generates reports in under 3 minutes. The feature is free for all users, with unlimited access for Pro subscribers. It scores 20.5% on Humanity’s Last Exam and 93.9% accuracy on SimpleQA, surpassing many leading AI models. The tool is now available on the web and will expand to iOS, Android, and Mac soon.
⚙️ The Details
→ Deep Research mimics human expert-level research by iteratively searching, reading documents, and refining its approach in real-time. It synthesizes findings into a structured report.
→ Users can export reports as PDFs or documents and share them via Perplexity Pages. It supports research in finance, marketing, technology, health, travel, and current affairs.
→ Achieves 20.5% on Humanity’s Last Exam, outperforming Gemini Thinking, DeepSeek-R1, and other models. Scores 93.9% on SimpleQA, proving its factual accuracy.
→ Competes with Gemini, offering a strong value proposition by providing free access. Still in Alpha, with potential for gradual rollout. Performance and fact-checking could improve with further development.
📡 Codeium Launches Windsurf Wave 3 Version Supporting Model Context Protocol
🎯 The Brief
Codeium launched Windsurf Wave 3, enhancing its AI-powered code editor with Model Context Protocol (MCP) support, tab-to-jump navigation, "Turbo" mode for automated terminal commands, and drag-and-drop image handling. It also introduced custom icons for paying users and expanded model support, including DeepSeek-v3, DeepSeek-R1, o3-mini, and Gemini 2.0 Flash. These updates refine Cascade's AI-driven workflow, improving usability, automation, and multimodal input handling.
⚙️ The Details
→ Model Context Protocol (MCP) Support: Cascade now integrates MCP servers, allowing it to pull structured data from external sources. This expands its ability to interact with different data types dynamically.
→ Model Context Protocol (MCP) lets Cascade pull structured data from external MCP servers, acting like an API for AI workflows. Instead of relying only on built-in tools, Cascade can now query external sources dynamically, improving automation and accuracy. Users configure MCP servers via JSON, making AI-assisted coding more flexible and context-aware.
→ Tab-to-jump: Enhances autocomplete by letting developers quickly move between AI-suggested edits. Free users get unlimited autocomplete, while paid users access "Fast Mode" for quicker AI responses.
→ Turbo Mode: Automates terminal command execution within Cascade, streamlining workflows.
→ Drag-and-drop Images: Simplifies multimodal inputs, particularly useful for web development and design integration.
→ Custom Icons: A minor UI upgrade available exclusively for paid Mac users.
→ Expanded Model Support: Adds multiple AI models with transparent credit-based pricing, enhancing flexibility.
→ Windsurf Next: A pre-release version for early adopters to access experimental features before general availability.
🗞️ Byte-Size Briefs
DeepSeek confirms that the official deployment runs the same model as the open-source release. They provide best-practice settings for better results and guidelines to prevent model bypass techniques. No system prompt is recommended, suggesting the model performs best without pre-set instructions. Temperature set at 0.6, balancing creativity and consistency in responses.
Google’s Gemini Advanced can now recall past chats to provide context-aware responses, making interactions more seamless. Users no longer need to restart conversations or search for past threads. This feature is rolling out in English for Gemini Advanced subscribers via the Google One AI Premium Plan on web and mobile, with broader language and Google Workspace Business & Enterprise support coming soon.
Nous Research introduces DeepHermes-3, an 8B parameter LLM built on Llama-3.1, featuring user-controlled reasoning via a system prompt toggle. It integrates long-chain reasoning and traditional response modes into one model. Benchmarks show 33%-50% accuracy gains on math and GPQA tasks in reasoning mode. It supports function calling, structured JSON outputs, and multi-turn conversations using Llama-Chat format. Supports long-context reasoning, capable of processing up to 13,000 tokens. Available for local deployment using Hugging Face Transformers or vLLM.
That’s a wrap for today, see you all tomorrow.