GPT-5.4 Mini and Nano: OpenAI's Fastest Small Models Are Built for the Age of AI Agents

OpenAI releases GPT-5.4 mini and nano — faster, cheaper models designed for agentic workflows, with near-flagship performance at a fraction of the cost.

OpenAI announcement graphic for GPT-5.4 mini and nano — optimized for speed, coding, and agentic subagent workflows.

Less than two weeks after launching GPT-5.4, OpenAI has released two more models: GPT-5.4 mini and GPT-5.4 nano. Both are smaller, faster, and cheaper than the flagship — and both are specifically designed for the kind of work that modern AI systems increasingly rely on: the repetitive, parallel, behind-the-scenes tasks that larger models hand off to smaller ones.

This is OpenAI catching up with how people are actually building AI products.

What Are These Models, Exactly?

GPT-5.4 mini is the main event here. It offers significant improvements over GPT-5 mini across coding, reasoning, multimodal understanding, and tool use, while running more than 2x faster. It costs $0.75 per million input tokens and $4.50 per million output tokens, with a 400,000-token context window.

The benchmark numbers back up the positioning. On SWE-Bench Pro — a test of real software engineering tasks — mini scores 54.38%, sitting just 3 percentage points behind the full GPT-5.4. On OSWorld-Verified, which measures computer-use ability, it scores 72.13% against the flagship’s 75.03%. That’s close enough to matter in practice.

GPT-5.4 nano is OpenAI’s cheapest model right now, at $0.20 per million input tokens and $1.25 per million output tokens. It’s designed for simple, fast tasks: classification, data extraction, ranking, and lightweight coding sub-tasks. It’s faster and cheaper than anything OpenAI has offered before, though its OSWorld-Verified score (39.01%) sits slightly below GPT-5 mini’s 42%, so it’s not a straight upgrade in every area.

Nano is API-only for now. Mini is available in the API, in Codex, and in ChatGPT.

The Bigger Picture: Why Small Models Matter Right Now

The release timing is telling. Google released Gemini 3 Flash this month. Anthropic shipped the Claude 4 series. Every major AI lab is now competing on the efficiency end of the spectrum, not just the raw capability end.

The reason is straightforward. Teams building real AI products don’t use one model for everything. A typical agentic system might use a large, capable model to plan a task and make decisions, then delegate the actual execution — searching a codebase, reviewing a file, running a subtask in parallel — to smaller models that can do it fast and cheap. OpenAI’s own Codex already works this way: GPT-5.4 handles planning and coordination, GPT-5.4 mini handles the narrower subtasks.
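The delegation pattern described above can be sketched in a few lines. This is an illustrative skeleton, not OpenAI's actual Codex internals: the model names are taken from the announcement, but the `plan` and `run_subtask` helpers stand in for real model calls.

```python
# A minimal sketch of the planner/subagent pattern: a large model plans,
# small models execute narrow subtasks in parallel. The helpers below
# fake the model calls for illustration.
from concurrent.futures import ThreadPoolExecutor

PLANNER_MODEL = "gpt-5.4"        # plans and coordinates
SUBAGENT_MODEL = "gpt-5.4-mini"  # executes narrow subtasks

def plan(task: str) -> list[str]:
    # A real planner would ask PLANNER_MODEL to decompose the task;
    # here we return a static decomposition.
    return [
        f"search codebase for '{task}'",
        f"review files related to '{task}'",
        f"draft patch for '{task}'",
    ]

def run_subtask(subtask: str) -> str:
    # A real implementation would call SUBAGENT_MODEL here.
    return f"[{SUBAGENT_MODEL}] done: {subtask}"

def run_agent(task: str) -> list[str]:
    subtasks = plan(task)
    # Subtasks are independent, so fan them out in parallel --
    # this is exactly where a fast, cheap model pays off.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_subtask, subtasks))
```

The key design point is the fan-out: because each subtask is independent, the cost and latency of the whole run are dominated by the subagent model, not the planner.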

Mini uses only 30% of GPT-5.4’s quota in Codex, which means developers can run far more work for the same cost, without sacrificing much on quality.
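The quota math is back-of-the-envelope but worth making explicit; the 30% figure is the only number taken from the announcement:

```python
# If mini consumes 30% of the flagship's quota per task, the same
# budget covers roughly 3.3x as many mini tasks.
flagship_cost_per_task = 1.0   # normalize GPT-5.4's quota cost to 1
mini_cost_per_task = 0.30      # mini uses 30% of the flagship's quota

tasks_for_same_spend = flagship_cost_per_task / mini_cost_per_task
print(f"{tasks_for_same_spend:.2f}x")  # 3.33x
```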

What This Means For You

If you use ChatGPT for free: GPT-5.4 mini is now available to Free and Go users via the “Thinking” option in the plus menu. That’s a meaningful step up from what free users had access to before — near-flagship reasoning, at no cost.

If you’re on a paid plan: Mini will serve as a high-speed fallback when you hit rate limits on more intensive models. Your workflow keeps moving even when the bigger models are busy.

If you’re a developer: This is where things get interesting. In Codex, mini is available across the app, CLI, IDE extension, and web. The 30% quota usage means you can run about three times as many tasks for the same spend. If you’re building any kind of agentic pipeline — coding assistants, document processors, multi-step workflows — mini is worth evaluating as your sub-agent model. Nano is worth a look for anything classification or extraction-heavy where you need high throughput at minimal cost.

If you’re deploying on Azure: Both models are rolling out in Microsoft Azure AI Foundry, available in the model catalog. They’re live in Data Zone US, with EU rollout in progress.

A Note on Pricing vs. the Competition

Mini is capable, but it's worth being clear-eyed about where it sits on cost. At $0.75/$4.50 per million tokens, it's priced higher than some comparable options: Gemini 3 Flash, for instance, scores 78% on SWE-Bench Verified (a different and less difficult benchmark than the SWE-Bench Pro figures quoted above, so the scores aren't directly comparable) at $0.50/$3.00. OpenAI is betting that the performance gap, the ecosystem integration (especially in Codex), and the reliability of the platform justify the premium for developers already working within their stack.
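To see what the per-token prices mean in dollars, here is a quick cost comparison using the rates quoted above. The workload shape (the input/output token split) is an assumption chosen purely for illustration:

```python
# Cost comparison using the per-million-token prices quoted in the article.
def cost_usd(input_tokens: int, output_tokens: int,
             in_price: float, out_price: float) -> float:
    """Total USD cost given per-million-token rates."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Assumed example workload: 10M input tokens, 2M output tokens.
mini = cost_usd(10_000_000, 2_000_000, 0.75, 4.50)   # GPT-5.4 mini
flash = cost_usd(10_000_000, 2_000_000, 0.50, 3.00)  # Gemini 3 Flash

print(f"mini:  ${mini:.2f}")   # $16.50
print(f"flash: ${flash:.2f}")  # $11.00
```

On this workload the gap is about 50%, which is why the comparison is worth running against your own traffic mix: output-heavy workloads widen the difference, input-heavy ones narrow it.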

For teams not already invested in OpenAI’s tooling, that’s a comparison worth running before committing.

The Short Version

GPT-5.4 mini and nano are not headline-grabbing flagship releases. They’re infrastructure. They’re the models that make the ambitious stuff possible: faster pipelines, lower costs, more tasks handled in parallel, and AI systems that can do more without billing you like they’re running the whole operation on a flagship model every time.

If you’re building with AI, these are the models worth paying attention to right now. The big models get the press. The small ones get the work done.