Microsoft builds its own AI models: MAI-Thinking-1 and MAI-Code-1 explained
Microsoft unveiled seven in-house MAI models at Build 2026, cutting reliance on OpenAI with its own reasoning and coding models.
At Microsoft Build 2026, held in San Francisco on 2 June, Microsoft announced seven AI models it built entirely in-house, led by MAI-Thinking-1, its first reasoning model. Every one of them was trained from scratch on commercially licensed data, with no distillation from OpenAI or any other third-party model provider. That last detail is the headline beneath the headline.
Why this matters: three years of dependency, now changing
Since 2023, Microsoft’s most visible AI products, including GitHub Copilot, Microsoft 365 Copilot, Bing Chat, and Azure AI, have run almost entirely on OpenAI models. That arrangement made Microsoft the fastest company to ship AI products at scale. It also meant Microsoft was paying OpenAI for inference at the volume of more than 30 million GitHub Copilot users, and that its product roadmap was, to a meaningful degree, shaped by what OpenAI chose to build next.
Microsoft and OpenAI amended their partnership in April 2026, ending Microsoft’s exclusive licence to OpenAI IP and removing Microsoft’s revenue share obligation to OpenAI. The relationship continues, but the terms changed. The MAI announcements at Build are the product side of that same strategic shift.
As Mustafa Suleiman, CEO of Microsoft AI, put it: “This is all about long-term self-sufficiency for Microsoft and our partners. It’s about models you can trust.”
MAI-Thinking-1: the flagship reasoning model
MAI-Thinking-1 is a sparse Mixture of Experts model with 35 billion active parameters and approximately one trillion total parameters. It has a 256,000-token context window, which Microsoft says is enough to process a 600-page document in a single pass. It supports function calling, multi-layered instruction following, and works with the standard Chat Completions API.
On benchmarks, it scores 97.0% on AIME 2025 and 94.5% on AIME 2026, tests that focus on mathematical and multi-step scientific reasoning. Microsoft says it matches Claude Opus 4.6 on coding tasks in SWE-Bench Pro evaluations, and in blind side-by-side tests run by Surge, it was preferred over Claude Sonnet 4.6.
MAI-Thinking-1 is currently in private preview through Microsoft Foundry.
What this means for you: If you are an enterprise developer evaluating reasoning models for complex, multi-step workflows, MAI-Thinking-1 is now a credible option to test alongside the models you already use. The 256K context window and Chat Completions API compatibility mean it slots into existing tooling without major rework.
MAI-Code-1 and MAI-Code-1-Flash: built for GitHub Copilot
MAI-Code-1 is live now in GitHub Copilot and VS Code. Its smaller sibling, MAI-Code-1-Flash, is the more technically interesting model.
MAI-Code-1-Flash has 137 billion total parameters in a sparse Mixture of Experts architecture, a 256,000-token context window, and was trained between March and May 2026. It was built specifically for GitHub Copilot production workloads, starting from MAI-Thinking-1’s mid-training checkpoint and going through supervised fine-tuning and reinforcement learning tuned to Copilot’s own telemetry and task harnesses.
The benchmark comparison Microsoft leads with: on SWE-Bench Pro, MAI-Code-1-Flash scores 51.2% against 35.2% for Claude Haiku 4.5, a 16-percentage-point gap. On SWE-Bench Verified, it scores 71.6% against 66.6% for Haiku. Microsoft also claims it uses 60% fewer tokens on complex coding tasks versus comparable models.
On pricing, GitHub’s published rates list MAI-Code-1-Flash at $0.75 per million input tokens and $4.50 per million output tokens, with cached input at $0.075.
What this means for you: MAI-Code-1-Flash is rolling out to Copilot Free, Pro, Pro+, and Max plans, starting with a limited set of users and expanding over the coming weeks. You may not see it immediately, but when you do, the practical difference is a model designed around how Copilot is actually used, not adapted from a general-purpose model. The token efficiency claim is particularly relevant now that Copilot moved to usage-based AI Credits billing on 1 June. Fewer tokens consumed on a complex task is a direct reduction in your bill.
The rest of the MAI family
The other five models in the family cover the modalities that make AI practical inside software products, not just chat:
- MAI-Image-2.5 and its flash variant handle both text-to-image and image-to-image generation. They rank third and second respectively on the Arena AI leaderboard. Both are live in PowerPoint and rolling out to OneDrive, with Foundry availability coming.
- MAI-Transcribe-1.5 claims state-of-the-art accuracy across 43 languages.
- MAI-Voice-2 adds 15 new languages and expanded voice options, with a faster version planned soon.
Broader availability beyond Azure
One detail worth noting: MAI-Thinking-1 and MAI-Code-1-Flash are available through Fireworks AI, Baseten, and OpenRouter, not just Azure. That choice is deliberate. These platforms collectively serve developers who specifically want to avoid tying their stack to a single cloud vendor. Making MAI models available there signals that Microsoft wants adoption beyond its own ecosystem, and that the models need to compete on merit.
What Microsoft is actually building here
This is not a single product launch. Microsoft is assembling a full model portfolio across reasoning, code, image, speech recognition, and voice output. These are the actual modalities that power AI inside software. Taken together, the MAI family gives Microsoft the ability to control its own model stack end-to-end, rather than depending on any external provider to fill capability gaps.
Satya Nadella framed it as a shift from “consuming a frontier model to fully participating at the frontier.” The practical translation: Microsoft’s AI product roadmap no longer has to wait for what OpenAI ships next.
For developers and enterprises, the direct consequence is more competition at the model layer. More competing models generally means lower inference costs and less pressure to commit to a single provider. That is a useful development, regardless of which models you end up using.