Gemini 3.5 Flash: Google's fastest model is now its smartest too

Google announced Gemini 3.5 Flash at Google I/O 2026 on May 19, and made it available the same day. That means it is already the default model powering the Gemini app and AI Mode in Google Search for billions of people worldwide.

The headline here is genuinely unusual: a Flash-tier model has beaten the previous Pro-tier model on coding and agentic benchmarks. Flash models used to be the budget option you reached for when speed mattered more than capability. Gemini 3.5 Flash changes that positioning entirely.

What has actually changed

Gemini 3.5 Flash outperforms Google’s own Gemini 3.1 Pro on challenging coding and agentic benchmarks, including Terminal-Bench 2.1 (76.2%), GDPval-AA (1656 Elo), and MCP Atlas (83.6%). It also leads on multimodal understanding, scoring 84.2% on CharXiv Reasoning.

It does this at four times the output speed of comparable frontier models, with a one-million-token context window, 65k max output tokens, and support for text, image, audio, and video inputs. Dynamic thinking is on by default, and built-in tool use covers function calling, structured output, search-as-a-tool, and code execution.

Pricing sits at $1.50 per million input tokens and $9.00 per million output tokens, with a 90% discount on cached input tokens at $0.15 per million. That works out to roughly 25% cheaper than Gemini 3.1 Pro. Google’s pitch to enterprise teams is direct: if you shifted 80% of your workloads from other frontier models to 3.5 Flash, Google says you would save over $1 billion annually.

One thing to note on the API side: the thinking_budget parameter has been replaced with thinking_level, with settings of minimal, low, medium, and high. Default effort has also changed from high to medium, so if you are migrating from Gemini 3 Flash, verify that quality and cost still meet your expectations.

The bigger shift: from chatbot to agent

Google’s framing for this release is “frontier intelligence with action,” and that framing tells you where they think AI is heading. The emphasis is not on answering questions more accurately. It is on taking tasks off your plate entirely.

Gemini 3.5 Flash was co-developed with Google Antigravity, Google’s agent development platform, so that agents have what Google’s VP of Research Koray Kavukcuoglu described as “a native environment where they can live, work, and execute.” A Google I/O demo showed Antigravity running 3.5 Flash with 93 parallel subagents, over 15,000 requests, across 12 hours of wall-clock time, for under $1,000 total cost. That is a meaningful data point for anyone building complex automated pipelines.

For developers, Google also launched Managed Agents in the Gemini API in public preview. These let you build and deploy autonomous, stateful agents running in secure, isolated Google-hosted Linux sandbox environments.

What this means for you

If you use the Gemini app or Google Search, you are already on 3.5 Flash. The most visible new capability tied to this model is Gemini Spark, a personal AI agent that runs 24/7, integrates with Gmail, Docs, Slides, and other Workspace tools, and keeps working in the background after you close your laptop. Google is rolling Spark out to trusted testers first, with a beta planned for Google AI Ultra subscribers in the US shortly after I/O.

In Search, Google is using 3.5 Flash’s agentic capabilities to build custom generative UI responses on the fly, including visual tools and simulations tailored to specific queries. That is rolling out to everyone in Search this summer, at no extra cost.

If you are a developer, the combination of frontier-level performance, Flash-tier pricing, and native agent infrastructure makes 3.5 Flash worth a proper evaluation. The 93-subagent Antigravity demo is not just a showpiece; it illustrates the kind of parallel, long-running workloads the model was designed for. The API is available now in Google AI Studio.

If you work in enterprise, real-world deployments are already in production. Salesforce is integrating 3.5 Flash into Agentforce for complex enterprise task automation. Ramp is using it for multimodal OCR on invoices. Xero is running multi-week autonomous workflows for 1099 tax form processing. Databricks is applying it to real-time data monitoring and diagnostics. These are not pilots; they are live.

A note on what is still coming

Gemini 3.5 Pro was not available at I/O. Sundar Pichai announced it is coming next month, which reportedly drew audible groans from the audience. When it does arrive, the plan is for Pro and Flash to work together: as Google’s head of product Tulsee Doshi told TechCrunch, “3.5 Pro becomes your orchestrator, your planner, and then it actually can leverage Flash to be the various sub-agents.” That architecture makes the strong performance of Flash at this price point even more significant.

One limitation worth flagging: Computer Use is not supported in Gemini 3.5 Flash yet. If your workloads depend on that capability, you will need to stay on Gemini 3 Flash Preview for now.

On safety

Google says Gemini 3.5 Flash was developed under its Frontier Safety Framework, with strengthened cyber and CBRN safeguards and new interpretability tools that check the model’s reasoning before it responds. Google’s own assessment concluded that the model does not reach any Critical Capability Levels for frontier safety risks.

That said, making powerful autonomous agents broadly available to consumers does raise legitimate questions. Google is currently facing legal scrutiny over separate incidents involving Gemini, and the stakes only grow as AI moves from answering questions to taking actions on your behalf. It is worth keeping that context in mind as these tools become part of everyday workflows.