Google made waves with the release of Gemini 2.5 last month, as the model rocketed to the top of AI leaderboards after the company had previously struggled to keep up with the likes of OpenAI. That first experimental model was just the beginning. Google is deploying its improved AI in more places across its ecosystem, from the developer-centric Vertex AI to the consumer Gemini app.
Gemini models have been dropping so quickly that it can be hard to grasp Google's intended lineup. Things are becoming clearer now that the company is beginning to move its products to the new branch. At the Google Cloud Next conference, Google announced initial availability of Gemini 2.5 Flash. This model is based on the same code as Gemini 2.5 Pro, but it's faster and cheaper to run.
You won't see Gemini 2.5 Flash in the Gemini app just yet—it's starting out in the Vertex AI development platform and AI Studio. The wide experimental release of 2.5 Pro let Google gather data on how people interacted with the new model, and that feedback informed the development of 2.5 Flash.
The Flash versions of Gemini are smaller than the Pro versions, though Google doesn't like to talk about specific parameter counts. Flash models provide faster answers for simpler prompts, which has the side effect of reducing costs. We do know that 2.5 Pro (Experimental) was the first Gemini model to implement dynamic thinking, a technique that allows the model to modulate the amount of simulated reasoning that goes into an answer. 2.5 Flash is also a thinking model, but its approach is a bit more advanced: developers can control how much "thinking" the model puts into a given response.
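For developers poking at the model in AI Studio or Vertex AI, that control surfaces as a thinking budget on API calls. Here's a minimal sketch of what that might look like, assuming the google-genai Python SDK; the model ID, API key placeholder, and budget value are illustrative rather than a definitive recipe.

```python
# Minimal sketch: calling Gemini 2.5 Flash with a capped "thinking" budget.
# Assumes the google-genai Python SDK (pip install google-genai) and an API
# key from AI Studio; model ID and budget value below are assumptions.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # AI Studio key (placeholder)

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",  # preview model ID (assumption)
    contents="Summarize why smaller 'Flash' models are cheaper to serve.",
    config=types.GenerateContentConfig(
        # Cap how many tokens the model may spend on simulated reasoning
        # before it starts writing the answer.
        thinking_config=types.ThinkingConfig(thinking_budget=512),
    ),
)
print(response.text)
```

Dialing the budget down trades some reasoning depth for lower latency and cost, which is the whole pitch of the Flash tier.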