Google’s latest artificial-intelligence model, Gemini 2.0, is designed to think multiple steps ahead and act on your behalf, under your supervision.
Deep Research, a feature introduced alongside Gemini 2.0, draws on the model’s sophisticated reasoning and long-context capabilities to serve as a research assistant, exploring intricate subjects and compiling reports on the user’s behalf. It rolls out in Gemini Advanced on Wednesday.
According to Sundar Pichai, chief executive of Google and Alphabet, the tech giant has spent the past year investing in more agentic models, which can understand more about the world around a user, think multiple steps ahead, and act on the user’s behalf, under their supervision.
Google’s newly unveiled models, built for the agentic era, are primed to become universal assistants. “Introducing Gemini 2.0, our most capable model yet. With new advances in multimodality — like native image and audio output — and native tool use, it will enable us to build new AI agents that bring us closer to our vision of a universal assistant,” said Pichai.
Google noted that AI has transformed Search, and its AI Overviews now reach 1 billion people. It pointed out that it is bringing the advanced reasoning capabilities of Gemini 2.0 to AI Overviews to tackle more complex topics and multi-step questions, including advanced math equations, multimodal queries, and coding.
“If Gemini 1.0 was about organizing and understanding information, Gemini 2.0 is about making it much more useful,” added Pichai.
Gemini 2.0 Flash
Gemini 2.0 Flash builds on the success of 1.5 Flash, Google’s most popular model yet for developers, offering enhanced performance at similarly fast response times. Notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks, at twice the speed. 2.0 Flash also comes with new capabilities. In addition to supporting multimodal inputs like images, video, and audio, it now supports multimodal outputs such as natively generated images mixed with text and steerable, multilingual text-to-speech (TTS) audio. It can also natively call tools like Google Search and code execution, as well as third-party user-defined functions.
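To make the tool-calling idea concrete, here is a minimal sketch of wiring a user-defined function into the model with the `google-generativeai` Python SDK’s automatic function calling. The model identifier `gemini-2.0-flash-exp` and the `get_order_status` helper are illustrative assumptions, not details confirmed by the announcement.

```python
import google.generativeai as genai

# Configure the SDK with an API key issued through Google AI Studio.
genai.configure(api_key="YOUR_API_KEY")

# A hypothetical user-defined tool; in practice this would query a real backend.
def get_order_status(order_id: str) -> str:
    """Look up the shipping status for an order (placeholder logic)."""
    return f"Order {order_id} shipped on Tuesday."

# Register the function as a tool; the SDK derives a schema from its signature.
model = genai.GenerativeModel("gemini-2.0-flash-exp", tools=[get_order_status])

# With automatic function calling, the SDK executes the tool when the model
# requests it and feeds the result back before the final answer is produced.
chat = model.start_chat(enable_automatic_function_calling=True)
print(chat.send_message("Where is order A1234?").text)
```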
Gemini 2.0 Flash is available now as an experimental model to developers via the Gemini API in Google AI Studio and Vertex AI. Multimodal input and text output are available to all developers, while text-to-speech and native image generation are available to early-access partners. General availability will follow in January, along with more model sizes.
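For developers trying the experimental model today, a basic multimodal-input, text-output call through the Gemini API might look like the sketch below; the `gemini-2.0-flash-exp` identifier and the `chart.png` file are assumptions for illustration.

```python
import PIL.Image
import google.generativeai as genai

# API keys come from Google AI Studio; Vertex AI uses its own client setup.
genai.configure(api_key="YOUR_API_KEY")

# Assumed identifier for the experimental release of 2.0 Flash.
model = genai.GenerativeModel("gemini-2.0-flash-exp")

# Multimodal input (text plus an image), text output.
image = PIL.Image.open("chart.png")
response = model.generate_content(
    ["Summarize this chart in two sentences.", image]
)
print(response.text)
```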
To help developers build dynamic and interactive applications, Google is also releasing a new Multimodal Live API with real-time audio and video-streaming input and the ability to use multiple, combined tools. More information about 2.0 Flash and the Multimodal Live API can be found on Google’s developer blog.
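As a rough sketch of how a streaming session with the Multimodal Live API might work, the snippet below assumes the `google-genai` Python SDK and its `client.aio.live.connect` entry point from Google’s early developer materials; treat the exact names and signatures as unverified assumptions.

```python
import asyncio
from google import genai

# Assumed client setup for the google-genai SDK.
client = genai.Client(api_key="YOUR_API_KEY")

async def main() -> None:
    # Ask for text responses; audio is another supported modality.
    config = {"response_modalities": ["TEXT"]}

    # Open a bidirectional, real-time session with the experimental model.
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        await session.send(
            input="Describe what you can do in one sentence.",
            end_of_turn=True,
        )
        # Responses stream back incrementally as the model generates them.
        async for response in session.receive():
            if response.text:
                print(response.text, end="")

asyncio.run(main())
```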
Gemini 2.0 available in the Gemini app, Google’s AI assistant
Also starting today, Gemini users worldwide can access a chat-optimized version of 2.0 Flash Experimental by selecting it in the model drop-down on desktop and mobile web; it will come to the Gemini mobile app soon. With this new model, users can experience an even more helpful Gemini assistant.
Unlocking agentic experiences with Gemini 2.0
Gemini 2.0 Flash’s native user-interface action capabilities, together with other improvements such as multimodal reasoning, long-context understanding, complex instruction following and planning, compositional function calling, native tool use, and improved latency, all work in concert to enable a new class of agentic experiences.
The practical application of AI agents is a research area full of exciting possibilities. Google is exploring this new frontier with a series of prototypes that can help people accomplish tasks and get things done. These include an update to Project Astra, its research prototype exploring the future capabilities of a universal AI assistant; the new Project Mariner, which explores the future of human-agent interaction, starting with the browser; and Jules, an AI-powered code agent that can help developers.
The release of Gemini 2.0 comes shortly after Google introduced Willow, its new quantum computing chip, which drastically reduces errors as it scales to more qubits and which the company positions as a step toward computing advances that could eventually benefit AI.