Google took another big step in the artificial intelligence (AI) race on Wednesday when it unveiled Gemini, its largest and most capable AI model, which comes with sophisticated multimodal reasoning capabilities and a stronger focus on safety. Here’s a closer look.
Google says Gemini is the most capable and general model it has ever built. According to Demis Hassabis, CEO and co-founder of Google DeepMind, Gemini is the result of large-scale collaborative efforts by teams across Google, including Google Research. “It was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, operate across and combine different types of information including text, code, audio, image and video,” Hassabis said in a note on the Google Keyword blog.
The company also said that Gemini is an extremely flexible AI model: it can efficiently run on everything from data centers to mobile devices. Its state-of-the-art capabilities will significantly enhance the way developers and enterprise customers build and scale with AI, Google said.
Gemini 1.0, the first version, has been optimized for three different sizes: Gemini Ultra (the largest and most capable model for highly complex tasks), Gemini Pro (Google’s best model for scaling across a wide range of tasks) and Gemini Nano (Google’s most efficient model for on-device tasks).
“Our first version, Gemini 1.0, is optimized for different sizes: Ultra, Pro and Nano. These are the first models of the Gemini era and the first realization of the vision we had when we formed Google DeepMind earlier this year. This new era of models represents one of the biggest science and engineering efforts we’ve undertaken as a company,” Google and Alphabet CEO Sundar Pichai said in a note on 6 December.
Google says Gemini 1.0’s sophisticated multimodal reasoning capabilities can help make sense of complex written and visual information. “This makes it uniquely skilled at uncovering knowledge that can be difficult to discern amid vast amounts of data,” the company said, adding: “Gemini 1.0 was trained to recognize and understand text, images, audio and more at the same time, so it better understands nuanced information and can answer questions relating to complicated topics. This makes it especially good at explaining reasoning in complex subjects like math and physics.”
Misuse and bias in AI have been major points of contention, not only among users but also among regulators and critics. Google is betting big on Gemini here. The company said in its blog post that Gemini has “the most comprehensive safety evaluations of any Google AI model to date, including for bias and toxicity.” Google has conducted novel research into potential risk areas like cyber-offense, persuasion and autonomy, and has applied Google Research’s best-in-class adversarial testing techniques to help identify critical safety issues in advance of Gemini’s deployment, the statement explains.
Gemini is already rolling out in Google products. For instance, Bard – Google's conversational generative AI chatbot – will use a fine-tuned version of Gemini Pro for more advanced reasoning, planning, understanding and more. This is the biggest upgrade to Bard since it launched, Google said in its statement.
Gemini will also be coming to the company’s Pixel line of smartphones. Pixel 8 Pro is the first smartphone engineered to run Gemini Nano, which powers new features such as ‘Summarize’ in the Recorder app and is rolling out in ‘Smart Reply’ in Gboard, starting with WhatsApp, with more messaging apps coming next year, Google said.
In the coming months, Gemini will also be available in more products and services like Search, Ads, Chrome and Duet AI.
“We’re already starting to experiment with Gemini in Search, where it’s making our Search Generative Experience (SGE) faster for users, with a 40% reduction in latency in English in the US, alongside improvements in quality,” the company said in the blog post.