On Wednesday, Google introduced its highly anticipated general-purpose, multimodal, generative AI model, Gemini, which the company claims is more powerful than OpenAI’s GPT-4. “Gemini can understand the world around us in the way that we do,” said Demis Hassabis, founder of DeepMind, Google’s elite AI lab that created the model, adding that Gemini is better than any other model out there.
Google claims Gemini has 5 times the computational power of GPT-4, leading to faster training and potentially larger model sizes. It said Gemini is the first model to outperform human experts on MMLU (Massive Multitask Language Understanding), one of the most popular methods to test the knowledge and problem-solving abilities of AI models.
The model will be made available to developers through Google Cloud’s API from December 13, with a more powerful version set to debut in 2024 pending extensive trust and safety checks. Gemini, which comes in three sizes, can run efficiently on a range of platforms, from data centers to mobile devices and combines different types of information such as text, code, audio, image, and video. Google said Gemini Ultra excels at tasks involving deliberate reasoning, surpassing previous state-of-the-art models.
Furthermore, it excels at image benchmarks, demonstrating native multi-modality and complex reasoning abilities. The standard approach in creating multi-modal models involves training separate components for different modalities. However, Gemini was designed to be natively multi-modal, pre-trained on different modalities from the beginning.
This design allows Gemini to understand and reason about all kinds of inputs far better than existing multi-modal models. Gemini was trained to recognize and understand text, images, audio, and more simultaneously, which makes it proficient in explaining reasoning in complex subjects like math and physics. Gemini’s sophisticated multi-modal reasoning capabilities can help make sense of complex written and visual information.
It extracts insights from hundreds of thousands of documents, enabling breakthroughs at digital speeds in many fields from science to finance. Gemini can understand, explain, and generate high-quality code in the world’s most popular programming languages. Its ability to reason about complex information places it among the leading foundation models for coding globally.
Google trained Gemini on its AI-optimized infrastructure using Google’s in-house designed Tensor Processing Units (TPUs), making it less subject to shortages of the GPUs that GPT-4 and other models depend on. It designed Gemini to be its most reliable and scalable model to train, and its most efficient to serve. The company said it is adding new protections to account for Gemini’s multi-modal capabilities, considering potential risks at each stage of development.
Gemini is now rolling out across a range of products and platforms. For instance, Google’s chatbot, Bard, will use a fine-tuned version of Gemini Pro for more advanced reasoning, planning, understanding, and more. Generative AI is rapidly evolving, and the relative strengths of competing models may shift over time.
But one thing is certain: Google just upped the ante.
From: forbes
URL: https://www.forbes.com/sites/craigsmith/2023/12/06/google-unveils-gemini-claiming-its-more-powerful-than-openais-gpt-4/