Google to Merge Gemini and Veo AI Models for Enhanced Understanding

In a recent episode of the podcast Possible, co-hosted by Reid Hoffman, Google DeepMind CEO Demis Hassabis hinted at the company’s plans: Google intends to combine its Gemini artificial intelligence models with its Veo video-generating models. The merger is meant to improve Gemini’s understanding of the physical world and, with it, the model’s overall capabilities.

The rationale for combining these models is Google’s goal of building a more capable and adaptable digital assistant. Hassabis emphasized that “We’ve always built Gemini, our foundation model, to be multimodal from the beginning.” This multimodal approach is what allows Gemini to read and create different types of content, from audio to images to text. Adding Veo, which is trained primarily on video data from YouTube, is expected to strengthen that ability.

Veo 2’s training draws heavily on hundreds of thousands of YouTube clips. Hassabis explained, “Basically, by watching YouTube videos, a lot of YouTube videos, [Veo 2] can figure out, you know, the physics of the world.” Combining Gemini and Veo would produce a more advanced AI system, one able to understand real-world contexts more effectively.

YouTube, a Google subsidiary, plays a significant role in supplying the data used to train these AI models. The company points to its agreement with YouTube creators, which it says covers the use of their content in AI development. This arrangement gives Google’s models access to a large video dataset to learn from.

Google’s efforts also extend beyond video. The newest versions of the Gemini models are multimodal and can generate audio as well as images and text. This versatility helps Google compete with other big players in the AI space, which are moving in the same direction: OpenAI’s ChatGPT, for example, can natively generate images, such as stylized artwork in the style of Studio Ghibli films.

Last year, Google broadened its terms of service in an early step in this strategy. The change gave the company much wider access to copyrighted data for training its artificial intelligence models, and it signals the firm’s commitment to continued iteration on its technology.
