🤝 Google Creates Playable Worlds From Images
Dear curious minds,
Google rebranded its chatbot from Bard to Gemini to get a fresh and better start, but the move ended in a PR disaster. Gemini refuses to generate pictures of white people, even when this would be historically correct or factual for well-known personalities. Furthermore, Gemini learned or was instructed to avoid some other socially controversial topics, like promoting meat or fossil fuels. At the same time, I realized that for my use cases Gemini Ultra often works excellently, and it is a true competitor to GPT-4 in my usage. In the end, everyone has to decide for themselves whether a new tool is helpful or not. What do you want from your chatbot? Are you fine with a chatbot that is not perfect in all aspects?
This week’s issue brings you the following topics:
Mistral's New Flagship Model: Competitive But Not Open
Google DeepMind's Genie Transforms Images into Playable Games
Tech Term: World Model
If nothing sparks your interest, feel free to move on. Otherwise, let us dive in!
📊🔒 Mistral's New Flagship Model: Competitive But Not Open
Mistral AI introduces Mistral Large, its most advanced language model, offering top-tier reasoning and multilingual capabilities.
The model is accessible on their new ChatGPT-like site called Le Chat, via an API called La Plateforme, and, thanks to a new partnership with Microsoft, on Azure.
Mistral Large excels in complex tasks like text understanding, transformation, and code generation, achieving high rankings in industry benchmarks, second only to GPT-4.
The model is fluent in English, French, German, Spanish and Italian.
Its 32K-token context window will be enough for many tasks, but it is far short of the 128K tokens of GPT-4 Turbo or the 1M tokens of Gemini 1.5 Pro (currently in beta testing), covered in a past issue.
Alongside Mistral Large, Mistral AI releases Mistral Small, optimized for low-latency workloads, providing a balance between performance and cost.
Mistral Large and Mistral Small introduce native function calling, which allows developers to use their own tools and APIs with the LLM.
Furthermore, the models bring a JSON format output mode for structured data interactions by developers.
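To make function calling and JSON output more concrete, here is a minimal sketch of the general pattern. It does not use Mistral's actual API: the tool `get_weather`, the `TOOLS` schema, the `dispatch` helper, and the simulated model output are all hypothetical and only illustrate the idea. The developer describes a tool, the model answers with a JSON object naming the tool and its arguments, and the developer's code parses that JSON and runs the function.

```python
import json

# Hypothetical tool that the developer exposes to the model.
def get_weather(city: str) -> dict:
    """Stand-in for a real API call; returns canned data."""
    return {"city": city, "temperature_c": 18}

# Tool description in a JSON-Schema-like style (assumed format for illustration).
TOOLS = [{
    "name": "get_weather",
    "description": "Current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

# Maps tool names to the Python functions that implement them.
REGISTRY = {"get_weather": get_weather}

def dispatch(tool_call_json: str) -> dict:
    """Parse a model-produced tool call (JSON text) and run the matching function."""
    call = json.loads(tool_call_json)
    func = REGISTRY[call["name"]]
    return func(**call["arguments"])

# Simulated model output; a real model would generate this string.
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = dispatch(model_output)
print(json.dumps(result))
```

Because the output is guaranteed to be JSON, the developer can parse it directly instead of extracting values from free-form text.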
My take: It's nice to see a European company creating a competitive foundation model. However, the pivot away from their initial commitment to open-source releases highlights the pressure from their investors to monetize. Mistral's trajectory mirrors that of OpenAI, underscoring that even with the best intentions, the funding needed to create competitive models can reshape a company's approach to AI development.
🧞♀️🎮 Google DeepMind's Genie Transforms Images into Playable Games
Google DeepMind introduced Genie, an AI capable of generating playable 2D platformer games from simple image or text prompts.
The corresponding paper explains the theory behind Genie in more detail and states that the results shown are achieved with a model of no more than 11B parameters.
Genie was trained on over 200,000 hours of gameplay videos of 2D platform games without any additional label information.
The approach generalizes well and can be prompted with real-world photographs and sketches which were not part of the training material.
With all of the above, Genie is essentially a foundation world model (explained in the Tech Term section of this issue).
Furthermore, a different Genie model was trained on videos of a robotic arm. It learned consistent actions and even how the objects the arm interacts with deform.
Google DeepMind decided to release neither the models nor the training data. Only a minimal example to reproduce and prove the core concept is made public.
My take: It will likely soon be possible to combine the capabilities of Genie with the video generation of OpenAI’s Sora. By doing this, we will have the ability to create new, impressive-looking games on the fly and completely disrupt the gaming industry. With YouTube, Google has exclusive access to countless more hours of training material and will be able to scale this approach even further. From my perspective, this is an important puzzle piece for training robots and, at the same time, another step towards realizing AGI.
💻🔍 Tech Term: World Model
A world model refers to a system or framework that an AI uses to simulate or understand the environment it is interacting with. Essentially, it's a representation of the AI's understanding of its surroundings, which can include physical laws, objects, agents, and the relationships between them. This model allows the AI to predict future states of the environment based on its current state and the actions it might take. Here's a breakdown:
Representation of the Environment: The world model captures essential features of the environment, such as objects, agents, and their properties. It abstracts the complexity of the real world into a more manageable form for the AI to process.
Predictive Capability: A crucial function of a world model is to predict the outcomes of various actions within the environment. This predictive capability enables the AI to plan and make decisions by anticipating the consequences of its actions.
Learning and Adaptation: World models often improve over time through learning. As the AI interacts more with its environment, it collects data that it uses to refine its model, making its predictions more accurate and its decisions more effective.
Decision-Making: The model aids in decision-making processes by providing a simulated outcome for different actions. The AI can evaluate these simulated outcomes to choose the action that aligns best with its goals.
Generalization: Good world models can generalize from experiences to new, unseen environments. This ability is particularly important for developing AI that can operate in a wide range of settings without needing to be retrained for each new situation.
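The breakdown above can be sketched in a few lines of code. This is a deliberately tiny toy, not how Genie or any real system works: the corridor environment, the class `TabularWorldModel`, and the functions `true_env_step` and `plan` are all made up for illustration. The agent first records what each action does (learning), then imagines the outcome of action sequences with its model (prediction) and picks the action with the lowest estimated number of steps to the goal (decision-making).

```python
class TabularWorldModel:
    """Remembers observed (state, action) -> next_state transitions."""

    def __init__(self):
        self.transitions = {}

    def update(self, state, action, next_state):
        self.transitions[(state, action)] = next_state

    def predict(self, state, action):
        # Transitions never seen before are assumed to leave the state unchanged.
        return self.transitions.get((state, action), state)


def true_env_step(state, action, size=5):
    """The real environment: a corridor with cells 0..size-1 and walls at both ends."""
    return min(max(state + action, 0), size - 1)


def plan(model, state, goal, actions=(-1, 1), horizon=4):
    """Choose the action whose imagined rollout has the lowest estimated cost."""

    def cost(s, depth):
        if s == goal:
            return 0
        if depth == 0:
            return abs(goal - s)  # remaining distance as a rough estimate
        return 1 + min(cost(model.predict(s, a), depth - 1) for a in actions)

    return min(actions, key=lambda a: cost(model.predict(state, a), horizon - 1))


# Learning: interact with the environment and store what happened.
model = TabularWorldModel()
for s in range(5):
    for a in (-1, 1):
        model.update(s, a, true_env_step(s, a))

# Decision-making: plan inside the model, then act in the real environment.
state, goal = 0, 4
path = [state]
while state != goal:
    action = plan(model, state, goal)
    state = true_env_step(state, action)
    path.append(state)
print(path)  # → [0, 1, 2, 3, 4]
```

Real world models replace the lookup table with a neural network that predicts future observations, which is what allows them to generalize to situations they have never seen.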
If you are looking for a more technical explanation, Yann LeCun has you covered in this 𝕏 post.
In gaming and simulations, world models play a crucial role in creating dynamic, responsive environments. When used in robotics, they help robots navigate and interact with their surroundings. In broader AI research, developing sophisticated world models is seen as a step towards achieving artificial general intelligence (AGI), where AI can understand and operate within the world as effectively as humans.
Disclaimer: This newsletter is written with the aid of AI. I use AI as an assistant to generate and optimize the text. However, the amount of AI used varies depending on the topic and the content. I always curate and edit the text myself to ensure quality and accuracy. The opinions and views expressed in this newsletter are my own and do not necessarily reflect those of the sources or the AI models.