World models — AI systems capable of generating a simulated environment in real time — represent one of the more impressive applications of machine learning. The field has moved quickly over the past year, and on Wednesday Google DeepMind announced Genie 2. Where its predecessor was limited to generating 2D worlds, the new model can create 3D ones and sustain them for significantly longer.
Genie 2 isn’t a game engine; instead, it’s a diffusion model that generates images as the player (either a human being or another AI agent) moves through the world the software is simulating. As it generates frames, Genie 2 can infer properties of the environment, allowing it to model water, smoke and physics effects — though some of those interactions can look quite gamey. The model isn’t limited to rendering scenes from a third-person perspective; it can also handle first-person and isometric viewpoints. All it needs to start is a single image prompt, provided either by Google’s own Imagen 3 model or a picture of something from the real world.
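DeepMind hasn’t published Genie 2’s architecture, but the basic loop the paragraph above describes — seed the world with one image, then generate each new frame conditioned on the player’s action and what came before — can be sketched in miniature. Everything here (the class, its methods, the string “frames”) is a hypothetical stand-in, not DeepMind’s code:

```python
# Toy sketch of an action-conditioned world model's generation loop.
# A real system would run a diffusion denoising pass per step; this
# stand-in just records each transition as a string.
from dataclasses import dataclass, field

@dataclass
class ToyWorldModel:
    history: list = field(default_factory=list)  # frames kept for consistency

    def prime(self, image_prompt: str) -> None:
        # A single image (e.g. from a text-to-image model) seeds the world.
        self.history = [image_prompt]

    def step(self, action: str) -> str:
        # Condition the "next frame" on the latest frame and the action.
        next_frame = f"{self.history[-1]}+{action}"
        self.history.append(next_frame)
        return next_frame

model = ToyWorldModel()
model.prime("frame0")
for action in ["forward", "left", "jump"]:
    frame = model.step(action)
print(len(model.history))  # 4: the seed image plus one frame per action
```

The point of the sketch is the data flow, not the rendering: the model never consults a game engine’s state, only its own previously generated frames.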
Introducing Genie 2: our AI model that can create an endless variety of playable 3D worlds – all from a single image. 🖼️
These types of large-scale foundation world models could enable future agents to be trained and evaluated in an endless number of virtual environments.
— Google DeepMind (@GoogleDeepMind) December 4, 2024
Notably, Genie 2 can remember parts of a simulated scene even after they leave the player’s field of view and can accurately reconstruct those elements once they become visible again. That’s in contrast to other world models like Oasis, which, at least in the version Decart showed to the public in October, had trouble remembering the layout of the Minecraft levels it was generating in real time.
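The memory behavior described above — content that leaves the field of view is reconstructed, not re-invented, when the player looks back — amounts to caching generated content by location. A minimal, entirely hypothetical illustration (the class and callback are inventions for this sketch, not anything DeepMind has described):

```python
# Hypothetical sketch of "spatial memory": content generated for a region
# is cached, so revisiting the region returns the same scene instead of
# hallucinating new detail.
import random

class SpatialMemory:
    def __init__(self, generate):
        self.generate = generate  # fallback generator for unseen regions
        self.cache = {}

    def view(self, region):
        # Remembered regions are replayed; new regions are generated once
        # and then remembered.
        if region not in self.cache:
            self.cache[region] = self.generate(region)
        return self.cache[region]

memory = SpatialMemory(lambda region: (region, random.random()))
first = memory.view((0, 0))      # generate a scene for this region
memory.view((1, 0))              # look away at a different region
assert memory.view((0, 0)) == first  # looking back yields the same scene
```

A model without this property — like the early Oasis build mentioned above — effectively calls the generator fresh on every look, which is why its levels shifted under the player.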
There are limits, however, to how far Genie 2 can take this. DeepMind says the model can generate “consistent” worlds for up to 60 seconds, though the majority of the examples the company shared on Wednesday run significantly shorter, most around 10 to 20 seconds. Moreover, the longer Genie 2 has to maintain the illusion of a consistent world, the more artifacts creep in and the softer image quality becomes.
DeepMind didn’t detail how it trained Genie 2 other than to state it relied “on a large-scale video dataset.” Don’t expect DeepMind to release Genie 2 to the public anytime soon, either. For the moment, the company primarily sees the model as a tool for training and evaluating other AI agents, including its own SIMA algorithm, and something artists and designers could use to prototype and try out ideas rapidly. In the future, DeepMind suggests world models like Genie 2 are likely to play an important part on the road to artificial general intelligence.
“Training more general embodied agents has been traditionally bottlenecked by the availability of sufficiently rich and diverse training environments,” DeepMind said. “As we show, Genie 2 could enable future agents to be trained and evaluated in a limitless curriculum of novel worlds.”