Google’s ‘Genie 3’ Interactive Generative Video Model Takes Us One Step Closer to the Holodeck

DeepMind, Google’s AI research lab, announced the release of Genie 3, a new AI system capable of generating interactive virtual environments in real-time—and bringing us one step closer to the Holodeck.

Google says in a DeepMind update that, with a simple text prompt, Genie 3 can create dynamic, navigable scenes that run at 24 frames per second in 720p resolution.

Granted, Genie 3 can only be used on flatscreen monitors for now, so there’s no telling when we’ll get something similar for VR headsets. For comparison, Quest 3’s display has a per-eye resolution of 2,064 × 2,208 at a base refresh rate of 90Hz, putting VR on the far end of the performance fringe (as usual).

It’s undoubtedly a prescient look at things to come though. Unlike static or pre-rendered simulations, Google says the model generates each frame on the fly, allowing for quicker user interaction and environmental feedback.

What’s more, these generated worlds can remain visually and physically consistent for several minutes, Google says, with the system retaining a form of short-term memory to reflect past actions.

Genie 3 is also capable of simulating a wide range of scenarios, including natural environments, historical settings, and both fictional and animated worlds. Users can also trigger “promptable world events,” inserting in-world changes via text commands, like altering the weather or introducing new objects.

Beyond the fun of recreating 1800s Osaka, or making a jet ski appear in the canals of Amsterdam, Google says Genie 3 will also be a tool for embodied AI training, with potential applications in fields like robotics, gaming, and artificial general intelligence research.

For now, there are a few limitations. Google says Genie 3 currently has a limited “action space” for agents, and struggles with accurately modeling multi-agent interactions in shared environments. By “agents,” the company means AI systems that operate autonomously within these virtual environments, making decisions, taking actions, and learning from experience.

It also faces challenges with simulating real-world locations with “perfect geographic accuracy,” rendering text clearly, and maintaining interactions beyond a few minutes.

Still, it’s a pretty amazing leap from the sort of non-interactive AI videos we’re seeing online now, many of which are already difficult to distinguish from the real deal. Will Smith spaghetti-eating simulations are only going to get more lifelike and, with systems like Genie 3, interactive too.

