> Genie is capable of converting a variety of different prompts into interactive, playable environments that can be easily created, stepped into, and explored
If these are generating fully interactive environments, why are all the clips ~1 second long?
Based on the first sentence in your paper, I would have expected a playable example as a demo. Or 20.
But reading a bit further into the paper, it sounds like the model needs to be actively running inference and will generate the next frame on the fly as actions are taken. Is that correct?
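The on-the-fly loop described above (the next frame only exists once an action arrives) can be sketched roughly like this. It is a minimal toy, assuming a "frame" is just a character's x position on a 10-cell strip; `ACTIONS`, `toy_world_model`, and `play` are hypothetical names, and a real model such as Genie predicts image frames with a learned network rather than this hand-written rule.

```python
from typing import Callable, List

# Hypothetical toy "frame": the character's x position on a 10-cell strip.
Frame = int

# Hypothetical action set; Genie's actual action space is learned, not hand-written.
ACTIONS = {"left": -1, "noop": 0, "right": 1}


def toy_world_model(history: List[Frame], action: str) -> Frame:
    """Stand-in for a learned model: predict the next frame from past frames and an action."""
    nxt = history[-1] + ACTIONS[action]
    return max(0, min(9, nxt))  # hard-coded walls at cells 0 and 9


def play(first_frame: Frame, actions: List[str],
         model: Callable[[List[Frame], str], Frame] = toy_world_model) -> List[Frame]:
    """Generate frames one at a time, each only after its action is known."""
    frames = [first_frame]
    for action in actions:
        # One model call per step: nothing is pre-rendered ahead of the player's input.
        frames.append(model(frames, action))
    return frames


if __name__ == "__main__":
    print(play(0, ["right", "right", "left", "left", "left"]))
```

The walls are enforced by the clamp here; a learned model has no such guarantee and only approximates whatever constraints it picked up from training data.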
Firstly, do these models learn a good physics grounding for nonsense actions? Like keep pressing down even when you are on the ground? Or will they phase you through the ground?
Secondly, why are all videos like half a second long? I thought video generation had come much farther than this. My guess would be that the world models unravel at any length longer than that, which is (and has always been) the problem with models such as these. Minus the video generation part, we had pretty good world models for games already, see the Dreamer line of work: https://danijar.com/project/dreamerv3/
Author here :) Re: 1) typically no, but of course it can hallucinate just like LLMs. 2) Agreed, but the key point missing is that Dreamer is trained in an RL environment with action labels. Genie is trained exclusively from videos and learns an action space. This is the first version of something that is now possible and will only improve with scale.
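To make the "learns an action space from videos" point concrete: one common way to get discrete actions out of unlabeled video is vector quantization, mapping each frame-to-frame change onto the nearest entry of a small codebook. The sketch below is a toy under loud assumptions: frames are already reduced to 2D character positions, the codebook is fixed by hand (a real model would learn it, from pixels), and `CODEBOOK`, `nearest_code`, and `label_video` are hypothetical names. It is not claimed to be Genie's exact method; it only shows the lookup step.

```python
from typing import List, Tuple

Vec = Tuple[float, float]

# Hypothetical codebook of 4 latent actions. In a real model these vectors
# would be learned from video; here they are fixed for illustration.
CODEBOOK: List[Vec] = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]


def nearest_code(delta: Vec, codebook: List[Vec] = CODEBOOK) -> int:
    """Return the index of the codebook vector closest to delta (squared L2 distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: (delta[0] - codebook[i][0]) ** 2 + (delta[1] - codebook[i][1]) ** 2,
    )


def label_video(positions: List[Vec]) -> List[int]:
    """Assign a discrete latent action to each consecutive frame pair; no action labels used."""
    codes = []
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        codes.append(nearest_code((x1 - x0, y1 - y0)))
    return codes


if __name__ == "__main__":
    # Toy "video": a character drifting right, right, up, then left, with some noise.
    video = [(0.0, 0.0), (0.9, 0.1), (2.0, 0.0), (2.1, 1.1), (1.0, 1.0)]
    print(label_video(video))
```

Nothing in the input says which button was pressed; the codes come only from how consecutive frames differ, which is the contrast with training on an RL environment that supplies action labels.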
In the video, the character becomes a pixelated mess. In the static image, the character is clearly on rocks in the foreground, but in the "game" we see the character magically jumping from the foreground rocks to the background structure which also contains significant distortions.
The extremely short demo videos make it slightly harder to catch these obvious issues.
What is the video resolution, 64x64? And even then it becomes blurry. Seems like another Google flag-plant-y paper filled with hot air that we will never see the source code or model for because it will expose how poor its capabilities are relative to competitors.
The internal politics at these places must be exhausting. Industry research was supposed to be free from the publish or perish mindset, but it seems like it just got replaced by a different kind of need for posturing.
Seems very interesting, but as soon as I see "Google Research" or "DeepMind" now it's an instant turndown. Too much PR, not enough substance. I'm not directly targeting you guys with this research, but the company you work for.
Looking forward to following your progress. I've been wanting to see how we might replace polygons for gaming long term, this seems like a step in the right direction.
jasonjmcghee|2 years ago
jparkerholder|2 years ago
polygamous_bat|2 years ago
jparkerholder|2 years ago
nycdatasci|2 years ago
polygamous_bat|2 years ago
sqreept|2 years ago
snide|2 years ago
https://en.wikipedia.org/wiki/GEnie
mdrzn|2 years ago
joloooo|2 years ago