montebicyclelo | 1 month ago
I don't have access to the DeepMind demo, but from the video it looks like it takes the idea up a notch.
(I don't know the exact lineage of these ideas, but as a general observation, it's a shame that blog posts and indie demos so rarely get cited.)
[1] https://news.ycombinator.com/item?id=43798757
[2] https://madebyoll.in/posts/world_emulation_via_dnn/demo/
ollin|1 month ago
- That forest trail world is ~5 million parameters, trained on 15 minutes of video, scoped to run on a five-year-old iPhone through a twenty-year-old API (WebGL GPGPU, i.e., OpenGL fragment shaders). It's the smallest '3D' world model I'm aware of.
- Genie 3 is (most likely) ~100 billion parameters trained on millions of hours of video and running across multiple TPUs. I would be shocked if it's not the largest-scale world model available to the public.
There are lots of neat intermediate-scale world models being developed as well (e.g. LingBot-World https://github.com/robbyant/lingbot-world, Waypoint 1 https://huggingface.co/blog/waypoint-1) so I expect we'll be able to play something of Genie quality locally on gaming GPUs within a year or two.
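For a rough sense of the gap between those two endpoints, here is a back-of-the-envelope sketch. The parameter counts are the ones quoted above (the Genie 3 figure is a guess, not a published number), and the 2 bytes per parameter is purely an assumption (fp16/bf16 weights) that may not match what either model actually uses.

```python
# Back-of-the-envelope scale comparison.
# ASSUMPTION: 2 bytes per parameter (fp16/bf16); the real precision
# of either model is not stated in the thread.
BYTES_PER_PARAM = 2

forest_params = 5_000_000        # ~5 million, as quoted above
genie_params = 100_000_000_000   # ~100 billion, a guess, not an official figure

ratio = genie_params // forest_params
forest_mb = forest_params * BYTES_PER_PARAM / 1e6
genie_gb = genie_params * BYTES_PER_PARAM / 1e9

print(f"parameter ratio: {ratio:,}x")
print(f"forest trail weights: ~{forest_mb:.0f} MB")
print(f"Genie 3 weights (if the guess holds): ~{genie_gb:.0f} GB")
```

Under those assumptions the weights alone differ by four orders of magnitude, which fits with one model running in a phone browser and the other being spread across multiple accelerators.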
danielwmayer|1 month ago
ollin|1 month ago
Regarding the specific boiling-textures effect: there's a tradeoff in recurrent world models between jittering (constantly regenerating fine details to avoid accumulating error) and drifting (propagating fine details as-is, even when that leads to accumulating error and a simplified/oversaturated/implausible result). The forest trail world is tuned way towards jittering (you can pause with `p` and step frame-by-frame with `.` to see this). So if the effect resembles LSD, it's possible that LSD applies some similar random jitter/perturbation to the neurons within your visual cortex.
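To make the jitter/drift tradeoff concrete, here is a toy sketch. It is not the forest trail model or any real world model: a single number stands in for a "fine detail", and `simulate`, `regen_rate`, `bias`, and `noise` are all made-up names and values for illustration. Each step either propagates the previous value (with a small systematic error) or regenerates it from scratch (with random noise), blended by `regen_rate`.

```python
import random


def simulate(regen_rate, steps=200, bias=0.01, noise=0.2, seed=0):
    """Toy recurrent rollout of one 'fine detail' value.

    regen_rate=0.0 is pure drifting (detail propagated as-is),
    regen_rate=1.0 is pure jittering (detail regenerated every frame).
    All parameters are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    truth = 0.0      # what the detail "should" be
    x = truth        # the model's current detail
    flicker = 0.0    # accumulated frame-to-frame change
    for _ in range(steps):
        propagated = x + bias                         # carries forward a small error
        regenerated = truth + rng.gauss(0.0, noise)   # fresh but noisy guess
        new_x = (1.0 - regen_rate) * propagated + regen_rate * regenerated
        flicker += abs(new_x - x)
        x = new_x
    # Return (final error vs. truth, mean frame-to-frame change).
    return abs(x - truth), flicker / steps


drift_error, drift_flicker = simulate(regen_rate=0.0)
jitter_error, jitter_flicker = simulate(regen_rate=1.0)

print(f"drift:  final error={drift_error:.3f}, flicker={drift_flicker:.3f}")
print(f"jitter: final error={jitter_error:.3f}, flicker={jitter_flicker:.3f}")
```

In this toy setup the drifting run stays smooth frame to frame but its error grows with every step, while the jittering run stays close to the truth at the cost of constant frame-to-frame change, which is the "boiling" look. Intermediate `regen_rate` values trade one for the other.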