ollin | 9 months ago
This would explain:
1. How collisions/teleportation work and why they're so rigid (the world model, or WM, is mimicking hand-implemented scene-bounds logic)
2. Why the scenes are static and, in the case of should-be-dynamic elements like water/people/candles, blurred (the WM is mimicking artifacts from the 3D representation)
3. Why they are confident that "There's no map or explicit 3D representation in the outputs. This is a diffusion model, and video in/out" https://x.com/olivercameron/status/1927852361579647398 (the final product is indeed a diffusion WM trained on videos, they just have a complicated pipeline for getting those training videos)