in-silico | 1 month ago

Yes, it reproduces what it is given, and it does so by modelling the rules of physics, geometry, etc.

For example, image generators like Stable Diffusion carry strong internal representations of depth and geometry, strong enough that performant depth estimation models can be built on top of them with minimal retraining. The same holds for video generation models.

Early work on the subject: https://arxiv.org/pdf/2409.09144
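
To make "minimal retraining" concrete, here is a toy sketch of the general probing idea. It is not the method from the linked paper: the "backbone" is a hypothetical stand-in (a fixed random projection with a tanh), the "scenes" are synthetic latent vectors, and "depth" is simply latent factor 0 by construction. The point is only that if frozen features already encode a quantity, a tiny trained head can read it out, while the same head on unrelated features cannot.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_backbone(scene, W):
    # Hypothetical stand-in for a frozen generative backbone:
    # W is fixed and never trained in this script.
    return np.tanh(scene @ W)

def add_bias(features):
    # Append a constant column so the probe can learn an offset.
    return np.hstack([features, np.ones((features.shape[0], 1))])

def fit_linear_probe(features, target, ridge=1e-3):
    # Closed-form ridge regression; these weights are the only trained parameters.
    X = add_bias(features)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ target)

def r2_score(pred, target):
    # Fraction of target variance explained by the prediction.
    return 1.0 - np.sum((target - pred) ** 2) / np.sum((target - target.mean()) ** 2)

# Synthetic "scenes": 8 latent factors; factor 0 plays the role of depth (an assumption).
n_train, n_test, n_factors, n_features = 2000, 500, 8, 64
W = 0.2 * rng.normal(size=(n_factors, n_features))
scenes_train = rng.normal(size=(n_train, n_factors))
scenes_test = rng.normal(size=(n_test, n_factors))
depth_train, depth_test = scenes_train[:, 0], scenes_test[:, 0]

# Probe trained on frozen features that do encode the scene.
probe = fit_linear_probe(frozen_backbone(scenes_train, W), depth_train)
r2_probe = r2_score(add_bias(frozen_backbone(scenes_test, W)) @ probe, depth_test)

# Control: the same kind of probe on features unrelated to the scene.
noise_train = rng.normal(size=(n_train, n_features))
noise_test = rng.normal(size=(n_test, n_features))
control = fit_linear_probe(noise_train, depth_train)
r2_control = r2_score(add_bias(noise_test) @ control, depth_test)

print(f"probe R^2: {r2_probe:.3f}, control R^2: {r2_control:.3f}")
```

With a real generator you would swap the stand-in for intermediate activations of the frozen network and the synthetic target for ground-truth depth maps; the small head stays the only thing that is trained.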

slashdave|25 days ago

What? No, it does no such thing. Study the architecture. Pixels in. Pixels out.