I think a good test of this seems to be to provide an image and get the model to predict what will happen next/if x occurs. They fail spectacularly at Rube-Goldberg machines. I think developing some sort of dedicated prediction model would help massively in extrapolating data. The human subconscious is filled with all sorts of parabolic prediction, gravity, momentum and various other fast-thinking paths that embed these calculations.
yanis_t|4 months ago
PxldLtd|4 months ago
Using this image - https://www.whimsicalwidgets.com/wp-content/uploads/2023/07/... and the prompt: "Generate a video demonstrating what will happen when a ball rolls down the top left ramp in this scene."
You'll see it struggles - https://streamable.com/5doxh2 , which is often the case with video gen. You have to describe carefully and orchestrate natural feeling motion and interactions.
You're welcome to try with any other models but I suspect very similar results.
mannykannot|4 months ago
pfortuny|4 months ago
Torkel|4 months ago