top | item 45770879

(no title)

PxldLtd | 4 months ago

I think a good test of this seems to be to provide an image and get the model to predict what will happen next/if x occurs. They fail spectacularly at Rube-Goldberg machines. I think developing some sort of dedicated prediction model would help massively in extrapolating data. The human subconscious is filled with all sorts of parabolic prediction, gravity, momentum and various other fast-thinking paths that embed these calculations.

discuss

yanis_t|4 months ago

Any example of that? One would think that predicting what comes next from an image is basically video generation, which works not perfect, but works somehow (Veo/Sora/Grok)

PxldLtd|4 months ago

Here's one I made in Veo3.1 since gemini is the only premium AI I have access to.

Using this image - https://www.whimsicalwidgets.com/wp-content/uploads/2023/07/... and the prompt: "Generate a video demonstrating what will happen when a ball rolls down the top left ramp in this scene."

You'll see it struggles - https://streamable.com/5doxh2 , which is often the case with video gen. You have to describe carefully and orchestrate natural feeling motion and interactions.

You're welcome to try with any other models but I suspect very similar results.

mannykannot|4 months ago

It is video generation, but succeeding at this task involves detailed reasoning about cause and effect to construct chains of events, and may not be something that can be readily completed by applying "intuitions" gained from "watching" lots of typical movies, where most of the events are stereotypical.

pfortuny|4 months ago

Most amazing is asking any of the models to draw an 11-sided polygon and number the edges.

Torkel|4 months ago

I asked gpt5, and it worked really well with a correct result. Did you expect it to fail?