KerrickStaley | 7 days ago
The simulator lets the LLM request renders from different angles/times, so the LLM can get visual feedback. For failures, the simulator also returns status codes like `object_fell` or `mount_initially_collided_with_object` depending on what happened. You can see what the tool call looks like by looking at the Transcript tab, e.g. here https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__...
I agree it's not clear how much benefit models get from iteration. Many of the successful runs are one-shots. You can see some examples of basic spatial reasoning e.g. here https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__... :
> The initial collision is because the mount was positioned at the same height as the mug's body center (z=-22), causing overlap. I need to lower the mount significantly so the mug starts above it and drops into the cradle.
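For anyone who wants to picture the loop, here is a rough Python sketch of how status codes like these could drive iteration. Only the two failure codes and the z=-22 figure come from the thread; the `ok` code, `SimResult`, `feedback_for`, `iterate`, the toy simulator, and the clearance and step values are all hypothetical stand-ins, not the actual tool's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# The two failure codes are quoted in the comment above.
OBJECT_FELL = "object_fell"
INITIAL_COLLISION = "mount_initially_collided_with_object"
OK = "ok"  # hypothetical success code, not taken from the real simulator


@dataclass
class SimResult:
    """Hypothetical container for what a simulator run reports back."""
    status: str


def feedback_for(status: str) -> str:
    """Turn a status code into a text hint for the next attempt."""
    hints = {
        OBJECT_FELL: "The object fell off. Add support under or around it.",
        INITIAL_COLLISION: "Lower the mount so the object starts above it.",
    }
    return hints.get(status, f"Unrecognized status: {status}")


def iterate(
    simulate: Callable[[float], SimResult],
    propose: Callable[[Optional[float], str], float],
    max_attempts: int = 3,
) -> Tuple[Optional[float], int]:
    """Propose, simulate, and feed the status back until success or give up."""
    design: Optional[float] = None
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        design = propose(design, feedback)
        result = simulate(design)
        if result.status == OK:
            return design, attempt
        feedback = feedback_for(result.status)
    return None, max_attempts


MUG_CENTER_Z = -22.0  # height mentioned in the quoted transcript
CLEARANCE = 30.0      # hypothetical gap needed below the mug's center


def toy_simulate(mount_z: float) -> SimResult:
    """Toy stand-in: the mount collides unless it sits far enough below the mug."""
    if mount_z > MUG_CENTER_Z - CLEARANCE:
        return SimResult(INITIAL_COLLISION)
    return SimResult(OK)


def toy_propose(prev_z: Optional[float], feedback: str) -> float:
    """Toy stand-in for the LLM: start at the mug's height, lower when told to."""
    if prev_z is None:
        return MUG_CENTER_Z
    if "Lower the mount" in feedback:
        return prev_z - 40.0  # hypothetical step size
    return prev_z


if __name__ == "__main__":
    print(iterate(toy_simulate, toy_propose))
```

In this toy setup the first proposal collides, the hint lowers the mount, and the second attempt passes. A one-shot success would simply return on attempt 1, which fits the observation that many successful runs never need the feedback.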
__atx__ | 7 days ago
Ah yes, that matches my observations. It kinda sees that the stuff it is looking for is there, but does not see enough detail to notice that not only is there an endcap in the way, but the mug is also rotated the wrong way to sit in the holder.
It feels like the "r's in strawberry" effect where the models do not have enough introspection into the raw input data.