top | item 47021157

(no title)

Yeah, spatial reasoning has been a weak spot for LLMs. I’m actually building a new code exercise for my company right now where the candidate is allowed to use any AI they want, but it involves spatial reasoning. I ran Opus 4.6 and Codex 5.3 (xhigh) on it and both came back with passable answers, but I was able to double the score doing it by hand.

It’ll be interesting to see what happens if a candidate ever shows up and wants to use Deep Think. Might blow right through my exercise.

discuss

No comments yet.