Donald|2 months ago

Gemini 3 Pro Preview gets 96.8% on the same benchmark? That's impressive.

And it performs very well on the latest 100 puzzles too, so it isn't just learning the dataset (unless, I guess, they routinely index this repo).

I wonder how well AIs would do at Bracket City. I tried Gemini on it and was underwhelmed. It made a lot of terrible connections and often bled data from one level into the next.

wooger|2 months ago

> unless I guess they routinely index this repo

This sounds like exactly the kind of thing any tech company would do when confronted with a competitive benchmark.

capitainenemo|2 months ago

Belated update on this: Gemini's reasoning mode did much better than its quick mode on Bracket City today (an easy puzzle, but still). It failed to solve only one clue outright. It got another wrong, but that was due to ambiguity in the referenced expression, and its answer still fit the next level down, so the final answer was solved fairly cleanly. It still clearly has a harder time with this than with the Connections puzzle.

bigyabai|2 months ago

GPT-5.2 might be Google's best Gemini advertisement yet.

outside1234|2 months ago

Especially when you see the price