It also performs very well on the latest 100 puzzles, so it isn't just memorizing the data set (unless, I guess, they routinely index this repo).
I wonder how well AIs would do at Bracket City. I tried Gemini on it and was underwhelmed. It made a lot of terrible connections and often bled data from one level into the next.
Belated update on this. Gemini reasoning did much better than quick on Bracket City today (an easy puzzle, but still). It failed to solve only one clue outright. It got another wrong, but that was due to ambiguity in the expression referenced, and its answer still fit the next level down, so the final answer came out fairly cleanly. It still clearly has a harder time with this than with the Connections puzzle.
capitainenemo|2 months ago
wooger|2 months ago
This sounds like exactly the kind of thing any tech company would do when confronted with a competitive benchmark.
capitainenemo|2 months ago
bigyabai|2 months ago
outside1234|2 months ago