zone411 | 4 months ago

I've benchmarked it on the Extended NYT Connections benchmark (https://github.com/lechmazur/nyt-connections/). It scores 20.0, compared to 10.0 for Haiku 3.5, 19.2 for Sonnet 3.7, 26.6 for Sonnet 4.0, and 46.1 for Sonnet 4.5.

whatreason|4 months ago

This is such a cool benchmark idea, love it

Do you have any other cool benchmarks you like? Especially any related to tools

shangofox|4 months ago

You could try Wordle on it. But in my experience, all of them are pretty bad at it. They're not smart enough to pick up on the colours when they're represented as letters. Surprisingly, the only one that was actually good was Qwen.
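
For anyone who wants to try this, here is a minimal sketch of what "colours represented as letters" could look like in a test harness. The letter scheme (G = green, Y = yellow, B = grey) and the function name `wordle_feedback` are assumptions made for illustration, not taken from any particular benchmark or from the setup described above.

```python
from collections import Counter


def wordle_feedback(guess: str, answer: str) -> str:
    """Encode Wordle feedback as letters (hypothetical scheme):
    G = right letter, right spot; Y = right letter, wrong spot; B = not in word.
    """
    guess, answer = guess.lower(), answer.lower()
    if len(guess) != len(answer):
        raise ValueError("guess and answer must have the same length")

    result = ["B"] * len(guess)

    # Letters of the answer that were NOT matched exactly; only these
    # can still earn a yellow.
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)

    # First pass: mark exact matches as green.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"

    # Second pass: mark yellows left to right, consuming the remaining
    # letters so repeated guess letters are not over-credited.
    for i, g in enumerate(guess):
        if result[i] != "G" and remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1

    return "".join(result)


if __name__ == "__main__":
    # Feedback string a model would be shown after guessing "speed"
    # when the answer is "abide".
    print(wordle_feedback("speed", "abide"))
```

The model would then see its guess alongside the feedback string each turn and has to map each letter back to a position, which is the step that reportedly trips most of them up.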