zone411 | 12 days ago

The Sonnet 4.6 models improve on 4.5 on my Extended NYT Connections benchmark (https://github.com/lechmazur/nyt-connections/).

Sonnet 4.6 Thinking 16K scores 57.6 on the Extended NYT Connections Benchmark. Sonnet 4.5 Thinking 16K scored 49.3.

Sonnet 4.6 No Reasoning scores 55.2. Sonnet 4.5 No Reasoning scored 47.4.

rmi_ | 12 days ago

Thanks! I really like your benchmark.

Why is GLM-5 x's, though?