throw83288 | 1 year ago

Apparently OpenAI's Deep Research already gets about a quarter of this benchmark right, more or less a month in. I imagine it still makes baffling mistakes, though.

"Humanity's Laster Exam" coming up when?
