item 42968175 (no title)
throw83288 | 1 year ago
Apparently OpenAI's Deep Research already saturated a quarter of this benchmark, more or less a month in. But I also imagine it makes baffling mistakes anyway. "Humanity's Laster Exam" coming up when?
No comments yet.