riku_iki | 17 days ago

> If gemini-3-deepthink gets above 85% on the private eval set, it will be considered "solved"

They never will on the private set, because that would mean it had been leaked to Google.