diyer22 | 18 days ago

MathArena uses newly released competition sets and evaluates models close to the event. It also flags any model released after the competition date as potentially contaminated.

On the just-concluded AIME 2026 I (Feb 6), Step 3.5 Flash took first place. Step 3.5 Flash was released on Feb 1, before the competition took place, so it could not have seen the problems in training, which rules out cheating.
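
A minimal sketch of the date rule described above, not MathArena's actual code: the function name `potentially_contaminated` is hypothetical, and the year 2026 for both dates is an assumption inferred from the competition name.

```python
from datetime import date


def potentially_contaminated(model_release: date, competition_date: date) -> bool:
    """Hypothetical helper: flag a model released after the competition date."""
    return model_release > competition_date


# Dates taken from the comment; the year 2026 is an assumption.
competition = date(2026, 2, 6)   # AIME 2026 I
step_release = date(2026, 2, 1)  # Step 3.5 Flash release

flagged = potentially_contaminated(step_release, competition)
print(flagged)
```

Under these assumptions the model is not flagged, since its release precedes the competition.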
