Open model StepFun-3.5 is #1 on MathArena, an uncheatable math benchmark

3 points | diyer22 | 19 days ago | twitter.com

2 comments

falcor84 | 19 days ago

How is it "uncheatable"? If you know the exact olympiad questions a model is being assessed on, what's stopping you from massaging it until it gets more of them right than the previous #1?

diyer22 | 18 days ago

MathArena uses newly released competition problem sets and evaluates models shortly after each competition takes place. It also flags models released after the competition date as potentially contaminated.

On Feb 6, Step 3.5 Flash took first place on the just-concluded AIME 2026 I. Step 3.5 Flash was released on Feb 1, before the competition took place, making cheating impossible.
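The date-based contamination rule described in this reply boils down to a single comparison. Below is a minimal sketch of that rule, assuming a model counts as "potentially contaminated" only when its release date is strictly after the competition date. The names `Model` and `possibly_contaminated` are hypothetical illustrations, not MathArena's actual code, and the exact AIME 2026 I date is not stated in the thread, so Feb 5 is used purely as a placeholder.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Model:
    # Hypothetical record type for illustration only.
    name: str
    release_date: date


def possibly_contaminated(model: Model, competition_date: date) -> bool:
    """Hypothetical rule: flag a model released after the competition date,
    since it could have seen the problems or solutions during training."""
    return model.release_date > competition_date


# Placeholder date: the thread only says AIME 2026 I had "just concluded"
# by Feb 6, so Feb 5 is an assumption, not a sourced fact.
aime_2026_i = date(2026, 2, 5)

# Release date taken from the reply above (Feb 1).
step_flash = Model("Step 3.5 Flash", date(2026, 2, 1))

print(possibly_contaminated(step_flash, aime_2026_i))
```

Under these assumptions the check prints `False`: a model released before the problems existed publicly could not have trained on them, which is the point the reply is making.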