blendergeek | 7 months ago

raincole|7 months ago

Note that it's two different things:

This OP claims the publicly available models all failed to get Bronze.

OpenAI tweet claims there is an unreleased model that can get Gold.

sigmoid10|7 months ago

I'd also be highly wary of the method they used because of statements like this:

>we note that the vast majority of its answers simply stated the final answer without additional justification

While the reasoning steps are obviously important for judging human participant answers, none of the current big-name providers disclose their actual reasoning tokens. So unless they got direct internal access to these models from the big companies (which seems highly unlikely), this might be yet another failed study (of which we have seen several in recent months, even by serious parties).

dmitrygr|7 months ago

My (unreleased) cat did even better than the OpenAI model. No you cannot see. Yes you have to trust me. Now gimme more money.

bgwalter|7 months ago

The model did not fit in the margin.

We'll never know how many GPUs and other assistance (like custom code paths) this model got.

untitled2|7 months ago

Exactly. Whom to believe?

JohnKemeny|7 months ago

The last time someone claimed a medal in an olympiad like this, it turned out they manually translated the problem into Lean and then ran a brute force search algorithm to find a proof. For 60 hours. On a supercomputer.

Meanwhile high schoolers get a piece of paper and 4.5 hours.

changoplatanero|7 months ago

Both are true. One spent $400 in compute and the other one spent a lot more.

kenjackson|7 months ago

OpenAI achieved Gold on an unreleased model. GPT-5. Read the tweets and they explain what they did.