farresito | 1 year ago

Damn, that looks like a big jump.

deisteve|1 year ago

So o1 seems to have a real, measurable edge, crushing it on every single metric. I mean, 1673 Elo is insane, and 89th percentile is a whole different league. It doesn't look like a one-off either: it's consistently performing way better than GPT-4o across all the datasets, even the ones where GPT-4o was already doing pretty well, like math and MMLU, where o1 is just taking it to the next level. And the fact that it's not even showing up in some of the metrics, like MMMU and MathVista, just makes it look even more impressive. I mean, what's going on with GPT-4o, is it just a total dud or what?

Also, what's the deal with the preview model? Is that a beta version, and how does it compare to o1? Is it a stepping stone to o1 or something?

Has anyone tried to dig into the actual performance of o1, like what's it doing differently, is it just a matter of more training data or is there something more going on?

And what's the plan for o1? Is it going to be released to the public, or is it just going to be some internal tool?

farresito|1 year ago

> like what's it doing differently, is it just a matter of more training data or is there something more going on

Well, the model doesn't start with "GPT", so maybe they have come up with something better.

spaceman_2020|1 year ago

1673 Elo is wild

If it's actually true in practice, I sincerely cannot imagine a scenario where it would be cheaper to hire actual junior or mid-tier developers (keyword: "developers", not architects or engineers).

A model rated 1,673 Elo should be able to build very complex, scalable apps with some guidance