top | item 44839344

(no title)

surround | 6 months ago

> The betting markets were not impressed by GPT-5. I am reading this graph as "there is a high expectation that Google will announce Gemini-3 in August", and not as "Gemini 2.5 is better than GPT-5".

This is an incorrect interpretation. The benchmark which the betting market is based upon currently ranks Gemini 2.5 higher than GPT-5.

discuss

order

theahura|6 months ago

EDIT: I updated the article to account for this perspective.

------

This can't be right -- they're using LMArena without style control to resolve the market, and GPT-5 is ahead right? (https://lmarena.ai/leaderboard/text/overall-no-style-control)

> This market will resolve according to the company which owns the model which has the highest arena score based off the Chatbot Arena LLM Leaderboard (https://lmarena.ai/) when the table under the "Leaderboard" tab is checked on August 31, 2025, 12:00 PM ET.

> Results from the "Arena Score" section on the Leaderboard tab of https://lmarena.ai/leaderboard/text with the style control off will be used to resolve this market.

> If two models are tied for the top arena score at this market's check time, resolution will be based on whichever company's name, as it is described in this market group, comes first in alphabetical order (e.g. if both were tied, "Google" would resolve to "Yes", and "xAI" would resolve to "No")

> The resolution source for this market is the Chatbot Arena LLM Leaderboard found at https://lmarena.ai/. If this resolution source is unavailable at check time, this market will remain open until the leaderboard comes back online and resolve based on the first check after it becomes available. If it becomes permanently unavailable, this market will resolve based on another resolution source.

JimDabell|6 months ago

> This is an incorrect interpretation. The benchmark which the betting market is based upon currently ranks Gemini 2.5 higher than GPT-5.

You can see from the graph that Google shot way up from ~25% to ~80% upon the release of GPT-5. Google’s model didn’t suddenly get way better at any benchmarks, did it?

dcre|6 months ago

It's not about Google's model getting better. It is that gpt-5 already has a worse score than Gemini 2.5 Pro had before gpt-5 came out (on the particular metric that determines this bet: Overall Text without Style Control).

https://lmarena.ai/leaderboard/text/overall-no-style-control

That graph is a probability. The fact that it's not 100% reflects the possibility that gpt-5 or someone else will improve enough by the end of the month to beat Gemini.