top | item 38661000

jafitc | 2 years ago

This "vibe check" claiming it's even better than GPT-4 Turbo isn't what its Elo rating on the Chatbot Arena shows, and that rating is based not on one vote but on thousands of user votes. GPT-4 (Turbo) is still in a league of its own.
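
For context on why thousands of votes matter: an Elo-style leaderboard turns pairwise "which answer was better?" votes into ratings. Below is a minimal sketch of the classic online Elo update, assuming the standard logistic expected-score formula, K = 32 and a starting rating of 1000. The model names and votes are hypothetical, and the Arena's actual rating method may differ from this.

```python
from collections import defaultdict


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_ratings(votes, k: float = 32.0, initial: float = 1000.0) -> dict:
    """Compute Elo ratings from (model_a, model_b, outcome) votes, in order.

    outcome is 1.0 if model_a won, 0.0 if model_b won, 0.5 for a tie.
    k and initial are assumed defaults, not values taken from any leaderboard.
    """
    ratings = defaultdict(lambda: initial)
    for a, b, outcome in votes:
        e_a = expected_score(ratings[a], ratings[b])
        delta = k * (outcome - e_a)
        # Zero-sum update: whatever A gains, B loses.
        ratings[a] += delta
        ratings[b] -= delta
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical votes: "model_x" wins 3 of 4 head-to-head comparisons.
    votes = [
        ("model_x", "model_y", 1.0),
        ("model_x", "model_y", 1.0),
        ("model_x", "model_y", 0.0),
        ("model_x", "model_y", 1.0),
    ]
    for name, rating in sorted(elo_ratings(votes).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```

With only a handful of votes each update moves ratings a lot, and the online update depends on vote order. Aggregating many votes (or fitting ratings over all votes at once) is what makes a leaderboard more trustworthy than a single vibe check.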

npinsker|2 years ago

By its nature, that site isn't very representative of how the models perform in real-world use.

Reubend|2 years ago

That depends on what real world use you're targeting, but unfortunately I'm not aware of anything better than that leaderboard in terms of sample size and model coverage.

ssabev|2 years ago

You mean the Elo leaderboard?

Racing0461|2 years ago

The vibe check is for Pro, though. I want to see how Ultra is benchmarked.