Every time I see a table like this, the numbers go up. Can someone explain what this actually means? Is it just an incremental improvement, where some tests are solved a bit better, or is it a breakthrough, meaning this model can do something that all the others cannot?
rvnx|3 months ago
The questions AND the answers are public.
If the LLM manages, through reasoning OR memory, to repeat back the answer, then they win.
The scores represent the % of correct answers they recalled.
tylervigen|3 months ago
You could question how well this works, but it’s not like the answers are just hanging out on the public internet.
stavros|3 months ago