dingocat | 2 years ago
The biggest one is that the test doesn't aim to see what GPT-4 can do or how well it does it, only whether the participant can guess the (possibly cherry-picked) answer the author decided on. In short, we don't know whether he sampled several answers and picked the most common one (akin to consensus voting/self-consistency [1]), or asked each question once and took the first response.
Maybe GPT-4 gives the correct answer to a given question 80% of the time, but he got unlucky? You don't know; the author doesn't tell you. The answers are generated ahead of time and are the same every time you go through the test.
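For what it's worth, the self-consistency idea the parent comment alludes to is simple: sample several answers and majority-vote. A minimal sketch (assuming a hypothetical `sample_answer` callable standing in for an actual GPT-4 call; `fake_model` below is just a simulated 80%-accurate model):

```python
from collections import Counter
import random

def self_consistency(sample_answer, n_samples=5, seed=0):
    """Sample an answer n_samples times and majority-vote over the results.
    Returns the most common answer and the fraction of samples agreeing with it."""
    rng = random.Random(seed)
    samples = [sample_answer(rng) for _ in range(n_samples)]
    answer, count = Counter(samples).most_common(1)[0]
    return answer, count / n_samples

# Hypothetical stand-in for a real model call: "A" (correct) 80% of the time.
def fake_model(rng):
    return "A" if rng.random() < 0.8 else "B"

answer, agreement = self_consistency(fake_model, n_samples=25)
# With 25 samples from an 80%-accurate model, the vote will almost
# always land on "A" even when individual samples are wrong.
```

The point of the parent's complaint: a single pre-recorded sample tells you neither the per-question accuracy nor whether something like this vote was applied before publishing.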
PaulDavisThe1st | 2 years ago
The questions mostly have correct or incorrect answers, and where there is some leeway, the author provides a fairly detailed explanation of what they would consider correct in each case. Do you have some specific criticism of an answer that you believe the author gets wrong?
thomasahle | 2 years ago
My understanding is that the quiz samples a new GPT-4 answer every time you use it. That's why you put a confidence rather than a 0%/100% answer. There's always a chance it'll fail by freak accident.
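That's also why reporting a confidence makes sense: under a proper scoring rule, hedging protects you from rare failures. A sketch assuming a logarithmic score (I don't know the quiz's exact scoring rule; `log_score` is illustrative):

```python
import math

def log_score(confidence, correct):
    """Logarithmic scoring rule: you report probability `confidence` that
    GPT-4 answers correctly, and earn log(p) of the outcome you assigned.
    Higher (less negative) is better."""
    p = confidence if correct else 1.0 - confidence
    return math.log(p)

# Claiming 100% and being wrong would score log(0) = -infinity, so a
# small hedge (e.g. 0.99 instead of 1.0) caps the damage from a freak
# accident while costing almost nothing when you're right.
confident_but_hedged = log_score(0.99, correct=False)  # finite penalty
mild_hedge = log_score(0.90, correct=False)            # smaller penalty
```

Under such a rule, the optimal report is your true belief about the model's per-question accuracy, which is exactly what the quiz asks you to estimate.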
Sophira | 2 years ago
Also, the commentary on the answers refers to specific parts of those answers. For it to be as in-depth as it is, it would have to be either pre-written or itself generated by GPT on the fly (and the latter wouldn't make sense given the nature of the quiz).
[0] https://nicholas.carlini.com/writing/llm-forecast/static/que...