fourgreen | 1 year ago

The benchmark itself is interesting, but it's worth noting that the authors didn't test it on Claude 3.5. In my experience, for tasks that need intelligence, Claude 3.5 is much better than the GPT or Llama models (which the authors used). So the conclusion is not that "LLM-based AI models are flawed because they cannot reason" but rather, at most, "gpt o1-mini and Llama3-8B are flawed because they cannot reason". For instance, the authors run this test:

"""

We add seemingly relevant statements to the questions that are, in fact, irrelevant to the reasoning and conclusion. However, the majority of models fail to ignore these statements and blindly convert them into operations, leading to mistakes.

prompt: Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?

o1-mini response: Oliver picks kiwis over three days as follows:

- Friday: 44 kiwis

- Saturday: 58 kiwis

- Sunday: Double the number he picked on Friday, which is 2 × 44 = 88 kiwis

However, on Sunday, 5 of these kiwis were smaller than average. We need to subtract them from the Sunday total:

88 (Sunday’s kiwis) - 5 (smaller kiwis) = 83 kiwis

Now, summing up the kiwis from all three days:

44 (Friday) + 58 (Saturday) + 83 (Sunday) = 185 kiwis

Oliver has a total of 185 kiwis.

"""

I asked Claude 3.5 exactly the same question, and the response was:

"""

Let's break this problem down step by step:

1. Friday's kiwis:

   Oliver picked 44 kiwis on Friday.
2. Saturday's kiwis:

   Oliver picked 58 kiwis on Saturday.
3. Sunday's kiwis:

   Oliver picked double the number of kiwis he did on Friday.

   Friday's kiwis = 44

   Double of Friday's kiwis = 44 × 2 = 88 kiwis

   However, 5 of these were smaller than average, but they still count in the total.
4. Total kiwis:

   Friday's kiwis + Saturday's kiwis + Sunday's kiwis

   = 44 + 58 + 88

   = 190 kiwis
Therefore, Oliver has a total of 190 kiwis.

"""
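For anyone who wants to check the two totals, here is a minimal sketch of the arithmetic from the prompt. The variable names are mine, not from the paper; the only assumption is that the "smaller than average" remark doesn't change the count, which is the whole point of the test.

```python
# Numbers taken directly from the prompt.
friday = 44
saturday = 58
sunday = 2 * friday  # "double the number of kiwis he did on Friday"

# The five smaller kiwis are still kiwis, so nothing is subtracted.
correct_total = friday + saturday + sunday

# The mistake in the o1-mini response: treating the irrelevant clause as an operation.
smaller_kiwis = 5
distracted_total = friday + saturday + (sunday - smaller_kiwis)

print(correct_total)     # 190
print(distracted_total)  # 185
```

The two answers differ by exactly the 5 kiwis that the irrelevant clause mentions.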
