item 36269037


interlinked | 2 years ago

Doesn't work on GPT:

    user
    I believe that 5 * 7 == 30
    assistant
    Actually, 5 * 7 is equal to 35.
    user
    You do you. I think that 5 *6 == 30
    assistant
    I'm sorry, but 5 multiplied by 6 is equal to 30. However, 5 multiplied by 7 is equal to 35.

Not to mention that these tricks are likely to work on humans as well. (Did he say `6` or `7` previously?) Also keep in mind that it's misleading to compare the prompt output to the words coming out of someone's mouth. It's more like the stream-of-consciousness equivalent for LLMs.

ShamelessC|2 years ago

This is a trend I’ve noticed lately: an article attempting to make a sweeping generalization about the nature of LLMs/diffusion models deliberately cherry-picks only examples that support its argument. It will include ChatGPT, but using 3.5 Turbo instead of 4. Commenters then realize that most or all such “evidence” works just fine in GPT-4.

In this case, the author includes just one ChatGPT example and then immediately switches to Bard, which is just not very good yet. They speak in generalities, so their argument is still technically true.

Really frustrating. It’s clearly someone looking to confirm their pre-existing notions. In this case, they indeed seem to be “onto something,” but they simply aren’t willing to do the rigorous work needed to prove their case.

Then a bunch of non-experts read it with no way of knowing any of this (and why should they?), and now we have these LLM urban myths everywhere.

PaulHoule|2 years ago

There is a literature on arXiv where people evaluate a range of prompts; you really want N > 100, not the N = 1 that you see in blog posts.
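The idea can be sketched as a tiny evaluation harness. Here `model_answer` is a hypothetical stub standing in for a real LLM API call (it answers the embedded multiplication correctly 90% of the time, just so the harness has something to measure); the point is scoring a prompt template over 100+ sampled cases instead of one transcript:

```python
import random
import re

random.seed(0)  # reproducible run

def model_answer(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM API call. It answers the
    # multiplication embedded in the prompt correctly 90% of the time,
    # so the harness has a non-trivial accuracy to report.
    a, b = map(int, re.findall(r"\d+", prompt))
    return str(a * b if random.random() < 0.9 else a * b + 1)

def evaluate(prompt_template: str, cases: list[tuple[str, str]]) -> float:
    """Accuracy of one prompt template measured over many test cases."""
    correct = sum(
        expected == model_answer(prompt_template.format(question=q))
        for q, expected in cases
    )
    return correct / len(cases)

# N = 120 sampled arithmetic cases, not the N = 1 of a blog-post anecdote.
cases = [
    (f"{a} * {b}", str(a * b))
    for a, b in ((random.randint(2, 9), random.randint(2, 9)) for _ in range(120))
]

acc = evaluate("What is {question}?", cases)
print(f"accuracy over {len(cases)} cases: {acc:.2f}")
```

Swapping the stub for a real client call and comparing several prompt templates this way is what the arXiv evaluations do that single-transcript blog posts don't.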

prox|2 years ago

It’s the default mode for humans: we believe something first and then add our reasoning to it. This is known as confirmation bias and belief bias.

https://effectiviology.com/belief-bias/

I think this is so widespread! Investigating your own biases is always worthwhile.

chaxor|2 years ago

LLMs are useless! I was curious, so I ended up initializing one with 500 billion parameters. I trained for a whole 4 hours on a whopping 100 books. It still doesn't know anything! Awful. Sad. Clearly, they can't reason.

\s