
What GPT-4 Does Is Less Like “Figuring Out” and More Like “Already Knowing”

61 points | gk1 | 3 years ago | amistrongeryet.substack.com | reply

85 comments

[+] ftxbro|3 years ago|reply
> "GPT-4 Is Incredibly Smart and Quite Dumb"

Their evidence that it's 'quite dumb' consists of two prompts: one tricks it with a river-crossing puzzle and the other involves factorization.

So yeah it's not an omniscient oracle. But I feel like that's a super short-sighted take. It's already shown that it's imperfect but nearly superhuman on puzzles, and improving at a fast rate on each successive version. Does the author think GPT-5 won't be able to solve those ones? Of course even if it can, there will be some harder puzzles that GPT-5 won't be able to solve. Will that mean it's 'quite dumb' too?

Probably I'm thinking too much about it, and really the point of the blogpost is just to flex that they were able to fool the bot with two puzzles. So in that case well done!

[+] snewman|3 years ago|reply
[author here]

It's true that I didn't present a lot of concrete evidence of GPT-4's limitations. This is a blog post, not an academic paper.

To my mind, the most concerning thing about GPT-4's performance on these two examples is not that it got the wrong answer, but that it utterly, utterly failed to understand that it was having difficulty. Even after repeated hints and prompts, it continues to make exactly the same mistakes, with no attempt to reason more carefully. There are plenty of other examples around of GPT-4 (to say nothing of earlier releases) having similar troubles.

If you scroll down to the Further Reading section at the very end of the post, you'll find a reference to an extensive paper from Microsoft Research that comes to similar conclusions regarding GPT-4's limitations (I found this only after writing the initial draft of the post). For instance:

> These examples illustrate some of the limitations of the next-word prediction paradigm, which manifest as the model’s lack of planning, working memory, ability to backtrack, and reasoning abilities. The model relies on a local and greedy process of generating the next word, without any global or deep understanding of the task or the output. Thus, the model is good at producing fluent and coherent texts, but has limitations with regards to solving complex or creative problems which cannot be approached in a sequential manner.

The main point I tried to make is that GPT-4's "nearly superhuman" performance on a wide variety of tasks is somewhat illusory, and leans heavily on memorization. I spelled out some reasons why I think it looks more intelligent than it is. Relative to past work in AI, it is extremely impressive. Relative to the threshold required to perform economically useful work, it's... mixed; we're already seeing useful applications, but I think the majority of "information worker" tasks are still beyond it, and I'll go ahead and predict that the same statement will hold for GPT-5.

[+] famouswaffles|3 years ago|reply
There's a funny thing that happens in this field. The goalposts shift (which is fine on its own), but the conclusions stay pinned to the old goalposts.

It's like, the definition of AGI has obviously since shifted from general intelligence at the human level to somewhat superhuman intelligence (matching or exceeding human experts at all tasks).

But those people still think in terms of the old goalposts. Years ago they laid out all these "consequences" that would surely only happen with real AGI (when AGI meant human-level to them), but now that the goalposts have moved, those "consequences" haven't moved with them. So they're in this weird, erroneous position where X surely can't happen because AGI surely hasn't been reached yet, forgetting that X was a consequence of the lower bar. It's a form of short-sightedness and a false sense of security.

[+] Turing_Machine|3 years ago|reply
> Their evidence that it's 'quite dumb' consists of two prompts: one tricks it with a river-crossing puzzle and the other involves factorization.

We have other software tools to deal with arithmetic and logic -- quite good ones, in fact.

GPT-x is for different purposes. It seems rather like complaining that a submarine is of little use for climbing a mountain.

[+] skybrian|3 years ago|reply
Predicting the future is pretty hard, especially when people are still inventing it. Debating GPT-5's capabilities is like arguing over whether Batman or Superman would win a fight. GPT-5 doesn't exist yet! It's not real. Why go there?

The article is (mostly) about GPT-4. Understanding what we actually have is more useful in the short term.

[+] shagie|3 years ago|reply
This reminds me of my ancient philosophy class and reading Plato...

https://www.britannica.com/biography/Plato/Early-dialogues

> The Meno takes up the familiar question of whether virtue can be taught, and, if so, why eminent men have not been able to bring up their sons to be virtuous. Concerned with method, the dialogue develops Meno’s problem: How is it possible to search either for what one knows (for one already knows it) or for what one does not know (and so could not look for)? This is answered by the recollection theory of learning. What is called learning is really prompted recollection; one possesses all theoretical knowledge latently at birth, as demonstrated by the slave boy’s ability to solve geometry problems when properly prompted. (This theory will reappear in the Phaedo and in the Phaedrus.) The dialogue is also famous as an early discussion of the distinction between knowledge and true belief.

[+] ttpphd|3 years ago|reply
rubs temples

It's a large language model. It is not smart or dumb. It models the input it is trained on. It is not figuring anything out. It doesn't know anything. It isn't reasoning. It is generating text.

When will the Eliza fever break here?

[+] Mathnerd314|3 years ago|reply
Well it is reasoning. I asked ChatGPT to sum two large numbers from random.org, 490277348+718085950, and it got the right answer 1208363298. The numbers have no Google results showing such an addition. So ChatGPT at least has learned addition. I'm sure if someone analyzed the network carefully enough they could probably find the digit add/carry neurons. In contrast, Eliza was keyword-based and had much less capability to generalize to novel input.
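For what it's worth, the explicit digit-by-digit procedure the parent is gesturing at looks something like the following (a small Python sketch of grade-school addition with an explicit carry; it is not anything extracted from the model):

    def add_with_carry(a: str, b: str) -> str:
        """Add two non-negative integers given as digit strings, right to left,
        tracking the carry explicitly -- the procedure an "add/carry neuron" would mirror."""
        width = max(len(a), len(b))
        a, b = a.zfill(width), b.zfill(width)
        digits, carry = [], 0
        for da, db in zip(reversed(a), reversed(b)):
            s = int(da) + int(db) + carry
            digits.append(str(s % 10))
            carry = s // 10
        if carry:
            digits.append(str(carry))
        return "".join(reversed(digits))

    print(add_with_carry("490277348", "718085950"))  # 1208363298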
[+] iamflimflam1|3 years ago|reply
I used to say this as well, I’ve written blog posts and made videos saying exactly this.

But - I would strongly recommend trying GPT4.

There’s a very good video that is worth watching as well that may not change your mind, but will certainly make you think: https://youtu.be/qbIk7-JPB2c

The thing to remember is that these models are unbelievably huge and deep. We don’t know what is really happening in the layers or what it has really learned.

Thinking that it’s a simple language model that is just predicting the next most likely word is unwise.

[+] zzzeek|3 years ago|reply
Assuming you are responding to the article and not the comments here, i think you should read the article as the author agrees entirely with what you are saying.
[+] Loeffelmann|3 years ago|reply
For something to generate text as believable as GPT-4's, it needs to have some model of the real world and to understand relationships between things. So yes, the training goal was to "just predict the next word," the same way our training goal was just to "reproduce and survive." What emerged from that is the important thing.

No human can understand what really happens in the billions of calculations done for each token in GPT-4, so how can you claim there is surely no thought process going on? It can solve some riddles, it can draw pictures, and it can reason (to some extent). How is that just generating text to you?

In the end this argument doesn't matter because how it was made is irrelevant. What matters is what it is and can do.

[+] Aperocky|3 years ago|reply
It won't break; some will even argue that pattern matching is itself intelligence. I've heard plenty of that.

Though if it is, it's very different from us, since I can barely recall what happened this morning.

[+] newswasboring|3 years ago|reply
What a thing is and what it can do are two different things. I can just as easily say CPUs are just fancy rocks.
[+] fxj|3 years ago|reply
Also, it is important to note that GPT shows large misalignments. The problem comes from the fact that it is hard or impossible to state an objective that GPT should be optimized toward (nobody knows the truth), so proxies are used. One proxy is that it should make the user happy and collect many thumbs-up. But that does not mean it has to give the "correct" answer, which the user himself might not know in the first place. So it invents things, because during reinforcement learning users were happy with those answers. A funny example is GitHub Copilot writing buggy code because it thinks that is what the user wants. Here is a video about that:

https://youtu.be/viJt_DXTfwA?t=1767

[+] radium3d|3 years ago|reply
The reason these GPT large language models stand out is that we, too, "generate text" based on the "input text" we are trained on. But these models can train on orders of magnitude more text than we will ever ingest in our entire lifetimes.
[+] snewman|3 years ago|reply
I don't think this sort of dismissal adds anything to the conversation. I myself, in the act of typing this comment, am "generating text".

I think it's interesting to discuss the capabilities of leading-edge language models like GPT-4, because (a) they are already exhibiting the ability to perform a wide variety of useful tasks, and (b) it's clear that there is still a lot of unrealized potential here.

Can you clarify the implications you see here? Are you saying that these LLMs are somehow uninteresting or incapable? That there are limits to what they will be able to accomplish even with further improvements? Or something else?

[+] baq|3 years ago|reply
Markov chains also generate text and GPT-4 absolutely isn’t a Markov chain judging purely by its output. GPT-4 is useful.

‘It’s just a bunch of equations’ just isn’t a good argument against this tech.
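For contrast, this is roughly what "just generating text" looks like when the generator really is a Markov chain: a word-level bigram sampler (a minimal Python sketch; the corpus string is only a placeholder). The gap between this and GPT-4's output is the point being made.

    import random
    from collections import defaultdict

    def train_bigram_chain(corpus: str) -> dict:
        """Record, for each word, the words observed to follow it."""
        chain = defaultdict(list)
        words = corpus.split()
        for current, nxt in zip(words, words[1:]):
            chain[current].append(nxt)
        return chain

    def generate(chain: dict, start: str, length: int = 20) -> str:
        """Sample successors one at a time -- no plan, no state beyond the previous word."""
        out = [start]
        for _ in range(length):
            successors = chain.get(out[-1])
            if not successors:
                break
            out.append(random.choice(successors))
        return " ".join(out)

    chain = train_bigram_chain("the cat sat on the mat and the dog slept on the mat")
    print(generate(chain, "the"))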

[+] avindroth|3 years ago|reply
By a similar line of reasoning, many people did not expect LLMs to achieve these results. Maybe reductionism isn't too useful when we are dealing with emergent behaviors.
[+] red75prime|3 years ago|reply
Slaps the roof of the head. This bad boy can generate so much text.

Text generation is GPT-4's function. How it performs its function is another question.

[+] darthrupert|3 years ago|reply
It clearly has emergent properties that make it emulate smartness, at the very least. A bit like how a game that "just renders 60 images per second" is just an image generator, but is also a simulation of some aspects of reality.
[+] Turing_Machine|3 years ago|reply
> It models the input it is trained on...It doesn't know anything.

The same is true of any computer program. That doesn't make software in general not useful.

[+] patrakov|3 years ago|reply
Masterpiece.

I can only add that (quoting) the "rigid model of repeatedly appending single words to an immutable transcript" makes LLMs less suitable for generating text in certain spoken languages, because those languages, by their very structure, require a little bit of backtracking. In the case of Russian, the backtracking comes from the need to modify the ending of a prior adjective to match the grammatical gender and number of the noun that follows. I have not tested ChatGPT, but grammatical mistakes in this area are quite common with a Russian LLaMA derivative (see https://huggingface.co/IlyaGusev/llama_13b_ru_turbo_alpaca_l...).
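To make the agreement problem concrete, here is a toy illustration (a Python sketch with a hypothetical lookup table; it has nothing to do with the linked model). The adjective's ending is fixed by the gender and number of the noun that comes after it, so a strictly left-to-right generator must already have "decided" on the noun before emitting the adjective.

    # Forms of the Russian adjective "красивый" (beautiful), which must agree
    # with the gender and number of the noun that FOLLOWS it.
    ADJ_FORMS = {
        ("masc", "sg"): "красивый",
        ("fem", "sg"): "красивая",
        ("neut", "sg"): "красивое",
        ("any", "pl"): "красивые",
    }

    def adjective_for(noun_gender: str, noun_number: str) -> str:
        """Pick the agreeing form; this requires knowing the upcoming noun in advance."""
        key = ("any", "pl") if noun_number == "pl" else (noun_gender, "sg")
        return ADJ_FORMS[key]

    print(adjective_for("fem", "sg"), "книга")  # красивая книга (feminine noun)
    print(adjective_for("masc", "sg"), "дом")   # красивый дом (masculine noun)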

[+] famouswaffles|3 years ago|reply
I have access to GPT-4. What would you like to test? I'm interested in seeing how it performs.
[+] mbernstein|3 years ago|reply
I gave it a go and well, you can see...

Prompt: Here is a logic puzzle. I need to carry a cabbage, a goat, and a wolf across a river. I can only carry one item at a time with me in the boat. I can't leave the goat alone with the cabbage, and I can't leave the cabbage alone with the wolf. How can I get everything the other side of the river? Note something important, the wolf ABSOLUTELY cannot be left alone with the cabbage. Ever. It will take a creative solution!

Result: In this situation, we have to modify the classic river crossing puzzle's solution slightly to account for the fact that the wolf cannot be left alone with the cabbage at any time. Here's a creative solution:

Take the goat across the river first and leave it on the other side. Go back and pick up the cabbage. As you take the cabbage across the river, throw it up into the air (assuming you can do this without damaging the cabbage), and while it is airborne, quickly pick up the goat and take it back to the starting side of the river. Leave the goat at the starting side and pick up the wolf. Take the wolf across the river and leave it on the other side. Go back to the starting side to pick up the goat again. Finally, take the goat across the river. Now, all three items – the cabbage, the goat, and the wolf – are safely on the other side of the river, and the wolf was never left alone with the cabbage.

I love how it even told me that the wolf was never alone with the cabbage.

[+] mbernstein|3 years ago|reply
This is interesting. It finally asserted that it doesn't think the problem is solvable given the new constraint (though it mixes up the constraints, treating goat + wolf as the pair that can't be left alone rather than wolf + cabbage).

Prompt: I don't think you can solve this problem because you keep leaving the wolf with the goat. Maybe try putting the cabbage in your pocket?

Response: My apologies again. You are right; the solutions provided earlier were incorrect. If you can put the cabbage in your pocket or carry it with you during one of the trips, that would indeed work. Here's the revised solution:

Take the goat across the river first and leave it on the other side. Go back and pick up the wolf. Take the wolf across the river and leave it on the other side. Now, put the cabbage in your pocket or carry it with you and pick up the goat. Take the goat back to the starting side. Leave the goat and pick up the cabbage (from your pocket or the ground). Take the cabbage across the river. Go back to the starting side to pick up the goat again. Finally, take the goat across the river. Now, all three items – the cabbage, the goat, and the wolf – are safely on the other side of the river, and the wolf was never left alone with the cabbage.

(It failed again)

Final Prompt: Were the goat and wolf left alone together?

Super interesting response:

You are correct, and I apologize for my mistakes in the previous answers. In the last solution I provided, the goat and the wolf were left alone together, which would also be a problem.

Unfortunately, given the constraints, it seems impossible to solve the problem without leaving either the goat alone with the cabbage or the wolf alone with the goat. The classic river crossing puzzle allows the wolf and the goat to be left alone together; however, in this modified version with the added constraint, it appears to be unsolvable.

Thank you for pointing out the mistakes in the proposed solutions, and I apologize for any confusion caused.

[+] SomewhatLikely|3 years ago|reply
I used up my 25 submissions trying increasing levels of warning and giving it step-by-step instructions on how to do the reasoning, and it almost always will just happily leave that wolf and cabbage together and say that it doesn't violate the constraint, even though I told it multiple times, and it repeated back to me multiple times, that they can't be left together. On two of my submissions it did the right thing, but those seem to be flukes. If I add a few filler words and resubmit, it reverts to taking the goat first.
[+] danaris|3 years ago|reply
I'm unclear as to whether this was intentional or not, but you did not specify that the wolf could not be left alone with the goat.
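That omission matters: with only the two stated constraints (goat + cabbage and wolf + cabbage may not be left unattended), the modified puzzle is solvable; the cabbage simply plays the role the goat plays in the classic version. A short brute-force search (a Python sketch, not from the thread or the article) confirms a solution exists, so the model's "unsolvable" verdict is also wrong.

    from collections import deque

    ITEMS = ("cabbage", "goat", "wolf")
    # Pairs that may not be left on a bank without the farmer, per the prompt.
    FORBIDDEN = [{"goat", "cabbage"}, {"wolf", "cabbage"}]

    def safe(bank):
        return not any(pair <= bank for pair in FORBIDDEN)

    def solve():
        # State: (items on the left bank, side the farmer is on); everything starts on the left.
        start, goal = (frozenset(ITEMS), "left"), (frozenset(), "right")
        queue, seen = deque([(start, [])]), {start}
        while queue:
            (left, farmer), path = queue.popleft()
            if (left, farmer) == goal:
                return path
            here = left if farmer == "left" else frozenset(ITEMS) - left
            for cargo in list(here) + [None]:  # carry one item, or cross empty-handed
                new_left = set(left)
                if cargo:
                    (new_left.remove if farmer == "left" else new_left.add)(cargo)
                new_farmer = "right" if farmer == "left" else "left"
                unattended = new_left if new_farmer == "right" else set(ITEMS) - new_left
                if not safe(unattended):
                    continue
                state = (frozenset(new_left), new_farmer)
                if state not in seen:
                    seen.add(state)
                    queue.append((state, path + [f"take {cargo or 'nothing'} {new_farmer}"]))
        return None

    print(solve())  # finds a 7-crossing solution that starts by taking the cabbage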
[+] ggm|3 years ago|reply
Use of words like 'figure' and 'know' is not applicable here.

GPT exposes the statistically most likely outcome from a model. The model doesn't know either: the training is done by people, who weight the model and tune the heuristics to satisfy goals THEY impute.

If you train GPT to 'believe' there are 380 degrees in a circle, that's what it 'figures' out from.

There is no believing, no figuring, and no knowing here.

[+] stavrianos|3 years ago|reply
I keep seeing people talking about ChatGPT hallucinating when it's wrong, but not when it's right. Maybe I've misunderstood, but isn't it just always hallucinating? It's not like the failure mode is meaningfully different from the successes, except insofar as whether we agree with it, right?
[+] Turing_Machine|3 years ago|reply
> If you train GPT to 'believe' there are 380 degrees in a circle, that's what it 'figures' out from.

And a human wouldn't do that, you're saying? I'm not sure I buy that.

360 degrees per full revolution is purely an arbitrary human invention. Math would work just as well if it had been 380, or 50, or 50,000 (for example, trig works just as well in radians as in degrees).

[+] l33t233372|3 years ago|reply
> There is no believing, no figuring, and no knowing here

I take issue with "no figuring," because that would mean the model is unable to create novel structures and information. It certainly is able to do that.

[+] iamflimflam1|3 years ago|reply
Pretty interesting article. Worth reading to the end, as you can initially come away thinking the author is pretty sceptical about LLMs and AI.

But I don’t think any of us until very recently would have been saying something like this:

> If I was forced to guess, I’d say we are probably at least a few years away from human-level intelligence on problems that require higher-level cognition, memory, and sustained thought. But I’d hate to guess.

[+] samstave|3 years ago|reply
I found this interesting today: after reading the article, I made the following prompt to see how easy it was to point GPT-4 at a link and then iterate on getting it to pull data from said HN article...

I was amazed at the results.

https://news.ycombinator.com/item?id=35548122#35548738

[+] throwaway313313|3 years ago|reply
Please paste your prompt here if it isn't huge or paste a link to a pastebin.

The link you've provided shows nothing other than an article post.

[+] AstixAndBelix|3 years ago|reply
It's funny how people on this website are always stupefied that the internet holds so much information (every time they search for something, they seem to find an answer somewhere), yet when faced with an LLM trained on the whole internet, they are somehow surprised that it knows a truckload of minute things.
[+] fxj|3 years ago|reply
I think it is still a problem that many people do not know how GPT really works. There is no part of GPT that can figure something out. So no intelligence and no learning of things. A good example is GPT writing poems: all the poems it writes have the same structure, and they are not very good. So it seems it does not know the difference between a sonnet and a limerick. But what happens is that there is a misalignment between writing a poem and making the user happy. GPT is trained so that its highest goal is to make the user happy. GPT "believes" that users don't know what sonnets are anyway, so it can get away with a rhyme that is not a sonnet. Here is a good video about that:

https://youtu.be/viJt_DXTfwA?t=1376

[+] Turing_Machine|3 years ago|reply
> A good example is GPT writing poems: all the poems it writes have the same structure, and they are not very good.

Wait: so anyone who doesn't know what a sonnet is (or who writes bad poetry) is somehow unintelligent?

Is the goal here not "do something that would have previously taken a human-level brain to do", but rather "perform every task better than every human"?

That seems like setting the bar a little high to me.

[+] nice_byte|3 years ago|reply
Even before this, I've often thought about machine learning methods as lossy compression schemes at their core. LLMs, to me, bear a striking resemblance to how my own human memory works: you have to "jog" it with a prompt, the recall is imperfect, and it often takes the shortest apparent path to an answer unless you really try to take it step by step. And before you dismiss this theory, consider that 1) during the course of its training the model gets exposed to a lot more information than any human over their entire lifetime; 2) the substrate on which the model runs is more robust than squishy human neurons.
[+] j16sdiz|3 years ago|reply
The article highlighted two problems with the GPT model:

1. No way to plan ahead without "thinking out loud".

2. No way to backtrack when it hits a dead end.

Both of them look solvable by introducing more stages.
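For what it's worth, "introducing more stages" could look something like the sketch below: a plan stage, an execute stage, and a verify stage that can force a retry (a crude form of backtracking). This is purely a hypothetical outline; `ask_model` is a stand-in for whatever completion API is used, not a real function from the article.

    def ask_model(prompt: str) -> str:
        """Hypothetical stand-in for a real LLM completion call."""
        raise NotImplementedError("wire this up to an actual model API")

    def solve_with_stages(task: str, max_retries: int = 3) -> list:
        # Stage 1: plan out loud before doing anything.
        plan = ask_model(f"Write a short numbered plan for: {task}")
        steps = [line for line in plan.splitlines() if line.strip()]
        results = []
        for step in steps:
            for _ in range(max_retries):
                # Stage 2: execute one step, given everything done so far.
                candidate = ask_model(
                    f"Task: {task}\nDone so far: {results}\nCarry out this step: {step}"
                )
                # Stage 3: verify; a NO sends us back for another attempt
                # (a crude backtrack -- a fuller version would feed the critique back in).
                verdict = ask_model(
                    f"Does this output correctly complete the step '{step}'? "
                    f"Answer YES or NO.\nOutput: {candidate}"
                )
                if verdict.strip().upper().startswith("YES"):
                    results.append(candidate)
                    break
            else:
                raise RuntimeError(f"Gave up on step after {max_retries} tries: {step}")
        return results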