top | item 43056370

itkovian_ | 1 year ago

"That is a phrase that I coined in a 2022 essay called “Deep Learning is Hitting a Wall,” which was about why scaling wouldn’t get us to AGI. And when I coined it, everybody dismissed me and said, “No, we’re not reaching diminishing returns. We have these scaling laws. We’ll just get more data.”

How can anyone think he's arguing in good faith at this point? That essay was published after GPT-3 and prior to GPT-4, and he's claiming it was correct!


sytelus|1 year ago

I lost faith in Marcus after just a few interactions. He is indeed what one would call a "crackpot" in academia. The most glaring thing was that he is technically extremely shallow and doesn't have a clue about most of the details. I also got the impression that he is enormously enamored with having attention and recognition at any cost. Depending on weather, he will change directions and views, basically doing anything it takes to get that attention, no matter how ridiculous he looks doing it.

While writing this, it occurred to me that he would probably even get goosebumps reading this comment, because, after all, I am giving him attention.

baobabKoodaa|1 year ago

> Depending on weather, he will change directions and views

My impression is the opposite: I would describe Gary Marcus as having all his opinions perfectly aligned to a singular viewpoint at all times regardless of weather (or evidence).

petters|1 year ago

I don't think he is always arguing in good faith, unfortunately.

abc-1|1 year ago

So his timing was slightly off. I don't know why people expected LLMs to improve exponentially. Your iPhone today doesn't look much different from the one 10 years ago. GPT-3, or arguably GPT-4, was the first iPhone moment; everything after that will be gradual improvement unless fundamental discoveries are made, and those seem to happen randomly.

anonylizard|1 year ago

If one compares O3-mini's coding abilities to the original GPT-4's, the gap is as large as the one between GPT-3 and GPT-4:

GPT-3: Useful as autocomplete. Still error prone, but vastly better than any pre-AI autocomplete

GPT-4: Already capable of independently coding up simple functions based on natural language.

O3-mini: Can code at, say, the top-5% level on Codeforces.

There's a two-year gap between each of them.

Moreover, intelligence has superexponential returns: the gain from 90 IQ → 100 IQ is smaller than the gain from 100 IQ → 110 IQ.

rashidae|1 year ago

AI is spreading across disciplines like science, math, software development, language, music, and health. You’re looking at it too narrowly. Human-computer symbiosis is accelerating at an unprecedented rate, far beyond the pace of something like the iPhone.

mquander|1 year ago

In what sense are the bleeding edge models incremental improvements over GPT-3 (read his examples of GPT-3 output and imagine any of the top models today producing them!), GPT-3.5, or GPT-4? Look at any benchmark or use it yourself. It's night and day.

Gary Marcus didn't make a lot of specific criticisms or concrete predictions in his essay [0], but some of his criticisms of GPT-3 were:

- "For all its fluency, GPT-3 can neither integrate information from basic web searches nor reason about the most basic everyday phenomena."

- "Researchers at DeepMind and elsewhere have been trying desperately to patch the toxic language and misinformation problems, but have thus far come up dry."

- "Deep learning on its own continues to struggle even in domains as orderly as arithmetic."

Are these not all dramatically improved, no matter how you measure them, in the past three years?

[0] https://nautil.us/deep-learning-is-hitting-a-wall-238440/

pj_mukh|1 year ago

The difference between a doomsday conspiracy theorist and a physicist surmising the heat death of the universe is...just timing.

suddenlybananas|1 year ago

Well, it's true that all of the most recent advances come from changing the architecture to do inference-time scaling instead of model scaling. Scaling laws as people talked about them in 2022 (take a base LLM and make it bigger) are dead.
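For reference, the 2022-era "scaling law" being alluded to is the Chinchilla-style parametric loss fit, which predicts diminishing (but never-ending) loss reductions from more parameters and more data. A minimal sketch, using the published Chinchilla fit constants purely as illustrative values:

```python
# Chinchilla-style loss fit (Hoffmann et al., 2022):
#   L(N, D) = E + A / N^alpha + B / D^beta
# where N = parameter count, D = training tokens.
# The constants below are the commonly cited fitted values;
# treat this as an illustrative sketch, not an authoritative model.

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    E, A, B = 1.69, 406.4, 410.7   # fitted irreducible loss and coefficients
    alpha, beta = 0.34, 0.28       # fitted exponents
    return E + A / n_params**alpha + B / n_tokens**beta

# Each 10x in parameters at fixed data still lowers loss,
# but by a smaller and smaller absolute amount:
l_small = chinchilla_loss(7e9, 1.4e12)
l_big = chinchilla_loss(70e9, 1.4e12)
l_bigger = chinchilla_loss(700e9, 1.4e12)
assert (l_small - l_big) > (l_big - l_bigger)  # diminishing returns
```

Under this fit, "just make it bigger" keeps helping but with shrinking marginal gains, which is consistent with both the 2022 optimism and the later turn toward inference-time scaling.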

menaerus|1 year ago

I think you want both. To scale the model, e.g. to train it with more and more data, you also need to scale your inference step. Otherwise it just takes too long and is too costly, no?