top | item 41630031

ansk|1 year ago

> humanity discovered an algorithm that could really, truly learn any distribution of data (or really, the underlying “rules” that produce any distribution of data)

He's hand-waving around the idea presented in the Universal Approximation Theorem, but he's mangled it to the point of falsehood by conflating representation and learning. Just because we can parameterize an arbitrarily flexible class of distributions doesn't mean we have an algorithm to learn the optimal set of parameters. He digs an even deeper hole by claiming that this algorithm actually learns 'the underlying “rules” that produce any distribution of data', which is essentially a totally unfounded assertion that the functions learned by neural nets will generalize in some particular manner.

> I find that no matter how much time I spend thinking about this, I can never really internalize how consequential it is.

If you think the Universal Approximation Theorem is this profound, you haven't understood it. It's about as profound as the notion that you can approximate a polynomial by splicing together an infinite number of piecewise linear functions.
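To make that analogy concrete, here is a toy sketch (target function and knot counts are arbitrary choices, not from the thread): a piecewise-linear interpolant's worst-case error shrinks as knots are added. That's an existence statement about approximation, and, as the comment argues, it says nothing about how to *learn* the function from data.

```python
import numpy as np

def max_error(num_knots):
    """Worst-case error of a piecewise-linear interpolant of
    f(x) = x**3 - x (arbitrary smooth example) on [-1, 1]."""
    f = lambda x: x**3 - x
    knots = np.linspace(-1, 1, num_knots)
    dense = np.linspace(-1, 1, 10_000)
    # np.interp splices straight lines between the knot values
    approx = np.interp(dense, knots, f(knots))
    return float(np.max(np.abs(f(dense) - approx)))

# More knots -> smaller worst-case error; pure representation,
# no learning involved.
for n in (4, 16, 64):
    print(n, max_error(n))
```

The error keeps shrinking with more knots, which is exactly the (unremarkable, per the comment) content of such approximation results.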

ComplexSystems|1 year ago

> Just because we can parameterize an arbitrarily flexible class of distributions doesn't mean we have an algorithm to learn the optimal set of parameters.

This is at least as mangled as what Altman is saying, if not more. We don't need to learn "the optimal" set of parameters. We need to learn "a good" set of parameters that approximates the original distribution "well enough." Gradient methods and large networks with lots of parameters seem to be capable of doing that without overfitting to the data set. That's a much stronger statement than the universal approximation theorem.
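A minimal sketch of that "good enough via gradients" point, assuming a toy setup (tiny ReLU net, arbitrary target sin(x), hand-rolled backprop): plain gradient descent finds parameters that fit the function far better than any baseline, with no claim of optimality.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (256, 1))
y = np.sin(X)  # arbitrary smooth target for illustration

# One-hidden-layer ReLU network, trained by gradient descent on MSE.
H = 32
W1 = rng.normal(0, 1, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.1, (H, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.maximum(X @ W1 + b1, 0)  # ReLU hidden layer
    return h, h @ W2 + b2

lr = 0.05
for step in range(8000):
    h, pred = forward(X)
    err = pred - y                       # gradient of 0.5 * MSE
    gW2 = h.T @ err / len(X); gb2 = err.mean(0)
    dh = (err @ W2.T) * (h > 0)          # backprop through ReLU
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

mse = float(np.mean((forward(X)[1] - y) ** 2))
print(mse)  # far below the ~0.5 variance of the target
```

Nothing here guarantees the global optimum; gradient descent just lands on "a good" set of parameters, which is the commenter's point.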

jmmcd|1 year ago

Yes, he's handwaving in this general area, but no, he's not really relying on the UAT. If you talked to most NN people 2 decades ago and asked about this, they might well answer in terms of the UAT. But nowadays, most people, including here Altman, would answer in terms of practical experience of success in learning a surprisingly diverse array of distributions using a single architecture.

ansk|1 year ago

I think that while researchers would agree that the empirical success of deep learning has been remarkable, they would still agree that the language used here -- "an algorithm that could really, truly learn any distribution of data (or really, the underlying “rules” that produce any distribution of data)" -- is an overly strong characterization, to the point that it is no longer accurate. A hash function is a good example of a generating process which NN + SGD will not learn with any degree of generalization. If you trained GPT4 on an infinite dataset of strings and their corresponding hashes, it would simply saturate its 100 billion+ parameters with something akin to a compressed lookup table of input/output pairs, despite the true generating process being a program that could be expressed in less than a kilobyte. On unseen data, it would be no better than a uniform prior over hashes. Anyways, my point is that people knowledgeable in the field would have far more tempered takes on the practical limits of deep learning, and would reserve the absolute framing used here for claims that have been proven formally.
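A quick sketch of why hashes are such a hostile target (using SHA-256 as a stand-in; the comment doesn't name a specific hash): flipping a single input bit flips roughly half of the output bits, the avalanche effect, so there is no local structure for a gradient-trained model to latch onto — memorization is the only strategy available.

```python
import hashlib

def sha256_bits(data: bytes) -> str:
    """SHA-256 digest as a 256-character bit string."""
    return bin(int(hashlib.sha256(data).hexdigest(), 16))[2:].zfill(256)

def avalanche(a: bytes, b: bytes) -> int:
    """Number of output bits that differ between hash(a) and hash(b)."""
    return sum(x != y for x, y in zip(sha256_bits(a), sha256_bits(b)))

# b"a" (0x61) and b"c" (0x63) differ in exactly one input bit,
# yet roughly half of the 256 output bits flip.
print(avalanche(b"a", b"c"))
```

Contrast this with smooth targets like sin(x), where nearby inputs have nearby outputs and gradient methods can interpolate; here, nearby inputs have statistically unrelated outputs, which is exactly the "uniform prior on unseen data" behavior described above.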

klyrs|1 year ago

> It's about as profound as the notion that you can approximate a polynomial by splicing together an infinite number of piecewise linear functions.

Wait 'til you hit complex analysis and discover that Universal Entire Functions don't just exist, they're basically polynomials.

stogot|1 year ago

So basically Altman hasn’t taken enough math courses yet thinks he is starring in Good Will Hunting

namaria|1 year ago

"We're using more energy to do more math and the results are increasingly better"

Reasonable.

"If we keep increasing the amount of energy and math, we should all be floating in bliss any day now"

Selling something.

bbor|1 year ago

You’re (both!) getting into metaphysics without necessarily realizing it. He’s just saying that a machine that could learn any pattern—not a sub-pattern of accidental actualities that it overfits on, but the real virtual pattern driving some set of phenomena—would be a game changer. Sure, there are infinitely many things that can’t be reduced to polynomials, but something tells me that a whole lot of the things that matter to us can be, across the fields of Physics, Biology, Sociology and Neurology especially.

Basically it’ll be (and already has been since the quantum breakthroughs in the 1920s, to some extent) a revolution in scientific methods not unlike what Newton and Galileo brought us with physical mechanics: this time, sadly, the mechanics are beyond our direct comprehension, reachable only with statistical tricks and smart guessing machines.