ansk|1 year ago
I think that while researchers would agree that the empirical success of deep learning has been remarkable, they would still agree that the language used here -- "an algorithm that could really, truly learn any distribution of data (or really, the underlying "rules" that produce any distribution of data)" -- is an overly strong characterization, to the point that it is no longer accurate. A hash function is a good example of a generating process which NN + SGD will not learn with any degree of generalization. If you trained GPT-4 on an infinite dataset of strings and their corresponding hashes, it would simply saturate its 100 billion+ parameters with something akin to a compressed lookup table of input/output pairs, despite the true generating process being a program expressible in under a kilobyte. On unseen data, it would do no better than a uniform prior over hashes. Anyway, my point is that people knowledgeable in the field would have far more tempered takes on the practical limits of deep learning, and would reserve the absolute framing used here for claims that have been proven formally.
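A quick sketch of why gradient descent gets no traction on this target: cryptographic hashes are designed for the avalanche effect, so a one-character change to the input flips roughly half the output bits. There is no local structure relating nearby inputs to nearby outputs for SGD to exploit. (SHA-256 here is just an illustrative stand-in for any cryptographic hash.)

```python
import hashlib

def bit_hamming(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Two inputs differing by a single character...
h0 = hashlib.sha256(b"input0").digest()
h1 = hashlib.sha256(b"input1").digest()

# ...yield digests that differ in roughly half of their 256 bits,
# i.e. they look statistically unrelated.
dist = bit_hamming(h0, h1)
print(f"{dist} of 256 bits differ")
```

A model that has memorized the hash of "input0" learns nothing usable about the hash of "input1", which is exactly the compressed-lookup-table failure mode described above.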
jmmcd|1 year ago
Ok, we can differ on that. My feeling is partly that the types of distributions that can't be learned -- e.g. hash functions -- are generally the kind of functions we don't really want to learn. Underneath this are deeper questions related to no free lunch and how "nice"/"well-behaved" this universe is.