WingNews

sota_pop|9 months ago

I disagree with this wholeheartedly. Sure, there is lots of trial and error, but it’s more an amalgamation of theory from many areas of mathematics, including but not limited to topology, geometry, game theory, calculus, and statistics. The very foundation (back-propagation) is just the chain rule applied to the weights. The difference is that deep learning has become such an accessible (read: profitable) field that many practitioners have the luxury of learning the subject without having to learn the origins of its formalisms. This ultimately allows them to use or “reinvent” theories and techniques, often without knowing that these have been around in other fields for much longer.
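A minimal sketch of the “back-propagation is just the chain rule” point, assuming a hypothetical one-hidden-unit network with a tanh activation and a squared-error loss (the network, names and numbers are illustrative, not from the thread). The hand-written backward pass multiplies local derivatives step by step, and a central finite-difference estimate is used only to check the result.

```python
import math

def forward(w1, w2, x, y):
    """Hypothetical toy network: h = tanh(w1 * x), yhat = w2 * h, loss = (yhat - y)^2 / 2."""
    z = w1 * x
    h = math.tanh(z)
    yhat = w2 * h
    loss = 0.5 * (yhat - y) ** 2
    return loss, (z, h, yhat)

def backward(w1, w2, x, y):
    """Gradients via the chain rule, one local derivative per step of the forward pass."""
    _, (z, h, yhat) = forward(w1, w2, x, y)
    dloss_dyhat = yhat - y                          # d/dyhat of (yhat - y)^2 / 2
    dloss_dw2 = dloss_dyhat * h                     # yhat = w2 * h
    dloss_dh = dloss_dyhat * w2
    dloss_dz = dloss_dh * (1 - math.tanh(z) ** 2)   # d tanh(z)/dz = 1 - tanh(z)^2
    dloss_dw1 = dloss_dz * x                        # z = w1 * x
    return dloss_dw1, dloss_dw2

def numeric_grad(w1, w2, x, y, eps=1e-6):
    """Central finite differences, used only to check the chain-rule result."""
    g1 = (forward(w1 + eps, w2, x, y)[0] - forward(w1 - eps, w2, x, y)[0]) / (2 * eps)
    g2 = (forward(w1, w2 + eps, x, y)[0] - forward(w1, w2 - eps, x, y)[0]) / (2 * eps)
    return g1, g2

analytic = backward(0.3, -1.2, 0.8, 0.5)
numeric = numeric_grad(0.3, -1.2, 0.8, 0.5)
print(analytic, numeric)
```

The two gradient pairs should agree to many decimal places; deep learning frameworks automate exactly this bookkeeping over much larger graphs.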

saberience|9 months ago

None of the major aspects of deep learning came from manifolds though.

It is primarily linear algebra, calculus, probability theory and statistics, secondarily you could add something like information theory for ideas like entropy, loss functions etc.

But really, if "manifolds" had never been invented/conceptualized, we would still have deep learning today; the concept really had zero impact on the actual practical technology we all use every day.
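To make the information-theory point concrete: the usual classification loss is the cross-entropy H(p, q) = -Σ p_i log q_i between the target distribution p and the model's predicted distribution q. The sketch below uses made-up probabilities and plain Python; for a one-hot target the loss reduces to -log of the probability assigned to the correct class.

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum p_i log p_i, in nats (0 log 0 taken as 0)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum p_i log q_i; equals H(p) + KL(p || q)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) = sum p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

label = [0.0, 1.0, 0.0]   # made-up one-hot target: class 1
pred = [0.2, 0.7, 0.1]    # made-up predicted probabilities

loss = cross_entropy(label, pred)   # reduces to -log(0.7) for a one-hot label
print(loss)
```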

kwertzzz|9 months ago

Can you give an example where theories and techniques from other fields are reinvented? I would be genuinely interested in concrete examples. Such "reinventions" happen quite often in science, so to some degree this would be expected.

behnamoh|9 months ago

> a few intuitions coming from theory (that was not topology).

I think these 'intuitions' are an after-the-fact thing, meaning that AFTER deep learning comes up with a method, researchers in other fields of science notice the similarities between the deep learning approach and their (possibly decades-old) methods. Here's an example where the author discovers that GPT poses the same computational problems he has already solved in physics:

https://ondrejcertik.com/blog/2023/03/fastgpt-faster-than-py...

ogogmad|9 months ago

I beg to differ. It's complete hyperbole to suggest that the article said "it's the same problem as something in physics", given this statement:

     It seems that the bottleneck algorithm in GPT-2 inference is matrix-matrix multiplication. For physicists like us, matrix-matrix multiplication is very familiar, *unlike other aspects of AI and ML* [emphasis mine]. Finding this familiar ground inspired us to approach GPT-2 like any other numerical computing problem.

Note: Matrix-matrix multiplication is basic mathematics, and not remotely interesting as physics. It's not physically interesting.
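For readers wondering why GPT-2 inference reduces to matrix-matrix multiplication: applying one weight matrix to a whole sequence of token vectors at once is a single matrix product. A rough sketch in plain Python; the layer sizes in the FLOP estimate are assumptions chosen to resemble GPT-2 small's MLP up-projection (768 to 3072) over a 1024-token context, and a real implementation would call an optimized BLAS routine rather than this naive loop.

```python
def matmul(a, b):
    """Naive product of an (n x k) and a (k x m) matrix stored as lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)] for i in range(n)]

def linear_layer_flops(n_tokens, d_in, d_out):
    """FLOPs for X (n_tokens x d_in) @ W (d_in x d_out), counting a multiply-add as 2."""
    return 2 * n_tokens * d_in * d_out

# Three toy token vectors (rows) pushed through one weight matrix in a single product.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[2.0, 0.0, 1.0], [0.0, 3.0, 1.0]]
print(matmul(X, W))   # [[2.0, 0.0, 1.0], [0.0, 3.0, 1.0], [2.0, 3.0, 2.0]]

# Assumed sizes, roughly GPT-2 small's MLP up-projection over a full context.
print(linear_layer_flops(1024, 768, 3072))
```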

constantcrying|9 months ago

You are exactly right: after deep learning researchers had invented Adam for SGD, numerical analysts finally discovered gradient descent. And after the first neural net was discovered, the matrix was finally invented in the novel field of linear algebra.
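For reference, the two update rules the sarcasm contrasts, sketched on an assumed toy objective f(x) = (x - 3)^2: plain gradient descent, and Adam, which adds bias-corrected running estimates of the gradient's first and second moments on top of the same gradient step. The beta and eps values are the commonly quoted Adam defaults; the learning rates and step counts are picked for this toy problem only.

```python
import math

def grad(x):
    """Gradient of the assumed toy objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)

def gradient_descent(x, lr=0.1, steps=200):
    """Plain gradient descent: step against the gradient with a fixed learning rate."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def adam(x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Adam: gradient step rescaled by running moment estimates."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g          # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g      # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)             # bias corrections for zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

print(gradient_descent(0.0), adam(0.0))   # both should end up close to 3
```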

theahura|9 months ago

I say this as someone who has been in deep learning for over a decade now: this is pretty wrong, both on the merits (data obviously lives on a manifold) and in its application to deep learning (cf. Chris Olah's blog, an example from 2014 that is linked in my post: https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/). Embedding spaces are called 'spaces' for a reason. GANs, VAEs, contrastive losses: all of these are about constructing vector manifolds that you can 'walk' to produce different kinds of data.
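A minimal sketch of what 'walking' an embedding space can mean in practice, assuming two latent vectors a and b: generate intermediate points by interpolation and hand each one to a decoder or generator (omitted here). Spherical interpolation (slerp) is shown alongside straight-line interpolation because, for roughly Gaussian latents, the straight-line midpoint has a noticeably smaller norm than typical samples.

```python
import math

def lerp(a, b, t):
    """Straight-line interpolation between two embedding vectors."""
    return [(1 - t) * ai + t * bi for ai, bi in zip(a, b)]

def slerp(a, b, t):
    """Spherical interpolation: moves along the arc between the directions of a and b."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))  # clamp against rounding error
    s = math.sin(omega)
    if math.isclose(s, 0.0, abs_tol=1e-12):
        return lerp(a, b, t)  # (anti)parallel vectors: fall back to a straight line
    wa = math.sin((1 - t) * omega) / s
    wb = math.sin(t * omega) / s
    return [wa * ai + wb * bi for ai, bi in zip(a, b)]

def walk(a, b, steps=5, interp=slerp):
    """Evenly spaced points from a to b; a generative model's decoder would map each to data."""
    return [interp(a, b, i / (steps - 1)) for i in range(steps)]

for point in walk([1.0, 0.0], [0.0, 1.0]):
    print(point)
```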

umutisik|9 months ago

If data really lived on a manifold contained in, e.g., R^{n^2} for n×n images, then it wouldn't have thickness or branching, which it does. The manifold is an imperfect approximation that helps us think about the data. The use of mathematical language is not the same as an application of mathematics (and the use of the word 'space' there is not about topology).

almostgotcaught|9 months ago

You're citing a guy who never went to college (has no math or physics degree), has never published a paper, etc. I guess that actually tracks pretty well with how strong the whole "it's deep theory" claim is.

niemandhier|9 months ago

It’s alchemy.

Deep learning in its current form relates to a hypothetical underlying theory as alchemy does to chemistry.

In a few hundred years the Inuktitut speaking high schoolers of the civilisation that comes after us will learn that this strange word “deep learning” is a left over from the lingua franca of yore.

adamnemecek|9 months ago

Not really; most current approaches are approximations of the partition function.
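To unpack the 'partition function' vocabulary: a softmax over logits is a Boltzmann (Gibbs) distribution, p_i = exp(-E_i / T) / Z with energies E_i = -logit_i, and its normaliser Z is exactly a partition function. Which training methods approximate which Z is the commenter's claim; the sketch below only shows how log Z is computed stably with the log-sum-exp trick.

```python
import math

def log_partition(logits):
    """log Z = log sum_i exp(logit_i), computed stably by factoring out the max."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))

def softmax(logits, temperature=1.0):
    """Boltzmann distribution p_i = exp(logit_i / T) / Z, where logit_i plays the role of -E_i."""
    scaled = [x / temperature for x in logits]
    log_z = log_partition(scaled)
    return [math.exp(x - log_z) for x in scaled]

probs = softmax([1000.0, 1001.0, 1002.0])   # a naive exp(1000.0) would overflow here
print(probs)
```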

esafak|9 months ago

It does if you relax your definition to accommodate approximation error, cf. e.g., Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning (https://aclanthology.org/2021.acl-long.568.pdf)

Koshkin|9 months ago

> Data doesn't actually live on a manifold.

Often, they do (and then they are called "sheaves").

wenc|9 months ago

Many types of data don’t. Disconnected spaces like integer spaces don’t sit on a manifold (they are lattices). Spiky, noisy, fragmented data don’t sit on a (smooth) manifold.

In fact, not all ML models treat data as manifolds. Nearest neighbors and decision trees don’t require the manifold assumption and actually work better without it.
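As a small illustration of the point that nearest neighbors needs no manifold assumption: a k-nearest-neighbor classifier only needs a distance function, so it works directly on integer-lattice points with the Manhattan (L1) distance. The training points and labels below are made up.

```python
def manhattan(p, q):
    """L1 distance between two integer-lattice points."""
    return sum(abs(a - b) for a, b in zip(p, q))

def knn_predict(train, query, k=1):
    """Majority label among the k nearest training points.

    sorted() is stable and dicts keep insertion order, so ties in the vote go to
    the label whose first vote came from the nearer point.
    """
    neighbours = sorted(train, key=lambda item: manhattan(item[0], query))[:k]
    votes = {}
    for _, label in neighbours:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Made-up labelled points on the integer lattice Z^2.
train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b")]
print(knn_predict(train, (1, 1)), knn_predict(train, (5, 6), k=3))
```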

motoboi|9 months ago

Your comment sits in the nice gradient between not seeing at all the obvious relationships between deep learning and topology and thinking that deep learning is applied topology.

See? Everything lives in the manifold.

Now, for a great visualization of the Manifold Hypothesis, I cannot recommend this video enough: https://www.youtube.com/watch?v=pdNYw6qwuNc

It helps to visualize how the activation functions, biases and weights (linear transformations) serve to stretch the high-dimensional space so that the data are pushed to the extremes and end up on a low-dimensional object (the manifold) inside the high-dimensional space, where they are trivial to classify or separate.

Gaining an intuition about this process will make some deep learning practices much easier to understand.
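A tiny, hand-built example of the 'stretching' intuition, assuming two ReLU units with hand-picked weights (illustrative, not learned): XOR is not linearly separable in the input plane, but after the hidden layer the four inputs land at (0, 0), (1, 0), (1, 0) and (2, 1), where a single linear readout h1 - 2*h2 separates them.

```python
def relu(z):
    return max(0.0, z)

def hidden(x1, x2):
    """Hand-picked (hypothetical) weights and biases for two ReLU units."""
    h1 = relu(x1 + x2)          # weights (1, 1), bias 0
    h2 = relu(x1 + x2 - 1.0)    # weights (1, 1), bias -1
    return h1, h2

def xor_net(x1, x2):
    """A single linear readout applied in the transformed (hidden) space."""
    h1, h2 = hidden(x1, x2)
    return h1 - 2.0 * h2

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), hidden(x1, x2), xor_net(x1, x2))
```

The two inputs labelled 1 collapse onto the same hidden point, which is the toy version of the space being folded and stretched until the classes separate.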

thuuuomas|9 months ago

I cannot understand this prideful resentment of theory common among self-described practitioners.

Even if existing theory is inadequate, would an operating theory not be beneficial?

Or is the mystique, combined with guess-and-check drudgery, job security?

canjobear|9 months ago

If there were theory that led to directly useful results (like, telling you the right hyperparameters to use for your data in a simple way, or giving you a new kind of regularization that you can drop in to dramatically improve learning) then deep learning practitioners would love it. As it currently stands, such theories don't really exist.

jebarker|9 months ago

There are strong incentives to leave theory as technical debt and keep charging forward. I don't think it's resentment of theory; everyone would love a theory if one were available, but very few are willing to forgo the near-term rewards to pursue theory. Also, it's really hard.

lumost|9 months ago

There are many reasons to believe a theory may not be forthcoming, or that if it is available may not be useful.

For instance, we do not have consensus on what a theory should accomplish - should it provide convergence bounds/capability bounds? Should it predict optimal parameter counts/shapes? Should it allow more efficient calculation of optimal weights? Does it need to do these tasks in linear time?

Even materials science in metals is still cycling through theoretical models after thousands of years of making steel and other alloys.

hiddencost|9 months ago

Maybe a little less with the ad hominems? The OP is providing an accurate description of an extremely immature field.

danielmarkbruce|9 months ago

Who is proud? What you are seeing in some cases is eye rolling. And it's fair eye rolling.

There is an enormous amount of theory used in the various parts of building models, there just isn't an overarching theory at the very most convenient level of abstraction.

It almost has to be this way. If there was some neat theory, people would use it and build even more complex things on top of it in an experimental way and then so on.

baxtr|9 months ago

Just a side comment on your observation: the principle is called reductionism, and it has been tried in many fields.

Physics is just applied mathematics

Chemistry is just applied physics

Biology is just applied chemistry

It doesn’t work very well.

constantcrying|9 months ago

>it's an empirical field advanced mostly by trial and error and, sure, a few intuitions coming from theory (that was not topology).

Neural Networks consist almost exclusively of two parts, numerical linear algebra and numerical optimization.

Even if you reject the abstract topological description, numerical linear algebra and optimization couldn't be any more directly applicable.

yubblegum|9 months ago

> Near total majority, if not 100%, of the useful things done in deep learning have come from not thinking about topology in any way.

Of course. Now, to actually understand deeply what is happening with these constructs, we will use topology. Topological insights will then, without doubt, inform the next generations of this technology.

solomatov|9 months ago

May I ask you to give examples of insights from topology which improved existing models, or at least improved our understanding of them? arXiv papers are preferred.

Regic|9 months ago

I feel like the fact that ML has no good explanation for why it works this well gives a lot of people room to invent their own head-canon, usually from their field of expertise. I've seen this from exceptionally intelligent individuals too. If you only have a hammer...

nomel|9 months ago

I think it would be more unusual, and concerning, if an intelligent individual didn't attempt to apply their expertise for a head-canon of something unknown.

Coming up with an idea for how something works, by applying your expertise, is the fundamental foundation of intelligence, learning, and was behind every single advancement of human understanding.

People thinking is always a good thing. Thinking about the unknown is better. Thinking with others is best, and sharing those thoughts isn't somehow bad, even if they're not complete.

HarHarVeryFunny|9 months ago

When you say ML, I assume you really mean LLMs?

Even with LLMs, there's no real mystery about why they work so well - they produce human-like input continuations (aka "answers") because they are trained to predict continuations of human-generated training data. Maybe we should be a bit surprised that the continuation signal is there in the first place, but given that it evidently is, it's no mystery that LLMs are able to use it - just testimony to the power of the Transformer as a predictive architecture, and of course to gradient descent as a cold unthinking way of finding an error minimum.

Perhaps you meant how LLMs work, rather than why they work, but I'm not sure there's any real mystery there either - the transformer itself is all about key-based attention, and we now know that training a transformer seems to consistently cause it to leverage attention to learn "induction heads" (using pairs of adjacent attention heads) that are the main data finding/copying primitive they use to operate.

Of course, knowing how an LLM works in broad strokes isn't the same as knowing specifically how it works in any given case (how it transforms a specific input layer by layer to create the given output), but that seems a bit like saying that because I can't describe, precisely, why you had pancakes for breakfast, we don't know how the brain works.
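For reference, the key-based attention primitive mentioned above, as single-query scaled dot-product attention in plain Python. This is only a sketch of the building block: it does not reproduce induction heads, which the comment describes as emerging from pairs of attention heads in a trained model, and the vectors in the usage line are made up.

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Single-query scaled dot-product attention.

    Scores each key by its dot product with the query (scaled by sqrt(d)),
    turns the scores into weights with softmax, and returns the weighted
    sum of the value vectors along with the weights.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
    return out, weights

# A query that matches the first key strongly mostly "copies" the first value.
out, weights = attention([10.0, 0.0], [[10.0, 0.0], [0.0, 10.0]], [[1.0, 2.0], [3.0, 4.0]])
print(out, weights)
```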

csimon80|9 months ago

"All models are wrong, but some are useful" -George Box

woopwoop|9 months ago

I don't agree with your first sentence, but I agree with the rest of this post.
