
Understanding Deep Learning

415 points | georgehill | 2 years ago | udlbook.github.io

98 comments

[+] martingoodson|2 years ago|reply
Most comments here are in one of two camps: 1) you don't need to know any of this stuff, you can make AI systems without this knowledge, or 2) you need this foundational knowledge to really understand what's going on.

Both perspectives are correct. The field is bifurcating into two different skill sets: ML engineer and ML scientist (or researcher).

It's great to have both types on a team. The scientists will be too slow; the engineers will bound ahead trying out various APIs and open-source models. But when they hit a roadblock or need to adapt an algorithm many engineers will stumble. They need an R&D mindset that is quite alien to many of them.

This is when an AI scientist becomes essential.

[+] braza|2 years ago|reply
> But when they hit a roadblock or need to adapt an algorithm many engineers will stumble.

My experience is the other way around.

People underestimate how powerful building systems is, and how most of the problems worth solving are boring and require off-the-shelf techniques.

During the last decade, I was on several teams and noticed the same pattern: the company has some extra budget and "believes" that its problem is exceptional.

It then goes and hires some PhD data scientists who have publications but only know R and are fresh from some Python bootcamps.

After 3 months with this new team, not much has been done: tons of Jupyter notebooks around but no code in production, and some of them did not even have an environment for experimentation.

The business problem is still not solved. The company realizes that having a lot of Data Scientists but not so many Data/ML Engineers means that they are (a) blocked from pushing anything to production or (b) creating a death star of data pipelines + algorithms + infra (spending 70% more resources due to lack of Python knowledge).

The project gets delayed. Some people become impatient.

Now you have a solid USD 2.5 million/year team that is not capable of delivering a proof of concept because no one can serve the model via batch jobs or a REST API.

The company has lost momentum; competitors moved fast. They released an imperfect solution, but a solution nonetheless, and they have users on it and keep enhancing it.

Frustration kicks in, and PMs and Eng Managers fight over accountability. The VPs of Product and Engineering want heads on a silver platter.

Some PhDs get fired and leave to teach at some local university.

Fin.

[+] gardenhedge|2 years ago|reply
Would you see these as analogous?

The people who create the models and the people that use them.

The people who create the programming languages and the people that use them.

[+] 3abiton|2 years ago|reply
This sounds like a sales pitch for an AI scientist.
[+] mi_lk|2 years ago|reply
I guess this message is delivered by an AI scientist, sure.

It's almost self-explanatory that when you hit a roadblock in practice you go back to foundations, and good people should aim to do both. In that case I don't see where the ML engineer/scientist bifurcation comes from, except as a way for some to feel good about themselves.

[+] nsxwolf|2 years ago|reply
As someone who missed the boat on this, is learning this just for historical purposes now, or is there still relevance to future employment? I just imagine that OpenAI eats everyone's lunch on anything AI related; am I way off base?
[+] GeneralMayhem|2 years ago|reply
The most important thing to learn for most practical purposes is what the thing can actually do. There's a lot of fuzzy thinking around ML - "throw AI at it and it'll magically get better!" Sources like Karpathy's recent video on what LLMs actually do are good anti-hype for the lay audience, but getting good practical working knowledge that's a level deeper is tough without working through it. You don't have to memorize all the math, but it's good to get a feel for the "interface" of the components. What is it that each model technique actually does - especially at inference time, where it needs to be well-integrated with the rest of the stack?

In terms of continued relevance - "deep learning", meaning dense neural nets trained to optimize a particular function, hasn't fundamentally changed in practice in ~15 years (and much longer than that in theory), and is still way more important and broadly used than the OpenAI stuff for most purposes. Anything that involves numerical estimation (e.g., ad optimization, financial modeling) is not going to use LLMs, it's going to use a purpose-built model as part of a larger system. The interface of "put numbers in, get number[s] out" is more explainable, easier to integrate with the rest of your software stack, and more measurable. It has error bars that are understandable and occasionally even consistent. It has a controllable interface that won't suddenly decide to blurt corporate secrets or forget how to serialize JSON. And it has much, much lower latency and cost - any time you're trying to render a web page in under 100ms or run an optimization over millions of options, generative AI just isn't a practical option (and is unlikely to become one, IMO).

I don't have a significant math or theoretical ML background, but I've spent most of the last 10 years working side by side with ML experts on infra, data pipelines, and monitoring. I'm not sure I could integrate the sigmoid off the top of my head, but that's not what's important - I've done it once, enough to have some idea how the function behaves, and I know how to reason about it as a black box component.
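For readers who want the "feel for the function" mentioned above, here is a minimal sketch of the sigmoid and the quantities around it, using only the standard library (the function names are illustrative, not from any particular ML framework):

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x: float) -> float:
    """d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)); used in backprop."""
    s = sigmoid(x)
    return s * (1.0 - s)

def softplus(x: float) -> float:
    """An antiderivative of sigmoid: log(1 + e^x)."""
    return math.log1p(math.exp(x))
```

Knowing that the derivative peaks at 0.25 (at x = 0) and vanishes for large |x| is exactly the kind of black-box intuition the comment describes: it tells you why saturated sigmoids slow training without requiring you to redo the calculus.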

[+] deepsquirrelnet|2 years ago|reply
This is about deep learning, of which LLMs are a subset. If you are interested in machine learning, then you should learn deep learning. It is incredibly useful for a lot of reasons.

Unlike other areas of ML, the nature of deep learning is such that its parts are interoperable. You could use a transformer with a CNN if you wish. Also, deep learning enables you to do machine learning on any type of data: text, images, video, audio. Finally, it scales naturally with computation.
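A toy illustration of that interoperability, assuming nothing beyond the standard library: deep-learning layers share a common "tensor in, tensor out" interface, so heterogeneous pieces (conv blocks, attention blocks, nonlinearities) can be chained freely. The names here are invented for the sketch:

```python
from typing import Callable, List

# Any function from a vector to a vector can act as a "layer".
Layer = Callable[[List[float]], List[float]]

def chain(layers: List[Layer]) -> Layer:
    """Compose layers into one model: the output of each feeds the next."""
    def model(x: List[float]) -> List[float]:
        for layer in layers:
            x = layer(x)
        return x
    return model

def scale(x):   # stand-in for a linear/conv layer
    return [2.0 * v for v in x]

def relu(x):    # elementwise nonlinearity
    return [max(0.0, v) for v in x]

model = chain([scale, relu])
```

Real frameworks (e.g. `torch.nn.Sequential`) are built around the same idea, which is why a transformer block can sit on top of CNN features with no special glue.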

As someone pretty involved in the field, I lament that LLMs are turning people away from ML and deep learning, fostering the misconception that there's no reason to do it anymore. Large models are expensive to run, have low throughput, and are still generally poorer performing than purpose-built models. They're not even that easy to use for a lot of tasks, compared to encoder networks.

I’m biased, but I think it’s one of the most fun things to learn in computing. And if you have a good idea, you can still build state of the art things with a regular gpu at your house. You just have to find a niche that isn’t getting the attention that LLMs are ;)

[+] hedgehog|2 years ago|reply
Highly relevant if you want to work on ML systems. Despite how much OpenAI dominates the press there are actually many, many teams building useful and interesting things.
[+] ww520|2 years ago|reply
From an application perspective, it's more important to understand how the overall ML process works, the key concepts, and how things fit together. Deep learning is a part of that. Lots of this is already wrapped in libraries and APIs, so it's a matter of preparing the correct data, calling the right APIs, and utilizing the result.
[+] niemandhier|2 years ago|reply
Someone will dominate the AI-as-a-service market, but there are so many applications for tiny edge AI that no single player can dominate all of them.

OpenAI, for example, is not interested in developing small embedded neural networks that run on a sensor chip and detect specific molecules in the air in real time.

[+] two_in_one|2 years ago|reply
It's like calculus: nothing new in recent years, so is it still important? The answer is still "Yes".

After a glance, it looks like too much for one book. It was probably compressed with the assumption that the reader already knows quite a lot. In other words, it's not easy reading.

[+] ksherlock|2 years ago|reply
Maybe last week's drama should have been a left-pad moment. For many things you can train your own NN and be just as good without being dependent on internet access, third parties, etc. Knowing how things work should give you insight into using them better.
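As a sketch of the "train your own NN without third parties" point above: a single-neuron logistic model fit by plain gradient descent on made-up toy data. All names and numbers here are illustrative, not from any library:

```python
import math

# Toy data: learn y = 1 when x > 0, else 0.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

w, b, lr = 0.0, 0.0, 0.5

def predict(x):
    """One sigmoid neuron: P(y = 1 | x)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

for _ in range(200):
    gw = gb = 0.0
    for x, y in data:
        err = predict(x) - y   # gradient of log loss w.r.t. the logit
        gw += err * x
        gb += err
    w -= lr * gw / len(data)   # plain batch gradient descent
    b -= lr * gb / len(data)
```

Everything from here to a deep net is "more of the same": more neurons, more layers, and the chain rule to push the `err` term backwards.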
[+] quickthrower2|2 years ago|reply
This would be like learning how your CPU / Memory works, even though JS is eating everyone's (web front-end) lunch.

So yes, if you are prompt engineering and wondering why X sometimes works and sometimes doesn't, and why any of this works at all, it is good to study a bit.

[+] Slix|2 years ago|reply
I came here with the same question. After reading and learning these materials, will I have new job skills or AI knowledge that I can do something with?
[+] msie|2 years ago|reply
This book looks impressive. There's a chapter on the unreasonable effectiveness of Deep Learning which I love. Any other books I should be on the lookout for?
[+] nextos|2 years ago|reply
This presentation from Deep Mind outlines some foundational ML books: https://drive.google.com/file/d/1lPePNMGMEKoaDvxiftc8hcy-rFp...

For the impatient, look into slide #123. Essentially, the recommendations are Murphy, Gelman, Barber, and Deisenroth.

Note these slides have a Bayesian bias. In spite of that, Murphy is a great DL book. Besides, going through GLMs is a great way to get into DL.

[+] teleforce|2 years ago|reply
Yes, it looks very impressive indeed and it has the potential to be the seminal textbook on the subject.

Fun fact: the famous Attention paper is closing in on 10K citations and should reach that milestone by the end of this year. It's probably the fastest paper ever to reach this significant milestone. Any deep learning book written before the Attention paper should be considered out of date and in need of updating. The situation is not unlike an outdated physics textbook that covers Newton's laws but lacks Einstein's famous equation of mass-energy equivalence.

[+] Slix|2 years ago|reply
If I start now and start reading up on AI, will I become anything close to an expert?

I'm worried that I'm starting a journey that requires a Master's or PhD.

[+] Chirono|2 years ago|reply
From reading this book you’d have a very good grasp of the underlying theory, much more than many ML engineers. But you’d be missing out on the practical lessons, all the little tips and intuitions you need to be able to get systems working in practice. I think this just takes time and it’s as much an art as it is a science.
[+] strikelaserclaw|2 years ago|reply
The only guidepost to use in this world of ever-increasing information is to ask yourself "do I find learning this stuff enjoyable?"; questions like "can I become an expert?" are vague and not good guideposts.
[+] ocharles|2 years ago|reply
Very hard to answer without knowing what your goal is. Do you want to be a practitioner of DL, or do you want to be a researcher?
[+] crimsoneer|2 years ago|reply
You probably won't become an expert, but I'm not clear why you'd want to!
[+] ldjkfkdsjnv|2 years ago|reply
I spent a decade working on various machine learning platforms at well-known tech companies. Everything I ever worked on became obsolete pretty fast. From the ML algorithm to the compute platform, all of it was very transitory. That, coupled with the fact that a few elite companies are responsible for all ML innovation, makes it seem pointless to me to even learn a lot of this material.
[+] nabla9|2 years ago|reply
>machine learning platforms

Machine learning platforms become obsolete.

Machine learning algorithms and ideas don't. If learning SVMs or Naive Bayes did not teach you things that are useful today, you didn't learn anything.
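To illustrate why an idea like Naive Bayes doesn't go obsolete: the whole algorithm fits in a few lines of plain Python. This is a hedged sketch of a Bernoulli Naive Bayes classifier with Laplace smoothing; the toy documents and class names are invented for the example:

```python
import math

# Tiny training set: each document is a set of words with a label.
train = [("spam", {"free", "win"}), ("spam", {"win", "cash"}),
         ("ham", {"meeting", "notes"}), ("ham", {"notes", "free"})]
vocab = set().union(*(words for _, words in train))

def classify(words):
    """Pick the class with the highest log posterior."""
    best, best_lp = None, -math.inf
    for cls in ("spam", "ham"):
        docs = [w for c, w in train if c == cls]
        lp = math.log(len(docs) / len(train))  # log prior
        for v in vocab:
            # Laplace-smoothed P(word present | class)
            p = (sum(v in d for d in docs) + 1) / (len(docs) + 2)
            lp += math.log(p if v in words else 1 - p)
        if lp > best_lp:
            best, best_lp = cls, lp
    return best
```

The "idea" part - conditional independence plus Bayes' rule - is what transfers; the same reasoning shows up again in probabilistic interpretations of deep models.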

[+] HighFreqAsuka|2 years ago|reply
Quite a lot of techniques in deep learning have stood the test of time at this point. Also, new techniques are developed either by depending on or by trying to solve deficiencies in old techniques. For example, Transformers were developed to address vanishing gradients in LSTMs over long sequences and to improve GPU utilization, since LSTMs are inherently sequential in the time dimension.
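The sequential-vs-parallel point above can be sketched in a few lines, assuming a toy 1-D sequence and made-up numbers: an RNN step needs the hidden state from the previous step, while all attention scores are independent of each other and can be computed at once.

```python
import math

seq = [0.5, -1.0, 2.0, 0.25]  # toy 1-D input sequence

# RNN-style recurrence: step t depends on h from step t-1,
# so this loop cannot be parallelized across time.
h = 0.0
for x in seq:
    h = math.tanh(0.5 * h + x)

# Attention-style scores: every (query, key) pair is independent,
# so the whole score matrix can be computed in parallel on a GPU.
scores = [[q * k for k in seq] for q in seq]
weights = [
    [math.exp(s) / sum(math.exp(t) for t in row) for s in row]
    for row in scores
]
```

The softmax rows each sum to 1, giving per-position mixing weights; the key structural difference is simply that the second computation has no loop-carried dependency.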
[+] tysam_and|2 years ago|reply
Highly, highly disagree.

If it became obsolete, then y'all were doing the new shiny.

The fundamentals don't really change. There are several different streams in the field, and there are many, many algorithms with good staying power in use. Of course, you can upgrade some if you like, but chase the white rabbit forever, and all you'll get is a handful of fluff.

[+] reqo|2 years ago|reply
Very few things stay the same in technology. You should think of technology as another type of evolution! It is driven by the same forces as evolution, IMO. I think even Linus Torvalds once stated that Linux evolved through natural selection.
[+] wiz21c|2 years ago|reply
So, what fundamental stuff should I learn? I understand ML has some general principles that keep on being valid throughout the years, no?
[+] drBonkers|2 years ago|reply
What would you recommend someone read instead?
[+] contrarian1234|2 years ago|reply
It's very hard to judge a book like this... (based on a table of contents?)

Who is the author?

Have they published anything else highly rated?

Are there good reviews from people who know what they're talking about?

Are there good reviews from students who don't know anything?

[+] komatsu|2 years ago|reply
I can highly recommend the author. His previous book, "Computer Vision: Models, Learning, and Inference", is very readable, approaches the material from unorthodox viewpoints, and includes a lot of excellent figures supporting the text. I'm buying this on paper!
[+] siddbudd|2 years ago|reply
Some Google-fu for you:

> based on a table of contents?

You can download the draft of Chapters 1-21 (500+ pages) from the linked site.

> Who is the author?

Simon J. D. Prince is Honorary Professor of Computer Science at the University of Bath and author of Computer Vision: Models, Learning and Inference. A research scientist specializing in artificial intelligence and deep learning, he has led teams of research scientists in academia and industry at Anthropics Technologies Ltd, Borealis AI, and elsewhere.

> Have they published anything else highly rated?

Author of >50 peer-reviewed publications in top-tier conferences (CVPR, ICCV, SIGGRAPH, etc.): https://scholar.google.com/citations?user=fjm67xYAAAAJ&hl=en

> Are there good reviews [...]

The book has not been published yet; this is literally a free draft you are looking at. The book is listed on Amazon as a pre-order for USD 85.

[+] arman_hkh|2 years ago|reply
Marcus Hutter on his [Marcus' AI Recommendation Page]: "Prince (2023) is the (only excellent) textbook on deep learning."
[+] oakejp12|2 years ago|reply
The PDF figures for 'Why does deep learning work' seem to point to 'Deep learning and ethics' and vice versa.
[+] water-your-self|2 years ago|reply
No chapter on RNNs, but there is one on transformers, which is interesting, having last read Deep Learning by Ian Goodfellow in 2016.
[+] PeterisP|2 years ago|reply
RNNs have "lost the hardware lottery" by being structurally not that efficient to train on the cost-effective hardware that's available. So they're not really used for much right now, though IMHO they are conceptually interesting enough to cover in such a course.
[+] ksvarma|2 years ago|reply
Simply great work and making it freely available is outstanding!!
[+] TrackerFF|2 years ago|reply
Reading through it, and it def looks accessible.
[+] adamnemecek|2 years ago|reply
All machine learning is Hopf convolution, analogous to renormalization. This should come as no surprise, renormalization can be modeled via the Ising model which itself is closely related to Hopfield networks which are recurrent networks.
[+] calf|2 years ago|reply
That's an interesting point, are there any resources to learn about this? I have a CS background, in that we generally only cover 1st-year physics and very little theoretical math beyond linear algebra, etc.
[+] dbmikus|2 years ago|reply
Don't know any of these terms, but you gave me some interesting topics to google about. Thanks!