Ask HN: What are the foundational texts for learning about AI/ML/NN?
285 points | mfrieswyk | 3 years ago
Pattern Recognition and Machine Learning - Bishop
Deep Learning - Goodfellow, Bengio, Courville
Neural Smithing - Reed, Marks
Neural Networks - Haykin
Artificial Intelligence - Haugeland
softwaredoug|3 years ago
(there's also "Elements of Statistical Learning" which is a more advanced version)
AI: A Modern Approach - https://aima.cs.berkeley.edu/
rg111|3 years ago
The explanation, examples, projects, math- all are crisp.
As the name suggests, it is only an introduction (unlike CLRS). And it does serve as a great beginner's book, giving you a proper foundation for the things that you learn and apply in the future.
One thing people complain about is it being written in R, but no serious hacker should fear R, as it can be picked up in 30 minutes, and you can implement the ideas in Python.
As someone with industry experience in Deep Learning, I will recommend this book.
The ML course by Andrew Ng has no parallel, though. One must try and do that course. Not sure about the current iteration, but the classic one (w/ Octave/MATLAB) was really great.
bjornsing|3 years ago
kevinskii|3 years ago
ranc1d|3 years ago
KRAKRISMOTT|3 years ago
You are approaching this like an established natural sciences field where old classics = good. This is not true for ML. ML is developing and evolving quickly.
I suggest taking a look at Kevin Murphy's series for the foundational knowledge. Sutton and Barto for reinforcement learning. Mackay's learning algorithms and information theory book is also excellent.
Kochenderfer's ML series is also excellent if you like control theory and cybernetics
https://algorithmsbook.com/ https://mitpress.mit.edu/9780262039420/algorithms-for-optimi... https://mitpress.mit.edu/9780262029254/decision-making-under...
For applied deep learning texts beyond the basics, I recommend picking up some books/review papers on LLMs, Transformers, GANs. For classic NLP, Jurafsky is the go-to.
Seminal deep learning papers: https://github.com/anubhavshrimal/Machine-Learning-Research-...
Data engineering/science: https://github.com/eugeneyan/applied-ml
For speculation: https://en.m.wikipedia.org/wiki/Possible_Minds
mtlmtlmtlmtl|3 years ago
While it does cover minimax trees, alphabeta etc, it only really provides a very brief overview. The book is more of an overview of the AI/ML fields as a whole. Game playing AI is dense with various game-specific heuristics that the book scarcely mentions.
Not sure about books, but the best resource I've found on at least chess AI is chessprogramming.org, then just ingesting the papers from the field.
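To make the minimax/alphabeta idea concrete, here's a minimal sketch of alpha-beta search over a hard-coded game tree (my own illustration, not from any of the books or sites mentioned):

```python
# Alpha-beta minimax over an explicit game tree: nested lists are
# internal nodes, ints are leaf evaluations. Alpha/beta track the best
# score each side is already guaranteed, letting us prune hopeless lines.
def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    if isinstance(node, int):          # leaf: static evaluation
        return node
    best = float("-inf") if maximizing else float("inf")
    for child in node:
        score = alphabeta(child, alpha, beta, not maximizing)
        if maximizing:
            best = max(best, score)
            alpha = max(alpha, best)
        else:
            best = min(best, score)
            beta = min(beta, best)
        if beta <= alpha:              # prune: the opponent won't allow this line
            break
    return best

# Root is maximizing, children are minimizing:
# max(min(3, 5), min(2, 9)) = 3, and the 9 is never even evaluated.
value = alphabeta([[3, 5], [2, 9]])
```

Real chess engines layer move ordering, transposition tables, and game-specific heuristics on top of this skeleton, which is exactly the material chessprogramming.org covers.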
ipnon|3 years ago
starwind|3 years ago
mfrieswyk|3 years ago
TaupeRanger|3 years ago
rg111|3 years ago
given that:
- you already know Python/any programming language properly
- you already know college level math (many people say you don't need it, but I haven't met a single soul in ML research/modelling without college level math)
- you know Stats 101 matching a good uni curriculum and ability to learn beyond
- you know git, docker, cli, etc.
Every influencer and their mother promising to teach you Data Science in 30 days is plain lying.
Edit: I see that I left out Deep RL. Let's keep it that way for now.
Edit2: Added tree-based methods. These are very important. XGBoost outperforms NNs every time on tabular data. I also once used an RF head appended to a DNN for final prediction. Added optimizers.
sillysaurusx|3 years ago
I've been doing it since early 2019 and there are still subtleties that catch me off guard. Get back to me when you're not surprised that you can get rid of biases from many layers without harming training.
I broadly agree with you, but the timeline was just a little too aggressive. By about 10x. :)
jtmcmc|3 years ago
cyber_kinetist|3 years ago
moneywoes|3 years ago
raz32dust|3 years ago
While having strong mathematical foundation is useful, I think developing intuition is even more important. For this, I recommend Andrew Ng's coursera courses first before you dive too deep.
mindcrime|3 years ago
http://codingthematrix.com/
https://www.youtube.com/playlist?list=PLEhMEyM9jSinRHXJgRCOL...
viscanti|3 years ago
nephanth|3 years ago
Also probability/statistics! Without those you can end up doing stuff pretty wrong.
mfrieswyk|3 years ago
crosen99|3 years ago
The first chapter walks through a neural network that recognizes handwritten digits, implemented in a little over 70 lines of Python, and leaves you with a very satisfying basic understanding of how neural networks operate and how they are trained.
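For flavor, here's a toy version of the same ingredients (my own sketch, not the book's code, which trains on MNIST): a two-layer sigmoid network trained by backprop, on XOR instead of digits so it fits in a comment:

```python
import numpy as np

# A two-layer sigmoid net trained by vanilla backprop on XOR --
# forward pass, deltas, and gradient steps, nothing else.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # input -> 8 hidden units
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # hidden -> 1 output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

losses = []
for _ in range(5000):
    h = sigmoid(X @ W1 + b1)                # forward: hidden activations
    out = sigmoid(h @ W2 + b2)              # forward: network output
    losses.append(np.mean((out - y) ** 2))
    d_out = (out - y) * out * (1 - out)     # backward: output-layer delta
    d_h = (d_out @ W2.T) * h * (1 - h)      # backward: hidden-layer delta
    W2 -= h.T @ d_out                       # gradient steps, learning rate 1
    b2 -= d_out.sum(0)
    W1 -= X.T @ d_h
    b1 -= d_h.sum(0)
```

The book's version is the same shape, just with 784 inputs, mini-batches, and proper weight initialization.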
martythemaniak|3 years ago
But these are both kinda old now, so there must be something newer that'll give you an equally good intro to transformers, etc.
nmfisher|3 years ago
conjectureproof|3 years ago
Here is how I used that book, starting with a solid foundation in linear algebra and calculus.
Learn statistics before moving on to more complex models (neural networks).
Start by learning OLS and logistic regression, cold. Cold means you can implement these models from scratch using only numpy ("I do not understand what I cannot build"). Then try to understand regularization (lasso, ridge, elasticnet), where you will learn about the bias/variance tradeoff, cross-validation, and feature selection. These topics are explained well in ESL.
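As an illustration of what "from scratch with only numpy" can look like (my own sketch, not ESL's notation):

```python
import numpy as np

# OLS: solve the least-squares problem directly. lstsq is the numerically
# safe equivalent of the normal equations beta = (X^T X)^-1 X^T y.
def ols_fit(X, y):
    Xb = np.column_stack([np.ones(len(X)), X])      # prepend intercept column
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return beta

# Logistic regression: plain gradient descent on the mean log-loss.
def logistic_fit(X, y, lr=0.1, steps=5000):
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))    # sigmoid of the linear predictor
        w -= lr * Xb.T @ (p - y) / len(y)    # gradient of the mean log-loss
    return w
```

Once you can derive that gradient by hand and see it match sklearn's answers, the regularized variants are small modifications (add a penalty term to the loss and its gradient).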
For OLS and logistic regression I found it helpful to strike a 50-50 balance between theory (derivations and problems) and practice (coding). For later topics (regularization etc.) I found it helpful to tilt towards practice (20/80).
If some part of ESL is unclear, consult the statsmodels source code and docs (top preference) or scikit (second preference, I believe it has rather more boilerplate... "mixin" classes etc). Approach the code with curiosity. Ask questions like "why do they use np.linalg.pinv instead of np.linalg.inv?"
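On that particular question, here's a quick self-contained answer (my own example): with a rank-deficient design, `inv` has nothing sensible to return, while `pinv` still gives you the minimum-norm least-squares solution.

```python
import numpy as np

# A rank-deficient design: the third column duplicates the first, so
# X^T X is singular and inverting it is ill-posed.
X = np.array([[1., 0., 1.],
              [2., 1., 2.],
              [3., 1., 3.],
              [4., 2., 4.]])
y = np.array([1., 2., 3., 4.])   # equals the first column exactly

# pinv works through the SVD, dropping (near-)zero singular values, and
# returns the minimum-norm least-squares solution even here. The weight
# on the duplicated direction gets split evenly: beta ~ [0.5, 0, 0.5].
beta = np.linalg.pinv(X) @ y
```

That's why library code reaches for `pinv` (or `lstsq`): it degrades gracefully when features are collinear, which real data often is.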
Spend a day or five really understanding covariance matrices and the singular value decomposition (and therefore PCA which will give you a good foundation for other more complicated dimension reduction techniques).
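The SVD-to-PCA connection fits in a few lines of numpy (my own illustration): center the data, decompose, and the right singular vectors are the principal axes.

```python
import numpy as np

# PCA via the SVD: the rows of Vt are the principal axes, and the
# squared singular values (scaled by n-1) are the explained variances.
def pca(X, k):
    Xc = X - X.mean(axis=0)                  # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                      # top-k principal axes
    explained_var = S[:k] ** 2 / (len(X) - 1)
    return Xc @ components.T, components, explained_var
```

Checking that the explained variances sum to the total feature variance, and that the components come out orthonormal, is a good exercise in exactly the covariance/SVD material mentioned above.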
With that foundation, the best way to learn about neural architectures is to code them from scratch. Start with simpler models and work from there. People much smarter than me have illustrated how that can go: https://gist.github.com/karpathy/d4dee566867f8291f086 https://nlp.seas.harvard.edu/2018/04/03/attention.html
While not an AI expert, I feel this path has left me reasonably prepared to understand new developments in AI and to separate hype from reality (which was my principal objective). In certain cases I am even able to identify new developments that are useful in practical applications I actually encounter (mostly using better text embeddings).
Good luck. This is a really fun field to explore!
poulsbohemian|3 years ago
One of the problems with AI is exactly what you noted above - there are a lot of subcategories, and my gut tells me these will grow. For the real neophyte, I'd say start with something that interests you or that you need for work - you likely aren't going to digest all of this in a month, and probably no single book will meet all your needs.
bradreaves2|3 years ago
I adore PRML, but the scope and depth is overwhelming. LfD encapsulates a number of really core principles in a simple text. The companion course is outstanding and available on EdX.
The tradeoff is that LfD doesn't cover a lot of breadth in terms of looking at specific algorithms, but your other texts will do a better job there.
My second recommendation is to read the documentation for Scikit.Learn. It's amazingly instructive and a practical guide to doing ML in practice.
vowelless|3 years ago
PartiallyTyped|3 years ago
bjornsing|3 years ago
In the opening chapter Jaynes describes a hypothetical system he calls “The Robot”. He then lays out the mathematics of the “The Robot’s” thinking in detail: essentially Bayesian probability theory. This is the best summary of an ideal ML/AI system I’ve come across. It’s also very philosophically enlightening.
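As a toy illustration of the kind of reasoning Jaynes formalizes (my own example, not from the book): the Robot holds beliefs over hypotheses and updates them by Bayes' rule as evidence arrives. Here it reasons about a coin's heads-probability.

```python
import numpy as np

# "The Robot": a grid of hypotheses about a coin's heads-probability,
# with belief updated per flip by Bayes' rule: posterior ∝ likelihood × prior.
theta = np.linspace(0.01, 0.99, 99)   # candidate heads-probabilities
belief = np.ones_like(theta)          # start indifferent (uniform prior)
belief /= belief.sum()

for flip in [1, 1, 0, 1, 1, 1, 0, 1]:      # 1 = heads, 0 = tails
    likelihood = theta if flip else (1 - theta)
    belief *= likelihood                   # unnormalized posterior
    belief /= belief.sum()                 # renormalize to a distribution

best = theta[np.argmax(belief)]            # MAP estimate: 6 heads / 8 flips = 0.75
```

Jaynes' point is that this isn't one technique among many: any agent whose degrees of belief obey a few consistency desiderata must update exactly this way.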
sillysaurusx|3 years ago
It's a good book, but I don't know how it's related to ML. My own answer would be "Just do it." Find an ML project you like and start tinkering around. But everyone learns differently, so maybe there's a book that can replace experience.
misiti3780|3 years ago
gerash|3 years ago
Probabilistic Machine Learning: An Introduction
https://probml.github.io/pml-book/book1.html
Probabilistic Machine Learning: Advanced Topics
https://probml.github.io/pml-book/book2.html
pablo24602|3 years ago
junkerm|3 years ago
daturkel|3 years ago
https://github.com/daturkel/learning-papers
digitalsushi|3 years ago
Let me ask a slightly different way - can someone like me get into a job like these, without needing some more college?
My day job is wrapping up OS templates for people with ML software and I always wonder what they get to go do with them once they turn into a compute instance.
throwaway81523|3 years ago
It is a trendy area and in such areas there is always skepticism towards wannabe entrants. As for whether you know enough math, I would start by watching the fast.ai videos and seeing if you're comfortable with the explanations and tools.
I can say I have a stronger math background than most programmers (though less strong than that of real math geeks) and I don't think I know enough math to really grok this stuff, but I'm always after a more foundational understanding than it takes to just use the tools. I think there are opportunities that don't require the math, but are just about having gotten some practice with packages X, Y, or Z. In the end though, those are like web frameworks that become obsolete all the time. So it is worth spending time on foundations.
zmgsabst|3 years ago
Call it cross functional training to increase your domain knowledge, tell your manager you need it to ensure you’re providing the best service possible, and get your coworkers to help you learn the framework they use…?
jtmcmc|3 years ago
friendlyHornet|3 years ago
ly3xqhl8g9|3 years ago
→ Harrison Kinsley, Daniel Kukiela, Neural Networks from Scratch, https://nnfs.io, https://www.youtube.com/watch?v=Wo5dMEP_BbI&list=PLQVvvaa0Qu...
Somewhat foundational, if not in actuality, then in the intention to actually build a theory, as in the theory of gravitation, although not necessarily an introductory text:
→ Daniel A. Roberts, Sho Yaida, The Principles of Deep Learning Theory, https://arxiv.org/abs/2106.10165
avipeltz|3 years ago
- For deep learning specifically, a more applied text that is beautifully written and chock full of examples is Francois Chollet's Deep Learning with Python (there's a new second edition out with up-to-date examples using modern versions of Tensorflow). The first 3 chapters I would give as required reading for anyone interested in understanding some deep learning fundamentals.
- Deep Learning - Goodfellow and Bengio - seems like it would be hard to get through without a reading group; not exactly an APUE or K&R type reading experience, but I haven't spent enough time with it.
If you haven't taken a Linear Algebra or Differential Equations class, it's useful stuff to know for ML/DL theory but not fully necessary for doing applied work with modern high-level libraries; having a strong understanding of basic matrix math is definitely useful, though.
If you have interests in natural language processing, there's a couple of good books:
- Natural Language Processing with Python - Bird, Klein, Loper - is a great intro to NLP concepts and working with NLTK, which may be a bit dated to some, but I would definitely recommend it, and it's online for free. Great examples. (https://www.nltk.org/book/)
- Speech and Language Processing - Dan Jurafsky and James H. Martin - is good, though I have only spent much time with the pre-print.
And then there's a lot of papers that are good reads. Let me know if you have any questions or want a list of good papers.
If you just want to get off the ground and start playing with stuff and building things, I'd recommend fast.ai's free online course - it's pretty high level and a lot is abstracted away, but it's a great start and can enable you to build lots of cool things pretty rapidly. Andrew Ng's online course is also quite reputable and will probably give you a bit more background and fundamentals.
If I were to choose one book from the bunch, it would be Chollet: it gives you pretty much all the building blocks you need to be able to read some papers and try to implement things yourself, and I find building things a much more satisfying way to learn than sitting down and writing proofs or just taking notes, but that's just my preference.
rg111|3 years ago
And the new things he covers are covered in a better manner and at better depth in other sources.
I read this book like a novel. Good for a basic overview, but the RoI is very low.
dezzeus|3 years ago
Artificial Intelligence, a modern approach – Stuart Russell, Peter Norvig
mindcrime|3 years ago
apu|3 years ago
gaspb|3 years ago
The book assumes limited knowledge (similar to what is required for Pattern Recognition, I would say) and gives good intuition on foundational principles of machine learning (the bias/variance tradeoff) before delving into more recent research problems. Part I is great if you simply want to know what the core tenets of learning theory are!
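The bias/variance tradeoff is easy to see numerically (my own toy example, not from the book): fit polynomials of increasing degree to noisy sine samples and compare training error with held-out error.

```python
import numpy as np

# Underfit vs overfit: degree 1 is too rigid (high bias), degree 12 chases
# the noise in 20 training points (high variance), degree 3 sits in between.
rng = np.random.default_rng(1)
x_train = np.sort(rng.uniform(0.0, 1.0, 20))
y_train = np.sin(2 * np.pi * x_train) + 0.2 * rng.normal(size=20)
x_test = np.linspace(0.0, 1.0, 200)
y_test = np.sin(2 * np.pi * x_test)         # noiseless ground truth

train_err, test_err = {}, {}
for degree in (1, 3, 12):
    coeffs = np.polyfit(x_train, y_train, degree)    # least-squares fit
    train_err[degree] = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err[degree] = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
```

Training error can only fall as the degree grows, while held-out error typically turns back up once the model starts fitting noise; that gap is exactly what the learning-theory bounds are about.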
sinenomine|3 years ago
Much of the old theory is barely applicable, and people are, understandably, bewildered and in denial.
If someone were to be inclined to theory, I'd just recommend reading papers that don't try to oversimplify the domain:
https://arxiv.org/abs/2006.15191
https://arxiv.org/abs/2210.10749
https://arxiv.org/abs/2205.10343
https://arxiv.org/abs/2105.04026
stevenbedrick|3 years ago
But is also available online as a preprint here: https://mlstory.org/
master_yoda_1|3 years ago
rg111|3 years ago
Then start with ISLR.
Then go and watch Andrew Ng's Machine Learning course on Coursera (a new version was added in 2022 that uses Python).
Then read the sklearn book from its maintainers/core devs. It's from O'Reilly.
Then go do the Deep Learning Specialization from deeplearning.ai.
Then do fast.ai course.
If interested in Deep RL, watch David Silver lectures, then read Deep RL in Action by Zai, Brown. Then do the HF course on Deep RL.
This is how you get started. Choose your books based on your personality, needs, and contents covered.
And among MOOCs, I highly suggest the one by Canziani, LeCun from NYU. (I loved the 2020 version.)
The one taught by Fei Fei Li and Andrej Karpathy is nice.
These two MOOCs are good enough, quality-wise, to substitute for the classic books.
I have never read cover to cover any of the famous books. I read a lot from them sticking to specific subjects.
Get to reading papers, finding implementations. Ng + ISLR will give you good grounds. Fast.ai + deeplearning.ai will give you capability to solve real problems. NYU + Tubingen + Stanford + UMich (Justin Johnson) courses will bring you to the edge.
You need a lot of practical experience with things that aren't taught anywhere. So, get your hands dirty early. Learn to use frameworks, cloud platforms, etc.
Then start reading papers.
A crystal clear grasp on Math foundations is a must. Get it if you don't have already.
pkoird|3 years ago
ipnon|3 years ago
IanCal|3 years ago
Now I think you've got key parts. There's how to use recent production ready models/systems, how to train them and how to make them. Is it in a research or business context?
The field is also broad enough that any one section (text, images, probably symbols) and subsection (time series, bulk, fast online work) all have significant bodies of work behind them. My splits here may not be the best, so I'm happy for any corrections on a useful hierarchy, by the way.
Perhaps you're interested in the history and what's led up to today's work? That's more of a "brief history of time" style coverage, but illuminating.
I'm aware I've not helpfully answered, but I think the same question could have very different valid goals and wanted to bring that to the fore.
robg|3 years ago
https://psycnet.apa.org/record/1988-97441-000
rramadass|3 years ago
Any more recommendations?
PS: You might find Vehicles: Experiments in Synthetic Psychology by Valentino Braitenberg interesting if you don't already know of it.
throwaway81523|3 years ago
https://www.cs.cornell.edu/jeh/book%20no%20so;utions%20March...
Also in published form from Cambridge University Press:
https://www.cambridge.org/core/books/foundations-of-data-sci...
jgrimm|3 years ago
For example, nearly everyone understands how to apply multivariable logistic regression in, say, Numpy; however, a good grasp of underlying concepts such as confidence bounds for overfitting, and being able to use formal proofs to explain concepts such as VC generalisation, will both help you stand out and provide a good foundation that makes further learning much easier.
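For concreteness, the VC generalization bound can be evaluated directly. This sketch follows the form given in Learning from Data (my transcription, worth checking against the text):

```python
import math

# VC generalization bound: with probability >= 1 - delta,
#   E_out <= E_in + sqrt(8/N * ln(4 * m_H(2N) / delta)),
# using the polynomial growth-function bound m_H(N) <= N^d_vc + 1.
def vc_bound(n, d_vc, delta=0.05):
    growth = (2 * n) ** d_vc + 1           # bound on m_H(2N)
    return math.sqrt(8.0 / n * math.log(4.0 * growth / delta))
```

Plugging in numbers is instructive: the bound shrinks only slowly with sample size and grows with VC dimension, which is why it's a conceptual tool rather than a practical error estimate.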
cscurmudgeon|3 years ago
https://math.mit.edu/~gs/learningfromdata/
5cott0|3 years ago
adg001|3 years ago
Understanding Machine Learning: From Theory To Algorithms – Shai Shalev-Shwartz
dmarcos|3 years ago
zffr|3 years ago
I have not applied this technique to AI/ML/NN specifically, but it has been useful for me when trying to learn other topics.
epgui|3 years ago
dceddia|3 years ago
The authors are working on a new course that’ll dive deep into the modern Stable Diffusion stuff too, which I’m looking forward to.
cttet|3 years ago
alphabetting|3 years ago
6gvONxR4sf7o|3 years ago
davidhunter|3 years ago
This is a good overview of the history of the field (up to SVMs and before deep NNs). I found this useful for putting all the different approaches into context.
bilsbie|3 years ago
I’m having trouble keeping my motivation up, but I really want to get up to speed on how LLMs work and someday make a career switch.
moneywoes|3 years ago
PartiallyTyped|3 years ago
You'd need the following background:
- Linear Algebra
- Multivariate Calculus
- Probability theory && Statistics
Then you need a decent ML book to get the foundations of ML, you can't go wrong with either of these:
- Bishop's Pattern Recognition
- Murphy's Probabilistic ML
- Elements of statistical learning
- Learning from data
You can supplement Murphy's with the advanced book. Elements is a pretty tough book; consider going through "Introduction to statistical learning"[1]. Bishop and Murphy include foundational topics in mathematics.
LfD is a great introductory book and covers one of the most important aspects of ML, that is, model complexity and families of models. It can be supplemented with any of the other books.
I'd also recommend doing some abstract algebra, but it's not a prerequisite.
If you would like a top-down approach, I recommend getting the book "Mathematics for Machine Learning" and learning as needed.
For NN methods, some recommendations:
- https://paperswithcode.com/methods/category/regularization
- https://paperswithcode.com/methods/category/stochastic-optim...
- https://paperswithcode.com/methods/category/attention-mechan...
- https://paperswithcode.com/paper/auto-encoding-variational-b...
For something a little bit different, but worth reading given that you have the prerequisite mathematical maturity:
- Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges | https://arxiv.org/abs/2104.13478
[1] https://www.statlearning.com/
Many thanks to the user "mindcrime" for catching my error with Introduction to statistical learning.
mindcrime|3 years ago
Was that supposed to be An Introduction to Statistical Learning[1] or maybe Introduction to Statistical Relational Learning[2]? I don't think there is a book titled Introduction to Elements of Statistical Learning?
[1]: https://www.statlearning.com/
[2]: https://www.cs.umd.edu/srl-book/
sillysaurusx|3 years ago
antegamisou|3 years ago
nephanth|3 years ago
jpamata|3 years ago
revskill|3 years ago