Hacker's guide to Neural Networks (2012)

383 points | headalgorithm | 7 years ago | karpathy.github.io

24 comments

[+] theCricketer|7 years ago|reply
Since we're on the topic of tutorials to understand neural nets and modern deep learning, I will throw in Michael Nielsen's excellently written free online "book" on neural nets. It's really a set of 6 long posts that gets you from 0 to understanding all of the fundamentals with almost no prerequisite math needed.

Using clear, easy-to-understand language, Michael explains neural nets, the backprop algorithm, challenges in training these models, some commonly used modern building blocks, and more:

http://neuralnetworksanddeeplearning.com/

This book opened my eyes to the power of textbooks written in such an easy-to-understand, clear style. I bet it took repeated revisions, feedback from others, and hours of work, but such writing is a huge value-add to the world.

[+] jeraguilon|7 years ago|reply
Great post that I often go back to. A curious fact about Karpathy is that he actually has a long history of teaching (relative to his age). About 9 years ago, I learned how to speed-solve Rubik's cubes in ~12 seconds through his YouTube channel [0]. It's interesting to see that his simple teaching style transfers quite well to topics more technical than twisty puzzles.

[0] https://www.youtube.com/user/badmephisto

[+] Rainymood|7 years ago|reply
WHAT?!

Badmephisto == Andrej Karpathy?!

I would've never made the connection ... badmephisto also got me into speedcubing, my pb is ~14 sec, crazy ...

[+] freediver|7 years ago|reply
From a student to the director of AI at one of the most innovative companies on Earth in, what, 4 years? Must be one of the greatest untold stories.
[+] jaimex2|7 years ago|reply
The story will be told soon enough, once his work has changed the world; it's very much in its first chapters, I would imagine.
[+] sdan|7 years ago|reply
Any new updates to this article? This guide has been well known among ML practitioners for a while now.
[+] otaviogood|7 years ago|reply
I think this eventually turned into Andrej Karpathy's class at Stanford, CS231n. The class notes are here: http://cs231n.github.io/ The class is on youtube. If you like this hacker's guide, I think you'll definitely like the class and the notes. edit: A lot of the compute graph and backprop type stuff that is in the hacker's guide is covered in this specific class, starting about at this time: https://www.youtube.com/watch?v=i94OvYb6noo&t=207s
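The compute-graph backprop style taught in both the hacker's guide and that lecture can be sketched in a few lines of Python. This is my own minimal illustration (the `(x + y) * z` circuit is the classic teaching example), not code from either source:

```python
# Minimal compute graph: forward pass computes f = (x + y) * z,
# backward pass applies the chain rule gate by gate.
def forward_backward(x, y, z):
    # Forward pass through two "gates".
    q = x + y          # add gate
    f = q * z          # multiply gate
    # Backward pass: start from df/df = 1 and push gradients back.
    dq = z             # df/dq (local gradient of the multiply gate)
    dz = q             # df/dz
    dx = dq * 1.0      # dq/dx = 1, chain rule gives df/dx = df/dq * dq/dx
    dy = dq * 1.0      # dq/dy = 1
    return f, (dx, dy, dz)

print(forward_backward(-2.0, 5.0, -4.0))  # f = -12.0, grads (-4.0, -4.0, 3.0)
```

Every gate only needs its local gradient; the chain rule stitches them together, which is the whole trick behind backprop on arbitrary graphs.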
[+] peteretep|7 years ago|reply
I've made a lot of progress in my mental models recently by implementing the perceptron in Excel
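The same exercise fits in a few lines of plain Python. This is a generic perceptron learning rule on the AND function (my own toy example, not tied to any particular spreadsheet layout):

```python
# Perceptron learning rule: nudge weights toward each misclassified example.
def train_perceptron(data, epochs=10, lr=0.1):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in data:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - pred          # -1, 0, or +1
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print([1 if w[0] * x1 + w[1] * x2 + b > 0 else 0 for (x1, x2), _ in AND])
# -> [0, 0, 0, 1]
```

Watching the weights move row by row (in Excel or in a loop like this) is exactly what makes the mental model click.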
[+] gnulinux|7 years ago|reply
I've made a similar progress by implementing some ML algorithms in pure RISC-V assembly. Makes you think.
[+] amelius|7 years ago|reply
There's a lot of high-school math there, but the trouble is that the real workings of neural networks (the speed of convergence, and why/if it works on samples outside the training/validation set) are left a mystery, if you ask me.
[+] orbifold|7 years ago|reply
It is relatively clear why it works beyond the training and validation set: What is being approximated is a smooth function, which in the case of a classification task is a function from the space of things to be classified (images of a certain size) to the n-simplex, where n is the number of classes. Then the preimage theorem tells you that over a regular point of this smooth map lies a codimension n submanifold in the space of things to be classified. That in turn can be interpreted as the submanifold of all things that look like the class you are assigning, especially close to the corners of the n-simplex (being a regular point is an open condition). In short: Because the map is constructed to be smooth it will make sense beyond whatever the training / validation data was. Note that this does not guarantee that it has learned something reasonable about the dataset, just that it will have found some way to smoothly separate it into different components.
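In symbols (my notation, restating the argument above), the classifier is a smooth map into the simplex, and the preimage theorem applies to its regular values:

```latex
f \colon X \longrightarrow \Delta^{n},
\qquad
\Delta^{n} = \Big\{\, p \in \mathbb{R}^{n+1} \;\Big|\; p_i \ge 0,\ \sum_i p_i = 1 \,\Big\}.
```

For a regular value $p \in \Delta^{n}$, the preimage $f^{-1}(p) \subset X$ is a smooth submanifold of codimension $n$: the set of all inputs the network assigns that exact class distribution.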
[+] taneq|7 years ago|reply
General popular opinion seems to be that these are (to greater or lesser extent) a mystery for everyone. Can you suggest any intermediate reading on things like generalisation? I've looked online but only found either "here's how to recognize numbers in the MNIST dataset using numpy" or "First we take <long string of squiggles> which trivially implies <longer string of squiggles>..."
[+] yumraj|7 years ago|reply
That is because hyperparameters are indeed a mystery, and fine-tuning them is more a heuristics-based art than a science.
[+] lettergram|7 years ago|reply
I actually wrote a similar style guide just last week (had some time off, was meaning to do it):

https://austingwalters.com/neural-networks-to-production-fro...

It uses up-to-date Keras and Python, and doesn't go as deeply into the network internals themselves.

I regularly run trainings and teach seminars on neural networks, and I find that most online tutorials (such as this one) go too in-depth on constructing a network from scratch - they lose people.

Today the biggest issues are actually data formatting and ingestion, then hyperparameter tuning. You really only need to grasp the basics to get started in 2019.
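That unglamorous data-formatting step is often the whole battle. A minimal sketch of what it usually involves - scaling numeric features and one-hot encoding labels - using hypothetical data and plain Python (real pipelines would use a library, but the steps are the same):

```python
# Turn raw rows into model-ready arrays.
def min_max_scale(column):
    # Rescale a numeric column to the [0, 1] range.
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

def one_hot(labels):
    # Map each label to an indicator vector over the sorted class set.
    classes = sorted(set(labels))
    return [[1 if lab == c else 0 for c in classes] for lab in labels]

heights = [150, 180, 165]
X = min_max_scale(heights)            # [0.0, 1.0, 0.5]
y = one_hot(["cat", "dog", "cat"])    # [[1, 0], [0, 1], [1, 0]]
print(X, y)
```

Once the data is in this shape, feeding it to almost any framework is the easy part.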