
A week-long programming retreat

596 points| bane | 8 years ago |facebook.com | reply

154 comments

[+] qychtkd|8 years ago|reply
Masters of Doom contains an anecdote about this aspect of John Carmack's life. On page 252, it mentions how he sequestered himself in a "small, anonymous hotel room somewhere in Florida" as he researched Trinity. He had a Dolch portable computer with a Pentium II and full-length PCI slots, while subsisting only on pizza and Diet Coke. That bit for some reason made a big impression on me when I read it on the bus ride to school. To be able to let yourself go and research and code what you truly believe in or are curious or excited about (with room service, and not having to clean up after yourself, haha) seemed incredible. I wonder if John still sticks to Florida, or if he goes to different places each year; to a city, or just a hotel off a highway or near an airport. My favorites have been the Hyatt Place Amsterdam Airport, the Hyatt Regency Charles de Gaulle, and the Hyatt Lake Tahoe. Something about sterile rooms, room service, and a hotel near, but not too close to, beautiful and historical landmarks just centers you and allows you to think.
[+] yitchelle|8 years ago|reply
Agreed. I guess the trick to selecting a location is to pick one where the inside of the hotel/cabin room is much more attractive than the outside.

I had a colleague that told me that his most productive period was when he was stuck at the hospital for a couple of weeks but was able to do some coding.

[+] United857|8 years ago|reply
Good writeup -- and one of the main reminders for me is this:

People throw around words like "revolution" for the current deep-learning push. But it's worth remembering that the fundamental concepts of neural networks have been around for decades. The current explosion is due to breakthroughs in scalability and implementation through GPUs and the like, not any sort of fundamental algorithmic paradigm shift.

This is similar to how the integrated circuit enabled the personal computing "revolution" but down at the transistor level, it's still using the same principles of digital logic since the 1940s.

[+] Iv|8 years ago|reply
In computer vision at least, deep learning has been a revolution. More than half of what I knew in the field became obsolete almost overnight (it took about a year or two, I would say), and a lot of tasks received an immediate boost in terms of performance.

Yes, neural networks have been here for a while, gradually improving, but they were simply non-existent in many fields where they are now the favored solution.

There WAS a big fundamental algorithmic paradigm shift. Many people argue that it should not be called "neural networks" but rather "differentiable function networks". DL is not your dad's neural network, even if it looks superficially similar.

The shift is that now, if you can express your problem in terms of minimization of a continuous function, there is a whole new zoo of generic algorithms that are likely to perform well and that may benefit from throwing more CPU resources at them.

Sure, it uses transistors in the end, but revolutions do not necessarily mean a shift in hardware technology. And, by the way, if we one day switch from transistors to things like opto-thingies, and it brings a measly 10x boost in performance, it won't be on par with the DL revolution we are witnessing.
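To make the point above concrete, here is a minimal sketch of my own (not from the thread): once a problem is phrased as minimizing a differentiable function, a completely generic optimizer such as plain gradient descent can attack it with no problem-specific machinery. All names here are illustrative.

```python
# Sketch: solving a toy problem purely by framing it as the
# minimization of a differentiable function.

def minimize(grad, x0, lr=0.05, steps=500):
    """Plain gradient descent: repeatedly step against the gradient."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Toy problem: find the scalar a minimizing f(a) = (3a - 6)^2.
# f'(a) = 2 * (3a - 6) * 3 = 18a - 36, so the minimum is at a = 2.
def grad_f(a):
    return 18.0 * a - 36.0

a = minimize(grad_f, x0=0.0)   # converges toward 2.0
```

The optimizer never sees the problem, only its gradient, which is the sense in which the "zoo" of such algorithms is generic.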

[+] The_suffocated|8 years ago|reply
I don't think the revolution was about hardware improvement. I did some neural network research (and published a few papers) in the 1990s and switched to other research disciplines afterwards. So, I'm not really familiar with the recent developments. But to my knowledge, there was indeed a revolution in neural network research. It was about how to train a DEEP neural network.

Traditionally, neural networks were trained by backpropagation, but many implementations had only one or two hidden layers, because training a neural network with many layers (it wasn't called a "deep" NN back then) was not only hard, but often led to poorer results. Hochreiter identified the reason in his 1991 thesis: the vanishing gradient problem. Now the culprit was identified, but the solution had yet to be found.

My impression is that there weren't any breakthroughs until several years later. Since I'd left the field, I don't know exactly what these breakthroughs were. Apparently, the invention of LSTM networks, CNNs, and the replacement of sigmoids by ReLUs were some important contributions. But anyway, the revolution was more about algorithmic improvement than the use of GPUs.
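The vanishing gradient problem mentioned above fits in a few lines (a sketch of my own, not from the thread): backprop multiplies local derivatives layer by layer, and the sigmoid's derivative never exceeds 0.25, so the product shrinks geometrically with depth, while ReLU's derivative is 1 for active units.

```python
import math

def sigmoid_grad(x):
    # Derivative of the logistic sigmoid; its maximum is 0.25 at x = 0.
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    # Derivative of ReLU: 1 on the active side, 0 otherwise.
    return 1.0 if x > 0 else 0.0

depth = 20
sig_signal, relu_signal = 1.0, 1.0
for _ in range(depth):
    sig_signal *= sigmoid_grad(0.0)   # best case for sigmoid: 0.25
    relu_signal *= relu_grad(1.0)     # active ReLU unit: 1.0

# After 20 layers the sigmoid signal is 0.25**20 (about 1e-12):
# effectively no gradient reaches the early layers, while the
# ReLU signal is still 1.0.
```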

[+] jorgemf|8 years ago|reply
> not any sort of fundamental algorithmic paradigm shift

I don't think I can agree with this. There have been a lot of improvements to the algorithms, and the pace has sped up thanks to GPUs. You cannot just take a neural network from 15 years ago, make it bigger, and expect it to work on modern GPUs; it is not going to work at all. Moreover, new techniques have appeared to solve other types of problems.

I am talking about things like batch normalization, ReLUs, LSTMs, and GANs. Yes, neural networks still use gradient descent, but there are people working on other algorithms now, and they seem to work; they are just less efficient.

> This is similar to how the integrated circuit enabled the personal computing "revolution" but down at the transistor level, it's still using the same principles of digital logic since the 1940s.

This claim has exactly the same problem as before. You could also say evolution has done nothing, because the same principles at work in people were already there with the dinosaurs, and even with the first cells. We are just a lot more cells than before.

[+] vadansky|8 years ago|reply
There was a revolution, when they started using backpropagation to optimize the gradient search. It's also why I don't agree with calling them "neural" anything, because there is no proof that brains learn using backpropagation. I feel like the current direction has thrown away all the neurobiology and focuses too much on the mathematics.
[+] latenightcoding|8 years ago|reply
New regularization techniques to stop these massive NNs from overfitting are also one of the main factors behind the current deep learning explosion.
[+] jakecrouch|8 years ago|reply
The best way to advance AI is probably to make the hardware faster, especially now that Moore's Law is in danger of going away. The people doing AI research generally seem to be fumbling around in the dark, but you can at least be certain that better hardware would make things easier.
[+] rhacker|8 years ago|reply
It's actually super refreshing to learn that even programming masters like Carmack are just now learning NNs and watching YouTube Stanford classes like the rest of us. These are actual people, not gods :) Everybody poops!
[+] tostitos1979|8 years ago|reply
Yeah .. but his first impulse was to write backprop from scratch. I saw the lectures, been dabbling with NN for years, and I never thought to do it. I always thought the Stanford people made you do it on assn 1 to pay your dues or something. I continue to think of Carmack as the Master hacker.
[+] r00k|8 years ago|reply
I do trips like this every few years. I can hardly recommend them enough.

I wrote about my specific recommendations here: https://robots.thoughtbot.com/you-should-take-a-codecation.

[+] Cthulhu_|8 years ago|reply
I should do that sometime. I'm afraid I've lost my ability to focus, something like that might help. I'd need a solid mission though, instead of a "look into tech X" without a goal.
[+] justonepost|8 years ago|reply
Pretty awesome! If I ever had to say the one thing that differentiates successful people from unsuccessful people it wouldn't be intelligence, or even perseverance, or passion. It'd be focus. With focus, you can be amazingly successful in so many types of occupations.

(That being said, passion / perseverance / intelligence can often lead to focus)

[+] dominotw|8 years ago|reply
> (That being said, passion / perseverance / intelligence can often lead to focus)

Aren't those more like prerequisites for focus? Focus is merely a side effect, not something you can aim for.

[+] arnioxux|8 years ago|reply
I second his recommendation of CS231N: http://cs231n.stanford.edu/

You can probably go through the whole thing including assignments in under a week full-time.

[+] colmvp|8 years ago|reply
Personally, I found it took me much longer. I watched Karpathy's lectures, took notes and stewed upon the ideas, and read a bunch of other materials such as blog posts and research papers to try and truly comprehend some of the concepts mentioned in the course.

I found myself knowing how to create CNNs, but the why of the entire process still feels under-developed. Though I'll admit it could be because my understanding of calculus and linear algebra was far more under-developed back when I was studying the course than it is now.

[+] IamNotAtWork|8 years ago|reply
Are these courses being taught by graduate students? The three main instructors seem like they are students themselves, with an army of undergrad and grad TAs.

Pity that you pay so much money to attend Stanford only to be taught by your peers. Not knocking Stanford, as this is how it's done pretty much everywhere at the undergrad level now.

[+] plg|8 years ago|reply
The idea of programming something from "scratch" (whatever your definition, and programming language) is the best way to really understand something new. Reading about it, hearing someone speak about it is one thing ... but opening up a blank .c file and adopting a "ok, let's get on with this" approach is something much different.

It takes time though and one has to combat the "how come you're reinventing the wheel" comments from co-workers, spouses, bosses, etc., which can be a challenge.

[+] BenoitEssiambre|8 years ago|reply
"I initially got backprop wrong both times, comparison with numerical differentiation was critical! It is interesting that things still train even when various parts are pretty wrong — as long as the sign is right most of the time, progress is often made."

That is the bane of doing probabilistic code. Errors show up not as clear cut wrong values or crashes but as subtle biases. You are always wondering, even when it is kinda working, is it REALLY working or did I miss a crucial variable initialization somewhere?
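The cross-check Carmack describes, comparing analytic backprop gradients against numerical differentiation, can be sketched like this (a toy of my own, not his code; all names are illustrative):

```python
def loss(w):
    # Toy "network": a single weight, squared error against a target.
    return (2.0 * w - 3.0) ** 2

def analytic_grad(w):
    # Hand-derived gradient of the loss above (what backprop computes).
    return 2.0 * (2.0 * w - 3.0) * 2.0

def numerical_grad(f, w, eps=1e-5):
    # Central difference: no calculus, just two function evaluations.
    return (f(w + eps) - f(w - eps)) / (2.0 * eps)

w = 0.7
gap = abs(analytic_grad(w) - numerical_grad(loss, w))
# gap should be tiny; a large gap flags a backprop bug even when
# training "still sort of works" because the sign is usually right.
```

In a real network you would run this check per parameter on a small batch, which is exactly the kind of subtle-bias detector the parent comment is asking for.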

[+] gwern|8 years ago|reply
There might be something deeper there. I am thinking of the line of research, associated with Bengio, about biologically plausible backprop - it turns out that you can backpropagate errors through random feedback weights and learning still works! Which is important because it's not too plausible that the brain is calculating exact derivatives and communicating them to each neuron individually, but it can send back error signals more easily.
[+] hota_mazi|8 years ago|reply
It's actually not very different from graphics programming, where a simple rounding error can cause all kinds of trouble, from the very small (surfaces or rays not reflecting exactly where they should) to the very big (completely messing up your entire rendering).

Both activities very much resemble chaotic systems and they are both very challenging to debug.

[+] hotmilk|8 years ago|reply
Reproducing known results goes a long way.
[+] tinderliker|8 years ago|reply
>On some level, I suspect that Deep Learning being so trendy tweaked a little bit of contrarian in me, and I still have a little bit of a reflexive bias against “throw everything at the NN and let it sort it out!”

I am the same kind of person. But when John Carmack approaches this with scepticism and concludes that it is indeed not over-hyped, I guess it's worth learning after all!

CS231N, here I come.

[+] tail-recursion|8 years ago|reply
He never said it is not over-hyped. Most people who work in the field think it is over-hyped.
[+] ryandrake|8 years ago|reply
I admit to having the same grouchy thoughts about machine learning and AI: When is this fad going to blow over and we can all get back to writing deterministic programs instead of collecting data and training models, or whatever it is this new breed is doing? Might be time to rethink and revive my curiosity.
[+] deepaksurti|8 years ago|reply
>> I’m not a Unix geek. I get around ok, but I am most comfortable developing in Visual Studio on Windows.

This is a lesson for me and probably many others. Don't get hung up on tools, ship!!!

[+] pjmlp|8 years ago|reply
That attitude is quite prevalent in the games industry, so anyone who spends some time among AAA devs learns that they don't care one second about stuff like portable 3D APIs and OSes the way people in forums like HN think they do.

What matters is making a cool game with whatever tech is available and ship.

[+] munificent|8 years ago|reply
I think the subtext of your comment here is something like, "Look how productive he is without mastering the real tools we Unix hackers use! He can get by with second rate Windows stuff!"

Maybe I'm reading you wrong. But if I am right, it's good to take a look outside of the Unix bubble. Visual Studio is literally the world's most sophisticated developer tool. More human hours of engineering have been poured into it than likely any other piece of software we use on a daily basis.

Windows isn't my jam, but VS is incredible.

[+] waylandsmithers|8 years ago|reply
Hear, hear. Junior webdevs will often consider the use of vim and other command-line tools (over GUI-based tools or IDEs) a badge of honor.
[+] pm|8 years ago|reply
I find this interesting since I used to get so excited as a teenager when one of the id peeps updated their .plan file.
[+] cheschire|8 years ago|reply
Yeah that was the point that stuck out for me the most as well.

He took a week off to focus on a problem set, and ended up spending a number of hours working around the limitations of his setup instead.

I mean, I don't want to sound judgmental. Perhaps that was part of his plan. It just stuck out to me as seeming orthogonal to his expressed goals.

[+] doomlaser|8 years ago|reply
I like his insight that a lot of base neural network implementation code is conceptually simple in the same way as writing a raytracer.
[+] gonehome|8 years ago|reply
I think the point about NN and ray tracing being simple systems that allow for complex outcomes is something that seems to be a deeper truth about the universe. Stephen Wolfram and Max Tegmark both talk and write about this - it also shows up in old cellular automata like Conway’s game of life.

It’s pretty cool that so much complexity can come from a few small rules or equations.
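Conway's Game of Life illustrates the point above nicely; here is a minimal sketch of my own (not from the thread), storing live cells as a set of coordinates:

```python
from collections import Counter

def step(live):
    # Count the live neighbors of every cell adjacent to a live cell.
    counts = Counter(
        (r + dr, c + dc)
        for (r, c) in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth on exactly 3 live neighbors; survival on 2 or 3.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

# A "blinker": three cells in a row oscillate with period 2.
blinker = {(1, 0), (1, 1), (1, 2)}
after = step(blinker)   # becomes a vertical column, then flips back
```

Two rules, a dozen lines, and yet the system is Turing-complete — the same "small rules, complex outcomes" flavor the parent comment points at.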

[+] Cieplak|8 years ago|reply
Interesting to hear that C++ isn’t that well supported on OpenBSD. The story is quite the opposite with FreeBSD, where it’s really easy to use either clang or gcc. I usually spin up new jails to keep different installations sandboxed. CLion works quite nicely with most window managers on FreeBSD, but I rarely boot Xwindows these days and usually prefer to work with emacs inside tmux from the console.
[+] TheAceOfHearts|8 years ago|reply
I find it very admirable that he can sit down for a week and just focus on one main subject. Personally, I get derailed all the time when I don't have a very well defined goal in mind.

It looks a little something like this: I'll be reading a manpage and notice another manpage referenced at the bottom. I obviously keep crawling this suggestion tree until I bump into a utility whose purpose is unclear, and then I'll go searching online to try and figure out what kinds of problems or use cases it's meant to help with.

[+] abenedic|8 years ago|reply
He mentions it a bit, but OpenBSD is really a good place to start for multiplatform code. Their focus on security and POSIX helps a ton.
[+] blt|8 years ago|reply
This post made me appreciate being a graduate student a lot. I have many weeks like this!
[+] bsenftner|8 years ago|reply
I've structured my life like this. I have a remote employer in a time zone 10 hours away; I get months-long tasks, simply update my boss on progress, and am left to focus so deeply that my health suffers, because I naturally obsess over my work, which I love. I work from home, and my wife does too (she's a freelance film producer), so we just immerse ourselves: we have meals together, but otherwise we both obsess over our work. No commuting makes this very enjoyable.
[+] z3phyr|8 years ago|reply
I really like the attitude of picking a topic at hand and hack around it for fun. It radiates a very MIT Hacker feel. The writeup is very motivating.

John has been experimenting with a lot of stuff -- Racket, Haskell, Computer Vision and now Neural Networks. I guess there is no professional intent, but the spirit of hacking lives on.

[+] forgotmypw|8 years ago|reply
No-JS link:

https://m.facebook.com/permalink.php?story_fbid=211040872252...

John Carmack

After a several year gap, I finally took another week-long programming retreat, where I could work in hermit mode, away from the normal press of work. My wife has been generously offering it to me the last few years, but I’m generally bad at taking vacations from work.

As a change of pace from my current Oculus work, I wanted to write some from-scratch-in-C++ neural network implementations, and I wanted to do it with a strictly base OpenBSD system. Someone remarked that is a pretty random pairing, but it worked out ok.

Despite not having actually used it, I have always been fond of the idea of OpenBSD — a relatively minimal and opinionated system with a cohesive vision and an emphasis on quality and craftsmanship. Linux is a lot of things, but cohesive isn’t one of them.

I’m not a Unix geek. I get around ok, but I am most comfortable developing in Visual Studio on Windows. I thought a week of full immersion work in the old school Unix style would be interesting, even if it meant working at a slower pace. It was sort of an adventure in retro computing — this was fvwm and vi. Not vim, actual BSD vi.

In the end, I didn’t really explore the system all that much, with 95% of my time in just the basic vi / make / gdb operations. I appreciated the good man pages, as I tried to do everything within the self contained system, without resorting to internet searches. Seeing references to 30+ year old things like Tektronix terminals was amusing.

I was a little surprised that the C++ support wasn’t very good. G++ didn’t support C++11, and LLVM C++ didn’t play nicely with gdb. Gdb crashed on me a lot as well, I suspect due to C++ issues. I know you can get more recent versions through ports, but I stuck with using the base system.

In hindsight, I should have just gone full retro and done everything in ANSI C. I do have plenty of days where, like many older programmers, I think “Maybe C++ isn’t as much of a net positive as we assume...”. There is still much that I like, but it isn’t a hardship for me to build small projects in plain C.

Maybe next time I do this I will try to go full emacs, another major culture that I don’t have much exposure to.

I have a decent overview understanding of most machine learning algorithms, and I have done some linear classifier and decision tree work, but for some reason I have avoided neural networks. On some level, I suspect that Deep Learning being so trendy tweaked a little bit of contrarian in me, and I still have a little bit of a reflexive bias against “throw everything at the NN and let it sort it out!”

In the spirit of my retro theme, I had printed out several of Yann LeCun’s old papers and was considering doing everything completely off line, as if I was actually in a mountain cabin somewhere, but I wound up watching a lot of the Stanford CS231N lectures on YouTube, and found them really valuable. Watching lecture videos is something that I very rarely do — it is normally hard for me to feel the time is justified, but on retreat it was great!

I don’t think I have anything particularly insightful to add about neural networks, but it was a very productive week for me, solidifying “book knowledge” into real experience.

I used a common pattern for me: get first results with hacky code, then write a brand new and clean implementation with the lessons learned, so they both exist and can be cross checked.

I initially got backprop wrong both times, comparison with numerical differentiation was critical! It is interesting that things still train even when various parts are pretty wrong — as long as the sign is right most of the time, progress is often made.

I was pretty happy with my multi-layer neural net code; it wound up in a form that I can just drop it into future efforts. Yes, for anything serious I should use an established library, but there are a lot of times when just having a single .cpp and .h file that you wrote every line of is convenient.

My conv net code just got to the hacky-but-working phase; I could have used another day or two to make a clean and flexible implementation.

One thing I found interesting was that when testing on MNIST with my initial NN before adding any convolutions, I was getting significantly better results than the non-convolutional NN reported for comparison in LeCun ‘98 — right around 2% error on the test set with a single 100 node hidden layer, versus 3% for both wider and deeper nets back then. I attribute this to the modern best practices — ReLU, Softmax, and better initialization.
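The ingredients credited here — ReLU hidden units and a softmax output — fit in a short forward-pass sketch. To be clear, this is my own illustrative code, not Carmack's, and the tiny layer sizes merely stand in for the 784-100-10 MNIST shape he describes:

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(v):
    m = max(v)                            # subtract max for stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def dense(x, W, b):
    # W holds one row of weights per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def forward(x, W1, b1, W2, b2):
    hidden = relu(dense(x, W1, b1))        # hidden layer with ReLU
    return softmax(dense(hidden, W2, b2))  # class probabilities

# Tiny deterministic example: 2 inputs, 3 hidden units, 2 classes.
x  = [1.0, -1.0]
W1 = [[0.5, 0.2], [-0.3, 0.8], [0.1, -0.4]]
b1 = [0.0, 0.0, 0.0]
W2 = [[1.0, -1.0, 0.5], [-0.5, 1.0, 0.2]]
b2 = [0.0, 0.0]
probs = forward(x, W1, b1, W2, b2)         # sums to 1.0
```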

This is one of the most fascinating things about NN work — it is all so simple, and the breakthrough advances are often things that can be expressed with just a few lines of code. It feels like there are some similarities with ray tracing in the graphics world, where you can implement a physically based light transport ray tracer quite quickly, and produce state of the art images if you have the data and enough runtime patience.

I got a much better gut-level understanding of overtraining / generalization / regularization by exploring a bunch of training parameters. On the last night before I had to head home, I froze the architecture and just played with hyperparameters. "Training!" is definitely worse than "Compiling!" for staying focused.

Now I get to keep my eyes open for a work opportunity to use the new skills!

I am dreading what my email and workspace are going to look like when I get into the office tomorrow.

[+] bringtheaction|8 years ago|reply
Better non-JS link: http://archive.is/MvKHy. This link is a snapshot of the DOM of the rendered page. The one you linked has a not-so-comfortable layout, due to being made for old phones with small screens.
[+] Ono-Sendai|8 years ago|reply
I've also taken the time to implement a NN in C++ and train it on the MNIST handwriting data. It's a lot of fun :) As a result I have some pretty fast CPU NN code lying around.