Ask HN: What were the papers on the list Ilya Sutskever gave John Carmack?
396 points| alan-stark | 3 years ago
“So I asked Ilya Sutskever, OpenAI’s chief scientist, for a reading list. He gave me a list of like 40 research papers and said, ‘If you really learn all of these, you’ll know 90% of what matters today.’ And I did. I plowed through all those things and it all started sorting out in my head.”
What papers do you think were on this list?
[1] https://dallasinnovates.com/exclusive-qa-john-carmacks-different-path-to-artificial-general-intelligence/
dang|3 years ago
John Carmack’s ‘Different Path’ to Artificial General Intelligence - https://news.ycombinator.com/item?id=34637650 - Feb 2023 (402 comments)
sillysaurusx|3 years ago
RIP. If it's any consolation, it sounds like the list is at least three years old by now. Which is a long time considering that 2016 is generally regarded as the date of the deep learning revolution.
pengaru|3 years ago
In my experience when it comes to learning technical subjects from a position of relative total ignorance, it's the older resources that are the easiest to bootstrap knowledge from. Then you basically work your way forward through the newer texts, like an accelerated replay of a domain's progress.
I think it's kind of obvious that this would be the case when you think about it. Just like how history textbooks can't keep growing in size to give all past events an equal treatment, nor can technical references as a domain matures.
You're forced to toss out stuff deemed least relevant to today, and in technical domains that's often stuff you've just started assuming as understood by the reader... where early editions of a new space would have prioritized getting the reader up to speed in something totally novel to the world.
moglito|3 years ago
I thought it was 2012, when AlexNet took the imagenet crown?
vtantia|3 years ago
mellosouls|3 years ago
querez|3 years ago
sho_hn|3 years ago
So, slavery?
hosolmaz|3 years ago
i_s|3 years ago
aj7|3 years ago
klabb3|3 years ago
robotburrito|3 years ago
optimalsolver|3 years ago
I would've hoped he'd be exploring weirder alternatives off the beaten path. I mean, neural networks might not even be necessary for AGI, but no one at OpenAI is going to tell Carmack that.
GuB-42|3 years ago
Otherwise you may end up walking the ditch beside the beaten path. It is slow and difficult, but it won't get you anywhere new.
For example, you may try an approach that doesn't look like deep learning, but after a lot of work, realize that you actually reinvented deep learning, poorly. We call these things neurons, transformers, backpropagation, etc... but in the end, it is just maths. If you find that your "alternative" ends up being very well suited to linear algebra and gradient descent, then once you have found the right formulas, you may realize that they are equivalent to the ones used in traditional "deep learning" algorithms. It helps to recognize this early and take advantage of all the work done before you.
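(To make the "just maths" point concrete, a toy sketch of my own, not from the comment: a parameterized model, a loss, and plain gradient descent in NumPy. All names and numbers here are illustrative.)

    # Toy linear model fit by gradient descent: the linear-algebra core
    # that many "alternative" approaches end up rediscovering.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))                 # toy inputs
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w + 0.1 * rng.normal(size=100)   # toy targets

    w = np.zeros(3)
    lr = 0.1
    for step in range(200):
        pred = X @ w                              # forward pass (linear algebra)
        grad = 2 * X.T @ (pred - y) / len(y)      # gradient of mean squared error
        w -= lr * grad                            # gradient descent update

    print(w)  # ends up close to true_w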
mindcrime|3 years ago
I mean, any idiot can go off-trail and start blundering around in the weeds, and ultimately wind up tripping, falling, hitting their head on a rock, and drowning to death in a ditch. But actually finding a new, better, more efficient path probably involves at least some understanding of the status quo.
ly3xqhl8g9|3 years ago
It would have been interesting seeing someone like Carmack going in this direction, but from the little details he gave he seems less interested in cells and Kjeldahl flasks and more of the same type-a-type-a on the ol' QWERTY.
† 'simply' might involve multiple decades of research and Buffett knows how many billions
[1] Human neurons implanted in mice influence behavior, https://www.nature.com/articles/s41586-022-05277-w
pavon|3 years ago
ramraj07|3 years ago
Neuralink is the only place where this pattern seemed to break a bit, but then it seems like Elon went down his own path, pushing for faster results and breaking basic ethics.
sinenomine|3 years ago
The denial of an obviously fertile paradigm feels like such a useless, self-defeating loss, indulged in for the sake of an intellectual status game.
We could all be better off right now if connectionists had been given DOE-grade supercomputers in the 90s, and had been supplied with custom TPUs later in the 00s as their ideas were proven generally correct via rigorous experimentation on said DOE supercomputers. This didn't happen due to what amounts to academic bullying culture: https://en.wikipedia.org/wiki/Perceptrons_(book)
The sheer scale of the cumulative losses we suffered (at least in part) due to this denial of connectionism as a generally useful foundational field will be estimated in astronomical powers of ten in the future, when the fruits of this technology provide radically better lives for us and our descendants.
I see you have a knee-jerk reaction to hype and industry, and we all fear replacement unless it's the stock market doing the work for us ... but why do you feel the need to punch down at this prosaic field "about nonlinear optimization"? The networks in question just want to learn, and to help us, if we train them to this end - and we make any and all excuses to avoid receiving this help, as our civilization quietly drowns in its own incompetency...
throwaway4837|3 years ago
albertzeyer|3 years ago
Of course, there is a group of people defending symbolic computation, e.g. see Gary Marcus, who always push back on connectionism (neural networks).
But this is somewhat of a spectrum, or rather sloppy terminology. Once you move away from symbolic computation, many things can be interpreted as a neural network. And there is also all of computational neuroscience, which also works with some variants of neural networks.
And there is the human brain, which demonstrates that a neural network is capable of AGI. So why would you not want a neural network? But that does not mean you cannot do many things very differently from the mainstream.
chrgy|3 years ago
AlexNet (2012), VGGNet (2014), ResNet (2015), GoogleNet (2015), Transformer (2017)
Reinforcement Learning: Q-Learning (Watkins & Dayan, 1992), SARSA (R. S. Sutton & Barto, 1998), DQN (Mnih et al., 2013), A3C (Mnih et al., 2016), PPO (Schulman et al., 2017)
Natural Language Processing: Word2Vec (Mikolov et al., 2013), GLUE (Wang et al., 2018), ELMo (Peters et al., 2018), GPT (Radford et al., 2018), BERT (Devlin et al., 2019)
loveparade|3 years ago
I'm very confident that this is pretty much what any researcher, including Ilya, would recommend. It really isn't hard to find those resources; they are simply the most cited papers. Of course you can go deeper into any of the subfields if you desire.
ilaksh|3 years ago
But AGI is one of those very ambiguous terms. For many people it's either an exact digital replica of human behavior that is alive, or something like a God. I think it should also apply to general-purpose AI that can do most human tasks in a strictly guided way, without having the other characteristics of humans or animals. For that, I think it can be built on advanced multimodal transformer-based architectures.
For the other stuff, it's worth giving a passing glance to the fairly extensive amount of research that has been labeled AGI over the last decade or so. It hasn't really been mainstream, except maybe in the last couple of years, because really forward-looking people tend to be marginalized, including in academia.
https://agi-conf.org
Looking forward, my expectation is that things like memristors or other compute-in-memory will become very popular within say 2-5 years (obviously total speculation since there are no products yet that I know of) and they will be vastly more efficient and powerful especially for AI. And there will be algorithms for general purpose AI possibly inspired by transformers or AGI research but tailored to the new particular compute-in-memory systems.
TimPC|3 years ago
mirekrusin|3 years ago
jimmySixDOF|3 years ago
Strikes me as the kind of thing where that last 10% will need 400 papers
mindcrime|3 years ago
michpoch|3 years ago
tikhonj|3 years ago
swyx|3 years ago
codeviking|3 years ago
hexhowells|3 years ago
dev_0|3 years ago
[deleted]
albertzeyer|3 years ago
On models: Obviously, almost everything is Transformer nowadays (Attention is all you need paper). However, I think to get into the field, to get a good overview, you should also look a bit beyond the Transformer. E.g. RNNs/LSTMs are still a must learn, even though Transformers might be better in many tasks. And then all those memory-augmented models, e.g. Neural Turing Machine and follow-ups, are important too.
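(Aside, a minimal sketch of my own, not from this comment: scaled dot-product attention, the core operation of the Transformer from "Attention is all you need", in plain NumPy. Shapes and names are illustrative.)

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)   # numerical stability
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        # Q, K, V: (seq_len, d) matrices of queries, keys, values
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)             # query-key similarity
        weights = softmax(scores, axis=-1)        # attention weights, rows sum to 1
        return weights @ V                        # weighted sum of values

    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))
    print(attention(Q, K, V).shape)               # (5, 8)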
It also helps to know different architectures, such as just language models (GPT), attention-based encoder-decoder (e.g. original Transformer), but then also CTC, hybrid HMM-NN, transducers (RNN-T).
Some self-promotion: I think my Phd thesis does a good job on giving an overview on this: https://www-i6.informatik.rwth-aachen.de/publications/downlo...
Diffusion models are another recent, different kind of model.
Then, a separate topic is the training aspect. Most papers do supervised training, using a cross-entropy loss against the ground-truth target (a minimal sketch of this setup follows below, after the examples). However, there are many others:
There is CLIP to combine text and image modalities.
There is the whole field of unsupervised or self-supervised training methods. Language model training (next-token prediction) is one example, but there are others.
And then there is the big field on reinforcement learning, which is probably also quite relevant for AGI.
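(A minimal sketch of my own, not from any of these papers, of the cross-entropy / next-token-prediction setup mentioned above, in PyTorch. The tiny embedding+linear "model" is just a stand-in for a real RNN or Transformer; all names and sizes are illustrative.)

    import torch
    import torch.nn as nn

    vocab_size, embed_dim = 100, 32
    model = nn.Sequential(
        nn.Embedding(vocab_size, embed_dim),
        nn.Linear(embed_dim, vocab_size),            # logits over the vocabulary
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    tokens = torch.randint(0, vocab_size, (4, 16))   # (batch, seq) of token ids
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict the next token

    logits = model(inputs)                           # (batch, seq-1, vocab)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()                                  # one self-supervised training step
    opt.step()
    print(loss.item())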
hardware2win|3 years ago
Will receive Turing Award
It is being cited often
alan-stark|3 years ago
unknown|3 years ago
[deleted]
klaussilveira|3 years ago
sebkomianos|3 years ago
polskibus|3 years ago
jranieri|3 years ago
arbuge|3 years ago
KRAKRISMOTT|3 years ago
fnordpiglet|3 years ago
EvgeniyZh|3 years ago
seydor|3 years ago
unknown|3 years ago
[deleted]
unknown|3 years ago
[deleted]
username3|3 years ago
mirekrusin|3 years ago
winwhiz|3 years ago
https://twitter.com/id_aa_carmack/status/1241219019681792010
throwaway4837|3 years ago
cloudking|3 years ago
daviziko|3 years ago
Phil_Latio|3 years ago
evc123|3 years ago
adt|3 years ago
unknown|3 years ago
[deleted]
vikashrungta|3 years ago
Unlocking the Secrets of AI: A Journey through the Foundational Papers by @vrungta (2023)
1. "Attention is All You Need" (2017) - https://arxiv.org/abs/1706.03762 (Google Brain) 2. "Generative Adversarial Networks" (2014) - https://arxiv.org/abs/1406.2661 (University of Montreal) 3. "Dynamic Routing Between Capsules" (2017) - https://arxiv.org/abs/1710.09829 (Google Brain) 4. "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks" (2016) - https://arxiv.org/abs/1511.06434 (University of Montreal) 5. "ImageNet Classification with Deep Convolutional Neural Networks" (2012) - https://papers.nips.cc/paper/4824-imagenet-classification-wi... (University of Toronto) 6. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" (2018) - https://arxiv.org/abs/1810.04805 (Google) 7. "RoBERTa: A Robustly Optimized BERT Pretraining Approach" (2019) - https://arxiv.org/abs/1907.11692 (Facebook AI) 8. "ELMo: Deep contextualized word representations" (2018) - https://arxiv.org/abs/1802.05365 (Allen Institute for Artificial Intelligence) 9. "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context" (2019) - https://arxiv.org/abs/1901.02860 (Google AI Language) 10. "XLNet: Generalized Autoregressive Pretraining for Language Understanding" (2019) - https://arxiv.org/abs/1906.08237 (Google AI Language) 11. T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" (2020) - https://arxiv.org/abs/1910.10683 (Google Research) 12. "Language Models are Few-Shot Learners" (2021) - https://arxiv.org/abs/2005.14165 (OpenAI)
theusus|3 years ago
unknown|3 years ago
[deleted]
databroker|3 years ago
[deleted]
Liberonostrud|3 years ago
[deleted]
zomglings|3 years ago
[deleted]
Waterluvian|3 years ago
The cat was having a lot of behavioural issues and ultimately he surrendered it to a shelter, where it may have been euthanized if nobody adopted it. (note that nothing I'm saying here is meant to condone or condemn the action)
The author editorialized it to fit the desired narrative, which is a thing that happens quite a lot. Gotta sell them books!
tayo42|3 years ago
Easier said than done still, for sure. I have a couple of hobbies where the top-tier people are just annoying in the rest of their lives. Something about being really good at one thing seems to correlate often with other insane personality traits.
winrid|3 years ago
That's a bit more detail than just peeing on the sofa once.
unixhero|3 years ago
layer8|3 years ago
[deleted]
caxco93|3 years ago
siekmanj|3 years ago
nathias|3 years ago
Some of the highly influential papers in the field of AI that could have been on the list include "Generative Adversarial Networks" by Ian Goodfellow et al., "Attention is All You Need" by Vaswani et al., "AlexNet: ImageNet Classification with Deep Convolutional Neural Networks" by Alex Krizhevsky et al., "Playing Atari with Deep Reinforcement Learning" by Volodymyr Mnih et al., "Human-level control through deep reinforcement learning" by Volodymyr Mnih et al., "A Few Useful Things to Know About Machine Learning" by Pedro Domingos, among many others.
mritchie712|3 years ago
[deleted]
mgaunard|3 years ago