Ask HN: How to get back into AI?
266 points| quibono | 3 years ago
I am comfortable with autograd/computation graphs, PyTorch, "classic" neural nets and ones used for vision-type applications, as well as the basics of Transformer networks (I've trained a few smaller ones myself) and RNNs.
Do you know of any good resources to slowly get back into the loop?
So far I plan to read through the original diffusion/GPT papers and go from there, but I'd love to see what you think are some good sources. I would especially love to see some Jupyter notebooks to fiddle with, as I find I learn best when I get to play around with the code.
Thank you
jdeaton|3 years ago|reply
You can view this approach in the same way that a beginner learns to program. The best way to learn is by attempting to implement (as much on your own as possible) something that solves a problem you're interested in. This has been my approach from the start (for both programming and ML), and is also what I would recommend for a beginner. I've found that continuing this practice, even while working on AI systems professionally, has been critical to maintaining a robust understanding of the evolving field of ML.
The key is finding a good method/paper that meets all of the following:
0) is inherently very interesting to you
1) you don't already have a robust understanding of the method
2) isn't so far above your head that you can't begin to grasp it
3) doesn't require access to datasets/compute resources you don't have
of course, finding such a method isn't always easy and often takes some searching.
I want to contrast this with other approaches to learning AI, which include:
- downloading and running other people's ML code (in a jupyter notebook or otherwise)
- watching lecture series / talks giving overviews of AI methods
- reading (without putting into action) the latest ML papers
all of which I have found to be significantly less impactful on my learning.
tomp|3 years ago|reply
The models in most cutting-edge papers are trained on several $100k worth of GPU time, so does it even make sense to implement the algorithm without the necessary data & compute? How can you be sure that your implementation is correct if you can't train it (and hence can't run proper inference with a good model)?
Compare that to, e.g., reimplementing a pure CS paper, where almost anything can be reimplemented in a simple way. Even something like a "distributed database over 1000 nodes": you don't technically need 1000 servers, you can just, you know, simulate them quite cheaply.
Of course there might be similar techniques for ML but I'm just not aware of them.
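To make the "simulate them cheaply" idea concrete, here is a toy pure-Python sketch of a replicated key-value store across 1000 simulated "nodes", where each node is just a dict. All the names here are illustrative, not any real system's API:

```python
import hashlib

class SimulatedCluster:
    """Toy simulation of a replicated KV store: each 'node' is just a dict."""

    def __init__(self, n_nodes=1000, replicas=3):
        self.nodes = [dict() for _ in range(n_nodes)]
        self.replicas = replicas

    def _owners(self, key):
        # Hash the key to pick `replicas` distinct nodes (naive placement).
        h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
        n = len(self.nodes)
        return [(h + i) % n for i in range(self.replicas)]

    def put(self, key, value):
        # Write to every replica that owns this key.
        for i in self._owners(key):
            self.nodes[i][key] = value

    def get(self, key, failed=()):
        # Read from the first replica that hasn't "failed".
        for i in self._owners(key):
            if i not in failed:
                return self.nodes[i].get(key)
        return None

cluster = SimulatedCluster()
cluster.put("user:42", "alice")
assert cluster.get("user:42") == "alice"
# Reads survive the "failure" of the first replica:
first_owner = cluster._owners("user:42")[0]
assert cluster.get("user:42", failed={first_owner}) == "alice"
```

The same trick (processes or plain objects standing in for machines) extends to testing partition and failure scenarios without owning any servers.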
macrolime|3 years ago|reply
Some of the stuff I'm currently reading/watching or have recently finished:
Practical Deep Learning, though it sounds like you may know this stuff already (https://course.fast.ai/)
Practical Deep learning part 2, more about diffusion models. Full course coming early next year (https://www.fast.ai/posts/part2-2022-preview.html)
Hugging Face course (https://huggingface.co/course/chapter1/1)
Diffusion models from hugging face https://huggingface.co/blog/annotated-diffusion https://huggingface.co/docs/diffusers/index
Andrej Karpathy's Neural Networks: Zero to Hero. He goes from the basics up to how GPT works, so you can start wherever suits you (https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThs...)
3blue1brown's videos. I've found all his videos on neural networks and math worth watching; even for stuff that I already know, he sometimes has some new perspectives and nice animations.
brilliant.org. Nice math refresher and the courses there are almost like fun little games.
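As a taste of the Zero to Hero material, here is a tiny character-bigram counting model in plain Python, loosely in the spirit of the course's first "makemore" lessons (the function names are made up for illustration):

```python
from collections import Counter

def train_bigram(words):
    # Count transitions between consecutive characters; "." marks start/end.
    counts = Counter()
    for w in words:
        chars = ["."] + list(w) + ["."]
        for a, b in zip(chars, chars[1:]):
            counts[(a, b)] += 1
    return counts

def prob(counts, a, b):
    # P(b | a) from raw counts (no smoothing, for brevity).
    total = sum(v for (x, _), v in counts.items() if x == a)
    return counts[(a, b)] / total if total else 0.0

counts = train_bigram(["ab", "ab", "ac"])
# After "a", "b" occurred 2 out of 3 times:
assert abs(prob(counts, "a", "b") - 2 / 3) < 1e-9
```

The course then replaces these counts with a trained neural net, which is where the real material starts.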
macrolime|3 years ago|reply
https://compneuro.neuromatch.io/
Recent research like "Relating transformers to models and neural representations of the hippocampal formation" might make it more relevant though (https://arxiv.org/abs/2112.04035v2)
quote from the abstract of that paper: "Many deep neural network architectures loosely based on brain networks have recently been shown to replicate neural firing patterns observed in the brain. One of the most exciting and promising novel architectures, the Transformer neural network, was developed without the brain in mind. In this work, we show that transformers, when equipped with recurrent position encodings, replicate the precisely tuned spatial representations of the hippocampal formation; most notably place and grid cells. Furthermore, we show that this result is no surprise since it is closely related to current hippocampal models from neuroscience. We additionally show the transformer version offers dramatic performance gains over the neuroscience version. This work continues to bind computations of artificial and brain networks, offers a novel understanding of the hippocampal-cortical interaction, and suggests how wider cortical areas may perform complex tasks beyond current neuroscience models such as language comprehension."
quibono|3 years ago|reply
For some context, something I should have mentioned in the original post but failed to do: I was not intending to do a professional pivot to an AI role; it is more of a personal interest. I used to be really excited about this stuff and am looking forward to getting involved in it again just because I find it interesting.
Thank you, I really appreciate everyone's responses.
godelski|3 years ago|reply
I am an ML researcher working on generative modeling. I think you have enough experience that you'll catch up quickly. But the question is really: what are you interested in? With that I can give better advice. Don't let anyone stop you from learning just to learn. Not everything has to be a career. A lot of us got here because it's fun.
I do think you'll pick up diffusion models quickly. I'm more into the explicit density side of things and density estimation, so I like Yang Song's work and similar work by Kingma. Also check out Lilian Weng's blog posts. They are a wealth of material and sources. Can't go wrong there. You'll find that diffusion models and VAEs are kinda similar. The difficulty you'll have in understanding something like Stable Diffusion is actually the programming (at least that was the hardest part for me).
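To make the diffusion side concrete, here is a minimal sketch of the DDPM-style forward (noising) process in plain Python. It assumes the standard linear beta schedule; the numbers are illustrative, not any particular paper's implementation:

```python
import math
import random

def make_schedule(T=1000, beta_min=1e-4, beta_max=0.02):
    # Linear beta schedule; alpha_bar[t] = prod_{s<=t} (1 - beta_s).
    betas = [beta_min + (beta_max - beta_min) * t / (T - 1) for t in range(T)]
    alpha_bar, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bar.append(prod)
    return alpha_bar

def q_sample(x0, t, alpha_bar, noise=None):
    # Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
    if noise is None:
        noise = [random.gauss(0, 1) for _ in x0]
    a = alpha_bar[t]
    return [math.sqrt(a) * x + math.sqrt(1 - a) * e for x, e in zip(x0, noise)]

abar = make_schedule()
# Signal is nearly intact early and almost fully destroyed at the end:
assert abar[0] > 0.99 and abar[-1] < 1e-3
x_t = q_sample([1.0, -1.0], t=0, alpha_bar=abar, noise=[0.0, 0.0])
assert abs(x_t[0] - math.sqrt(abar[0])) < 1e-9
```

The trained network's only job is to predict the `eps` that was mixed in at a given `t`; everything else in a DDPM follows from this forward process.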
Good luck and let me know if I can help.
anonreeeeplor|3 years ago|reply
Quite honestly, the opportunity all seems to be on the front end. The idea that you are going to airdrop yourself as a hands on AI programmer into this market doesn’t make a huge amount of sense to me from a career perspective.
The opportunity is with the tools and how they are applied. Building front end experiences on ChatGPT and integrations and applied scenarios.
You actually doing the AI yourself means competing with PHDs and elite academics immersed in the field.
I think knowledge of AI is far less valuable than knowledge of the emerging landscape combined with a broad understanding of different tools and how they are applied.
The new trend here is very strongly Large Language Models (LLM). You should be far more specific with what your goal is and where to spend your time.
A lot of the "AI" you are referring to seems to be no longer relevant or interesting to the market.
If you are spending time with Jupyter notebooks I would say you are probably completely wasting your time and heading in the wrong direction.
LLM is the major trend. Focus entirely on that and the tools landscape and how to integrate it and apply it. It feels like you are navigating using an out of date map.
synapticpaint|3 years ago|reply
Example 1: have a look through here: http://synapticpaint.com/dreambooth/browse/ for some examples of dreambooth models people have created
Example 2: you can merge different dreambooth models together to varying degrees of success (the idea being, you train model A on subject A, model B on subject B, and now you want to generate pictures of A and B together). My understanding is that this doesn't work too well at the moment, but it's possible that a different interpolation algorithm can yield better results.
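For a sense of what such merging looks like, here is a toy sketch of naive linear interpolation between two checkpoints. Plain Python lists stand in for tensors here; real merges operate on state dicts of tensors, and better interpolation schemes than this exist:

```python
def lerp_state_dicts(sd_a, sd_b, alpha=0.5):
    """Linearly interpolate two checkpoints with matching keys/shapes.

    alpha=0 returns sd_a, alpha=1 returns sd_b.
    """
    assert sd_a.keys() == sd_b.keys()
    return {
        k: [(1 - alpha) * a + alpha * b for a, b in zip(sd_a[k], sd_b[k])]
        for k in sd_a
    }

model_a = {"layer.weight": [1.0, 2.0]}
model_b = {"layer.weight": [3.0, 6.0]}
merged = lerp_state_dicts(model_a, model_b, alpha=0.5)
assert merged["layer.weight"] == [2.0, 4.0]
```

Part of why naive merging works at all (and why it often degrades) is an open question, which is exactly the kind of thing where tinkering can pay off.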
I do agree with the general sentiment that you wouldn't necessarily be training your own models or creating your own architecture, just want to provide the perspective that understanding the AI side is valuable because it can lead to different capabilities and products.
etangent|3 years ago|reply
And this: "If you are spending time with Jupyter notebooks I would say you are probably completely wasting your time" -- how do you suggest one perform data analysis on any problem that's not an LLM? Data analysis of any kind, such as "does the model I am trying to build a front end for even work for my problem?"
quonn|3 years ago|reply
Also, you're wrong. Look at what the OP wrote and then look at how the latest models are actually built, and you will see that at least 2/3 of their knowledge is relevant.
samvher|3 years ago|reply
My impression is that the field is more disciplined in terms of knowledge now than it was ~8 years ago - the fundamentals are better understood and more clearly expressed in literature.
Also there are still plenty of topics on which the new techniques can probably be fruitfully applied, especially if you have some domain knowledge that the math/CS PhDs don’t have.
For OP - I’m in a similar situation and have been going through Kevin Murphy’s “Probabilistic Machine Learning”, which is pretty massive and dense but also very lucid.
cweill|3 years ago|reply
Yes, LLMs are amazing, but they won't be winning every single Kaggle competition or displacing every other ML algorithm in every setting.
narrator|3 years ago|reply
Upgrading existing systems with AI is probably where it's at, using existing models like Stable Diffusion, GPT-3, or some of the smaller downloadable language models if the task is very simple and the economics of using GPT-3 don't make sense.
turkeygizzard|3 years ago|reply
I just read Francois Chollet's Deep Learning with Python and found it to be a fantastic high level overview of all the recent progress. There's some code, but not a lot. I mostly just appreciated it as a very straightforward plain-language treatment of RNNs, CNNs, and transformers.
Now I'm going through Stanford's CS224 lectures.
I'm sort of planning to read papers, but as some other comments have pointed out, I'm less sure of the ROI on that since I'm not sure how feasible a future in AI is for me.
curious-guy|3 years ago|reply
I am a research engineer/applied ai person building vision models in healthcare domain. I am currently preparing to transition to engineering roles like you did. For that, I am currently going through web dev - both frontend and backend. Would love to get some pointers from you on my approach and any recommendations from your side. Thanks!
ineptech|3 years ago|reply
Also there's no cost, no book to buy, no email signup, it's just a guy sharing knowledge like the old days. Great course.
jstx1|3 years ago|reply
It sounds like you're into it already. And you already know which new papers are interesting to you.
bobleeswagger|3 years ago|reply
As someone with 10 years of professional experience in software, I find every AI "trend" that has come up in that time to be incredibly odd. It is certainly remarkable what ChatGPT, Stable Diffusion, and other examples are doing today... Ultimately people are giving waaaaaaaay too much credit without understanding the technical details. These are pigeonholed examples that still aren't solving any real problems.
AI is still just statistics with marketing.
joenot443|3 years ago|reply
I’ve found that what ChatGPT comes up with is often a great start and has already saved me hours of time. Is it as good as paying a professional? Probably not. But I think it’s fair to say these models are already solving real world problems, even if they still need a bit of a helping hand. Just my thoughts, I’m not an expert on the models themselves.
mdp2021|3 years ago|reply
Maybe it is, and he's simply somebody with an interest in the topic and some progress to catch up with.
YetAnotherNick|3 years ago|reply
I would say AI is the opposite of statistics, for good or for bad.
f0e4c2f7|3 years ago|reply
https://course.fast.ai
ttul|3 years ago|reply
When this course becomes public next year, I think it will be a great way to get caught up. In the meantime, you might still be able to pay the AU $500 fee and watch the course content, which was all recorded, if you are anxious to get going.
whiplash451|3 years ago|reply
You have two options:
1. Work full-time for companies doing this state-of-the-art stuff (OpenAI, Meta, etc.)
2. Work full-time for a (good) AI company that is doing interesting AI work, but most likely not based on GPT/SD/etc.
In both cases, you will learn a lot. Anything else seems like a costly and dangerous distraction to me.
stephc_int13|3 years ago|reply
Most medium and low-quality papers are full of errors and noise, but you can still learn from them.
Get your hands dirty with real code.
I would take a look at those:
https://github.com/geohot/tinygrad
https://github.com/ggerganov/whisper.cpp
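To get a feel for what a library like tinygrad does under the hood, here is a toy scalar reverse-mode autodiff sketch in the spirit of Karpathy's micrograd. This is not tinygrad's actual API, just the core idea in miniature:

```python
class Value:
    """A scalar that records its computation graph for reverse-mode autodiff."""

    def __init__(self, data, children=()):
        self.data, self.grad = data, 0.0
        self._children, self._backward = children, lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():  # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():  # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then propagate gradients in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    visit(c)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

x, y = Value(3.0), Value(4.0)
z = x * y + x          # dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
assert x.grad == 5.0 and y.grad == 3.0
```

Swap the scalars for tensors and add a few more ops and you have the skeleton of every autograd engine; reading tinygrad after writing something like this makes the real code much less mysterious.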
optbuild|3 years ago|reply
Maybe I am a bit off track, but how does someone reach this state?