item 33932594

Ask HN: How to get back into AI?

266 points | quibono | 3 years ago

I was involved in machine learning and AI a few years ago, mainly before the onset of the new diffusion models, large transformers (GPT*), Graph NNs and Neural ODE stuff.

I am comfortable with autograd/computation graphs, PyTorch, "classic" neural nets and ones used for vision-type applications, as well as the basics of Transformer networks (I've trained a few smaller ones myself) and RNNs.

Do you know of any good resources to slowly get back into the loop?

So far I plan on reading through the original Diffusion/GPT papers and start going from there but I'd love to see what you think are some good sources. I would especially love to see some Jupyter notebooks to fiddle with as I find I learn best when I get to play around with the code.

Thank you

140 comments

[+] jdeaton|3 years ago|reply
I am an ML researcher working in industry: by far the most effective way to maintain/advance my understanding of ML methods is to implement the core of an interesting paper and reproduce (some of) its results. Completing a working implementation forces your understanding onto another level compared with just reading the paper and thinking "I get it". It can be easy to read (for example) a diffusion/neural ODE paper and come away thinking that you "get it" while still having a wildly inadequate understanding of how to actually get it to work yourself.

You can view this approach in the same way that a beginner learns to program. The best way to learn is by attempting to implement (as much on your own as possible) something that solves a problem you're interested in. This has been my approach from the start (for both programming and ML), and is also what I would recommend for a beginner. I've found that continuing this practice, even while working on AI systems professionally, has been critical to maintaining a robust understanding of the evolving field of ML.

The key is finding a good method/paper that meets all of the following:

0) is inherently very interesting to you

1) you don't already have a robust understanding of the method

2) isn't so far above your head that you can't begin to grasp it

3) doesn't require access to datasets/compute resources you don't have

Of course, finding such a method isn't always easy and often takes some searching.

I want to contrast this with other approaches to learning AI, which include

- downloading and running other people's ML code (in a jupyter notebook or otherwise)

- watching lecture series / talks giving overviews of AI methods

- reading (without putting into action) the latest ML papers

all of which I have found to be significantly less impactful on my learning.
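To give a flavor of the kind of from-scratch exercise I mean: scaled dot-product attention, the core operation of the original Transformer paper, fits in a few lines of NumPy. This is a minimal sketch, not anyone's reference implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))   # 4 query positions, dim 8
K = rng.standard_normal((6, 8))   # 6 key positions
V = rng.standard_normal((6, 8))
out, w = attention(Q, K, V)
print(out.shape)                  # (4, 8)
print(w.sum(axis=-1))             # each row of attention weights sums to 1
```

Once something like this works, comparing it against a real library implementation (e.g. PyTorch's) is itself a great exercise.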

[+] tomp|3 years ago|reply
Sorry if this is a stupid question, but from a non-practitioner's perspective, how or why is this sensible?

Most of the cutting edge papers are trained on several $100k worth of GPU time, so does it even make sense to implement the algorithm without the available data & compute? How can you be sure that your implementation is correct, if you can't train it (hence you can't run proper inference with a good model)?

Compare that to e.g. reimplementing a pure CS paper, almost anything can be reimplemented in a simple way - even something like "distributed database over 1000 nodes", well you don't technically need 1000 servers, you can just, you know, simulate them quite cheaply.

Of course there might be similar techniques for ML but I'm just not aware of them.
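One cheap technique that does exist: check that your reimplementation can at least overfit a tiny synthetic dataset, which needs no GPU budget at all. A hypothetical sketch, with a linear model and hand-written gradient descent standing in for the real architecture:

```python
import numpy as np

# Sanity check: a correct training loop should drive the loss to ~0
# on a tiny dataset it can memorize, even with no GPU budget.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))           # 8 examples, 3 features
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                            # noiseless targets

w = np.zeros(3)
lr = 0.1
losses = []
for _ in range(200):
    pred = X @ w
    grad = 2 * X.T @ (pred - y) / len(X)  # gradient of mean squared error
    w -= lr * grad
    losses.append(np.mean((pred - y) ** 2))

print(losses[0], losses[-1])              # loss should collapse toward zero
```

If the loss doesn't collapse on a problem this easy, the bug is in your code, not in the scale of your compute.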

[+] fullstackchris|3 years ago|reply
+1 on implementing papers, that's one of the best things you can do to improve your skills (anywhere in science or engineering, actually). A warning: I remember trying to do this back in my uni/grad days, and more often than not there is key information (perhaps even accidentally) left out of the implementation descriptions. I was more in mechanical engineering, so perhaps this is less common in AI-oriented papers, but I still think it's a valid thing to look out for.
[+] janalsncm|3 years ago|reply
Can you recommend some papers which fit those criteria?
[+] Throwaway23459|3 years ago|reply
Most recent papers, in NLP at least, are so sparse on detail that it is impossible to reproduce their models. And then there's the compute cost, as at least one other poster has mentioned.
[+] macrolime|3 years ago|reply
I'm in the same boat, kinda, but even more outdated on some parts. I had some AI specialization back in college, but that was before deep learning was even a thing, so we did stuff like self-organizing maps and evolutionary algorithms, but it wasn't really all that useful for much back then. I've been following deep learning from the sidelines, but at work my AI work has been restricted to GOFAI until recently.

Some of the stuff I'm currently reading/watching or have recently finished:

Practical Deep Learning, though it sounds like you may know this stuff already (https://course.fast.ai/)

Practical Deep Learning part 2, more about diffusion models. Full course coming early next year (https://www.fast.ai/posts/part2-2022-preview.html)

Hugging Face course (https://huggingface.co/course/chapter1/1)

Diffusion models from hugging face https://huggingface.co/blog/annotated-diffusion https://huggingface.co/docs/diffusers/index

Andrej Karpathy's Neural Networks: Zero to Hero. He goes from the basics up to GPT and the like, so you can start wherever suits you (https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThs...)

3blue1brown's videos. I've found all his videos on neural networks and math worth watching; even for stuff that I already know, he sometimes has some new perspectives and nice animations.

brilliant.org. Nice math refresher and the courses there are almost like fun little games.

[+] Railsify|3 years ago|reply
Have you had a look at https://nnfs.io/ ? I bought the book and am gearing up to start working through it, I would be interested to know your thoughts. Generally I want to chart a personal curriculum from data engineer to practical application of modern AI to real business problems.
[+] macrolime|3 years ago|reply
The neuromatch computational neuroscience course also seems quite interesting, though maybe less of practical use.

https://compneuro.neuromatch.io/

Recent research like "Relating transformers to models and neural representations of the hippocampal formation" might make it more relevant though (https://arxiv.org/abs/2112.04035v2)

quote from the abstract of that paper: "Many deep neural network architectures loosely based on brain networks have recently been shown to replicate neural firing patterns observed in the brain. One of the most exciting and promising novel architectures, the Transformer neural network, was developed without the brain in mind. In this work, we show that transformers, when equipped with recurrent position encodings, replicate the precisely tuned spatial representations of the hippocampal formation; most notably place and grid cells. Furthermore, we show that this result is no surprise since it is closely related to current hippocampal models from neuroscience. We additionally show the transformer version offers dramatic performance gains over the neuroscience version. This work continues to bind computations of artificial and brain networks, offers a novel understanding of the hippocampal-cortical interaction, and suggests how wider cortical areas may perform complex tasks beyond current neuroscience models such as language comprehension."

[+] quibono|3 years ago|reply
OP here.

For some context, something I should have mentioned in the original post but failed to do: I was not intending to do a professional pivot to an AI role; it is more of a personal interest. I used to be really excited about this stuff and am looking forward to getting involved in it again just because I find it interesting.

Thank you, I really appreciate everyone's responses.

[+] godelski|3 years ago|reply
I'm not sure why people are telling you professional stuff. You do you.

I am an ML researcher working on generative modeling. I think you have enough experience that you'll catch up quickly. But the main question is: what are you interested in? With that, I can give better advice. Don't let anyone stop you from learning just to learn. Not everything has to be a career. A lot of us got here because it's fun.

I do think you'll pick up diffusion models quickly. I like the explicit density side of things more, and density estimation, so I like Yang Song's work and similar work by Kingma. Also check out Lilian Weng's blogs; they are a wealth of material and sources, and you can't go wrong there. You'll find that diffusion models and VAEs are kind of similar. The difficulty you'll have in understanding something like Stable Diffusion is actually the programming (at least this was the hardest part for me).
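As a taste of why diffusion models are approachable, the forward (noising) process of a DDPM has a simple closed form. Here it is in NumPy; the linear beta schedule follows the original DDPM paper, and everything else is a toy sketch:

```python
import numpy as np

# Forward (noising) process of a DDPM, in closed form:
# q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)
T = 1000
betas = np.linspace(1e-4, 0.02, T)        # linear schedule from the DDPM paper
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def q_sample(x0, t, rng):
    """Sample x_t given x_0 in one shot (no need to loop over all steps)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal(16)              # toy stand-in for an image/latent
x_early = q_sample(x0, 10, rng)           # still close to the data
x_late = q_sample(x0, T - 1, rng)         # nearly pure Gaussian noise
print(alpha_bar[10], alpha_bar[T - 1])    # ~1 early, ~0 late
```

Training a denoiser to invert this process is where all the hard work is, but the forward side really is just this.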

Good luck and let me know if I can help.

[+] maurits|3 years ago|reply
I'm an ML researcher (reinforcement learning). I learned by implementing papers from scratch, endlessly. Any specific subfield you are interested in?
[+] anonreeeeplor|3 years ago|reply
I think you should be careful about dropping whatever you are doing and running back to this new iteration of AI.

Quite honestly, the opportunity all seems to be on the front end. The idea that you are going to airdrop yourself into this market as a hands-on AI programmer doesn't make a huge amount of sense to me from a career perspective.

The opportunity is with the tools and how they are applied. Building front end experiences on ChatGPT and integrations and applied scenarios.

Actually doing the AI yourself means competing with PhDs and elite academics immersed in the field.

I think knowledge of AI is far less valuable than knowledge of the emerging landscape combined with a broad understanding of different tools and how they are applied.

The new trend here is very strongly Large Language Models (LLM). You should be far more specific with what your goal is and where to spend your time.

A lot of the "AI" you are referring to seems to be no longer relevant or interesting to the market.

If you are spending time with Jupyter notebooks, I would say you are probably completely wasting your time and heading in the wrong direction.

LLMs are the major trend. Focus entirely on that, and on the tools landscape and how to integrate and apply them. It feels like you are navigating using an out-of-date map.

[+] sullyj3|3 years ago|reply
I find the implicit assumption a bit funny that the only reason OP might be asking this is for career reasons rather than say, curiosity, the joy of learning, love of knowledge for its own sake.
[+] synapticpaint|3 years ago|reply
I wouldn't say that "the opportunity all seems to be on the front end". Specifically for stable diffusion, there are a lot of different ways to use the model. I think we're just starting to scratch the surface of what SD can do, so there is some value in tinkering with different ways to use and apply the model.

Example 1: have a look through here: http://synapticpaint.com/dreambooth/browse/ for some examples of dreambooth models people have created

Example 2: you can merge different dreambooth models together to varying degrees of success (the idea being, you train model A on subject A, model B on subject B, and now you want to generate pictures of A and B together). My understanding is that this doesn't work too well at the moment, but it's possible that a different interpolation algorithm can yield better results.
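The simplest version of that merging, as I understand it, is plain linear interpolation of the two checkpoints' weights. A toy sketch with NumPy arrays standing in for the real tensors (the state-dict keys here are made up):

```python
import numpy as np

def merge_weights(state_a, state_b, alpha=0.5):
    """Naive linear interpolation of two checkpoints' weights.

    Assumes both models share an identical architecture, so their
    state dicts have the same keys and tensor shapes.
    """
    assert state_a.keys() == state_b.keys()
    return {k: (1 - alpha) * state_a[k] + alpha * state_b[k]
            for k in state_a}

# Toy stand-ins for two fine-tuned checkpoints
rng = np.random.default_rng(0)
model_a = {"layer.weight": rng.standard_normal((4, 4)),
           "layer.bias": np.zeros(4)}
model_b = {"layer.weight": rng.standard_normal((4, 4)),
           "layer.bias": np.ones(4)}

merged = merge_weights(model_a, model_b, alpha=0.3)
print(merged["layer.bias"][:2])  # 0.7*0 + 0.3*1 = 0.3 per element
```

Fancier interpolation schemes (per-layer alphas, spherical interpolation, etc.) are exactly the kind of thing worth experimenting with here.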

I do agree with the general sentiment that you wouldn't necessarily be training your own models or creating your own architecture, just want to provide the perspective that understanding the AI side is valuable because it can lead to different capabilities and products.

[+] etangent|3 years ago|reply
This comment does not quite make sense: "The new trend here is very strongly Large Language Models (LLM)." Is every problem a language problem? No, of course not. Is every problem going to be solved by an LLM? What about problems that require unique data sources that no LLM will ever be trained on?

And this: "If you are spending time with Jupyter notebooks I would say you are probably completely wasting your time". How do you suggest one perform data analysis on any problem that's not an LLM, data analysis of any kind, such as checking whether the model I am trying to build a front end for even works for my problem?

[+] quonn|3 years ago|reply
I don't like how this comment sounds …

Also, you're wrong. Look at what the OP wrote, then look at how the latest models are actually built, and you will see that at least 2/3 of their knowledge is relevant.

[+] samvher|3 years ago|reply
There is a lot of transferable knowledge to gain from learning this stuff properly, even if you don’t expect to do core AI work in a commercial setting. Optimization, function fitting, probability/statistics, GPU programming…

My impression is that the field is more disciplined in terms of knowledge now than it was ~8 years ago - the fundamentals are better understood and more clearly expressed in literature.

Also there are still plenty of topics on which the new techniques can probably be fruitfully applied, especially if you have some domain knowledge that the math/CS PhDs don’t have.

For OP - I’m in a similar situation and have been going through Kevin Murphy’s “Probabilistic Machine Learning”, which is pretty massive and dense but also very lucid.

[+] cweill|3 years ago|reply
I disagree with this comment, and anyone reading it should take it with a big grain of salt. Let's go back to 2016 and replace "LLM" with "Reinforcement Learning". Everyone thought every problem could be solved by RL because it's a looser restriction on the problem space. But then RL failed to deliver real-world benefits beyond some very specific circumstances (well-defined games), and supervised learning is/was still king for 99% of problems.

Yes, LLMs are amazing, but they won't be winning every single Kaggle competition or displacing every other ML algorithm in every setting.

[+] joxel|3 years ago|reply
Applied AI/ML is still a great career field, especially if you have a knowledge of physics.
[+] narrator|3 years ago|reply
One thing I'd like to add is that you do not have the computing power to train a large language or vision model. Period. Unless you have hundreds of thousands of dollars for compute time, you are just not going to do anything interesting with model building and AI.

Upgrading existing systems with AI is probably where it's at, using existing models like Stable Diffusion, GPT-3, or some of the smaller downloadable language models if the task is very simple and the economics of using GPT-3 don't make sense.

[+] mrbombastic|3 years ago|reply
Not OP but thanks for this response, as someone on the front end with a passing interest in AI this helped me recalibrate my thinking on this.
[+] steve_adams_86|3 years ago|reply
Is an LLM even applicable to many things, though? If we want more nuanced and contextual vision, does an LLM help?
[+] turkeygizzard|3 years ago|reply
Similar situation as you. I stopped keeping up around 2015ish.

I just read Francois Chollet's Deep Learning with Python and found it to be a fantastic high level overview of all the recent progress. There's some code, but not a lot. I mostly just appreciated it as a very straightforward plain-language treatment of RNNs, CNNs, and transformers.

Now I'm going through Stanford's CS224 lectures.

I'm sort of planning to read papers, but as some other comments have pointed out, I'm less sure of the ROI on that since I'm not sure how feasible a future in AI is for me.

[+] axpy906|3 years ago|reply
Following. I got off the train about two years ago to work more in engineering. The way I see it, if you're not a research scientist, this field is best approached as an ML engineer, as there are more challenges in systems. Would love to be proved wrong.
[+] curious-guy|3 years ago|reply
Hey,

I am a research engineer/applied AI person building vision models in the healthcare domain. I am currently preparing to transition to engineering roles like you did. For that, I am currently going through web dev, both frontend and backend. Would love to get some pointers from you on my approach and any recommendations from your side. Thanks!

[+] ineptech|3 years ago|reply
Possibly too basic for you, but I was hugely impressed with https://www.nlpdemystified.org/course (found on HN a week or two back). Each chapter has a large jupyter notebook with lots of annotated sample code.

Also there's no cost, no book to buy, no email signup, it's just a guy sharing knowledge like the old days. Great course.

[+] jstx1|3 years ago|reply
> I am comfortable with autograd/computation graphs, PyTorch, "classic" neural nets and ones used for vision-type applications, as well as the basics of Transformer networks (I've trained a few smaller ones myself) and RNNs.

It sounds like you're into it already. And you already know which new papers are interesting to you.

[+] bobleeswagger|3 years ago|reply
Maybe it's just me, but it sounds like you're only interested in getting back into AI because it is currently in the limelight. "Working in AI" a few years ago doesn't mean much if you have nothing to show for it. What I'm getting at is that your motivations don't seem genuine; there's nothing that tells me you care about the technology advancing more than a paycheck.

As someone with 10 years of professional experience in software, I find every AI "trend" that has come up in that time to be incredibly odd. It is certainly remarkable what ChatGPT, Stable Diffusion, and other examples are doing today... but ultimately people are giving waaaaaaaay too much credit without understanding the technical details. These are pigeonholed examples that still aren't solving any real problems.

AI is still just statistics with marketing.

[+] joenot443|3 years ago|reply
On the side, I make cheap Wordpress sites for small businesses (mostly as a favor to folks I know personally) and always had a hard time writing good marketing copy.

I’ve found that what ChatGPT comes up with is often a great start and has already saved me hours of time. Is it as good as paying a professional? Probably not. But I think it’s fair to say these models are already solving real world problems, even if they still need a bit of a helping hand. Just my thoughts, I’m not an expert on the models themselves.

[+] spyder|3 years ago|reply
Huh, WTF? What's wrong with getting interested in something because there is new progress in it? And how can you assume he is only interested in it because of the money, when he only asked about how to learn and play around with code, and didn't ask anything about finding a job or projects to work on?
[+] mdp2021|3 years ago|reply
> Maybe it's just me

Maybe it is and he's simply somebody with an interest for the topic and some progress to catch up with.

[+] quickthrower2|3 years ago|reply
Augmenting writers is one (ab)use of GPT. And getting bespoke stack overflow copy-paste source-code (for better or worse) too.
[+] YetAnotherNick|3 years ago|reply
> AI is still just statistics with marketing.

I would say AI is the opposite of statistics, for good or for bad.

[+] sigmoid10|3 years ago|reply
To be honest, for transformers just go to huggingface.co and see what interests you. They have tons of examples to run and they also link to all the papers in the documentation. It doesn't get much easier to get into it. Even for the more recent stuff like vision transformers and diffusion models.
[+] jerpint|3 years ago|reply
The “hard” part is taking an example and adapting it to new tasks. This is the best way to learn imo
[+] mythhouse|3 years ago|reply
How are your software engineering skills? That's the biggest gap I currently see at my current employer. Way too many data scientists are not able to make an impact because they can't put their notebooks into a product and run it in production.
[+] civilized|3 years ago|reply
I partly agree with this but I think it's a bit overrated. There's no massive barrier to getting an actually good model into production. If the model seems promising but no one can figure out how to productionize it, it probably has fatal flaws as a model. There's no guarantee that a mess of Python code that somehow produces a nice AUC curve is actually doing anything valuable. As in science in general, there are many ways to fool yourself in data science.
[+] grepfru_it|3 years ago|reply
I see this too. So many well-intentioned projects were scrapped because management didn't see a business case from Jupyter notebook results...
[+] CloudRecondite|3 years ago|reply
What is involved in doing this and why isn’t there a service that exists to make it easy? Is it all very bespoke?
[+] mathgladiator|3 years ago|reply
I'm curious what you discover as I did some AI decades ago, and now I have a new AI problem. I'm trying to research how to build a generic self learning board game agent for my platform ( https://www.adama-platform.com/ ) as I've reduced the game flow to a decision problem. How I intend to start is to experiment with simple stuff at a low level, and then use that experience to find out what to buy.
[+] ttul|3 years ago|reply
I signed up for Jeremy Howard’s second AI course at the University of Queensland. The course lectures were streamed live to participants around the world in October and November. An online forum organizes everything.

When this course becomes public next year, I think it will be a great way to get caught up. In the meantime, you might still be able to pay the AU $500 fee and watch the course content, which was all recorded, if you are anxious to get going.

[+] sjkoelle|3 years ago|reply
Personally I think that contributing to an open source community is the move. Join the Eleuther Discord. Futz around on Hugging Face. Play with notebooks on Uberduck. Have fun!!! Gatekeeping is dumb.
[+] whiplash451|3 years ago|reply
The latest developments in AI are very cool, but a big lure in my opinion.

You have two options:

1. Work full-time for companies doing this state-of-the-art stuff (OpenAI, Meta, etc.)

2. Work full-time for a (good) AI company that is doing interesting AI work, but most likely not based on GPT/SD/etc.

In both cases, you will learn a lot. Anything else seems like a costly and dangerous distraction to me.

[+] stephc_int13|3 years ago|reply
Read all the leading papers, many times, to get a deep understanding. The writing quality is usually pretty low, but the information density can be very high, and you'll probably miss the important details the first time.

Most medium and low-quality papers are full of errors and noise, but you can still learn from them.

Get your hands dirty with real code.

I would take a look at those:

https://github.com/geohot/tinygrad

https://github.com/ggerganov/whisper.cpp

[+] Simon_O_Rourke|3 years ago|reply
If you're something of a snakeoil salesman with some, but not deep, technical knowledge, then AI Ethics is for you. There are (or were) companies out there who would pay big bucks for folks to tell them some models potentially could be biased or otherwise discriminatory because of poorly selected data. Wash, rinse, repeat and see your paychecks roll in.
[+] marban|3 years ago|reply
Related question, what's the current state of paraphrasing text with off-the-shelf Python libs — PyTorch/Transformers?
[+] optbuild|3 years ago|reply
> I am comfortable with autograd/computation graphs, PyTorch, "classic" neural nets and ones used for vision-type applications, as well as the basics of Transformer networks (I've trained a few smaller ones myself) and RNNs.

Maybe I am a bit off track. But how does someone reach this state?