IKantRead | 2 years ago

> don’t waste your time writing your own neural net and backprop.

I don't think you should lump writing a neural network together with writing backprop, since I don't know anyone working seriously in ML who isn't using some sort of automatic differentiation library to handle the backprop part for them. I'm not sure people even know what they mean when they talk about backprop these days; I suspect they're conflating it with gradient-based optimization.

But anyone seriously interested in ML absolutely should be building their own models from scratch and training them with gradient descent, ideally starting with their own optimization routine rather than a prepackaged one.

This is hugely important, since the optimization part of learning is really the heart of modern machine learning. If you want to understand ML, you should have a strong intuition for the various methods of optimizing a given model. There are also lots of details and tricks behind these models that stay hidden if you're only calling an API around them.
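To make "from scratch" concrete, here's a minimal sketch of what that looks like: a toy linear model fit with a hand-written gradient descent loop, no autodiff and no optimizer library. The model, data, and learning rate are all made up for illustration.

```python
import numpy as np

# Toy data: y = 2x + 1, which the loop below should recover.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)
y = 2.0 * X + 1.0

w, b = 0.0, 0.0   # model parameters
lr = 0.1          # learning rate

for _ in range(500):
    pred = w * X + b
    err = pred - y
    # Gradients of mean squared error, derived by hand:
    grad_w = 2.0 * np.mean(err * X)
    grad_b = 2.0 * np.mean(err)
    w -= lr * grad_w   # plain gradient descent update
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # should approach 2.0 and 1.0
```

The whole exercise is those two gradient lines: deriving them yourself, instead of calling `.backward()`, is exactly the intuition being argued for.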

There's a world of difference between implementing an LSTM and calling one. You learn significantly more about what's actually happening by doing the former.
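For instance, implementing one LSTM time step forces you to see the four gates that a library call hides. A rough sketch of the standard cell equations (weight shapes and sizes here are arbitrary, just for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step, written out by hand."""
    H = h.shape[0]
    z = W @ x + U @ h + b            # all four gate pre-activations, stacked
    i = sigmoid(z[0*H:1*H])          # input gate
    f = sigmoid(z[1*H:2*H])          # forget gate
    o = sigmoid(z[2*H:3*H])          # output gate
    g = np.tanh(z[3*H:4*H])          # candidate cell state
    c_new = f * c + i * g            # cell state: forget old, admit new
    h_new = o * np.tanh(c_new)       # hidden state exposed to the next layer
    return h_new, c_new

# Run a short sequence through the cell with random weights.
rng = np.random.default_rng(0)
D, H = 3, 4                          # input and hidden sizes (made up)
W = rng.normal(size=(4 * H, D)) * 0.1
U = rng.normal(size=(4 * H, H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(5):
    x = rng.normal(size=D)
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (4,)
```

Writing the forget/input/output gates out like this makes it obvious why LSTMs can carry state across long sequences, which is easy to miss when the cell is a one-line library call.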

janalsncm | 2 years ago

> the optimization part of the learning is really the heart of modern machine learning

It’s an important component, but I wouldn’t say it’s the main factor. ML is ultimately about your data, so understanding it is critical. Feature selection and engineering, sampling, subspace optimization (e.g. ESMMs), and interpreting the results correctly are the places where you can squeeze out the most juice. Optimizing the function is the very last step.

Basically, you can optimize all the way down to the global minimum, but a model with better features and better feature interactions is still going to win.

Further, there are a ton of different optimizers available. SGD, Adam, Adagrad, RMSProp, FTRL, etc. With just one hour a day, you could spend six months simply writing and understanding the most popular ones.
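As a sketch of how two of those differ, here are plain SGD and Adam applied to the same one-dimensional quadratic loss f(w) = (w − 3)². The loss and step counts are made up; the Adam hyperparameters are the usual published defaults:

```python
import numpy as np

def grad(w):
    # Gradient of f(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

# Plain SGD: step proportional to the raw gradient.
w_sgd = 0.0
for _ in range(200):
    w_sgd -= 0.1 * grad(w_sgd)

# Adam: exponential moving averages of the gradient (m) and its
# square (v), with bias correction, so the effective step size
# adapts per parameter.
w_adam, m, v = 0.0, 0.0, 0.0
lr, b1, b2, eps = 0.1, 0.9, 0.999, 1e-8
for t in range(1, 201):
    g = grad(w_adam)
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)        # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)        # bias-corrected second moment
    w_adam -= lr * m_hat / (np.sqrt(v_hat) + eps)

print(round(w_sgd, 3), round(w_adam, 3))  # both should land near 3.0
```

On this trivial loss they agree; the point of writing them yourself is seeing *why* they differ on harder ones, e.g. Adam's update magnitude stays near the learning rate even when gradients are tiny or wildly scaled.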