top | item 33955885


chronolitus | 3 years ago

You put your finger on exactly what I find incredible about the recent progress in ML. The reason I wrote this post was to see how much I could de-mystify these state-of-the-art models for myself, and the conclusion is that (after the model is trained) it all really boils down to a couple of matrix multiplications! All the impressive results we see are not coming from an extremely complicated system ('complicated' the way a fighter jet is, with many different subsystems that you'd need to read many books to understand).
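To make that concrete: here's a minimal sketch of inference in a trained network, assuming a toy two-layer MLP (the weights, shapes, and function names here are illustrative, not from any real model). Once training is done, producing an output really is just a couple of matrix multiplications with a nonlinearity in between.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "trained" weights; random here for illustration. In a real model
# these would be the values produced by training.
W1 = rng.standard_normal((8, 16))   # first layer:  8 inputs -> 16 hidden units
W2 = rng.standard_normal((16, 4))   # second layer: 16 hidden -> 4 outputs

def forward(x):
    h = np.maximum(x @ W1, 0.0)     # matrix multiply, then ReLU
    return h @ W2                   # another matrix multiply

x = rng.standard_normal(8)          # an arbitrary input vector
y = forward(x)                      # shape (4,)
```

Real state-of-the-art models add attention, normalization, and vastly more parameters, but the per-layer computation is still dominated by exactly this pattern.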

Of course, there's all the secret sauce involved in actually getting the models to learn anything, and all the empirical tricks that make training more efficient (ReLUs, etc.). But how many of those are fundamental, vs. simply efficiency shortcuts? And if you'd asked me 10 years ago what I thought it would take to get the kind of output these large models produce today, I would not have guessed anything nearly as simple as what these models actually are.
