(no title)
armcat|6 days ago
That's a soft distinction (distilling vs learning). If I read a chapter in a textbook I am distilling the knowledge from that chapter into my own latent space - one would hope I learn something. Flipping it the other way, you could say that the model from Lab Y is ALSO learning from the model from Lab X. Not just "distilling". Hence my original comment - how deep does this go?
EnPissant|6 days ago
armcat|6 days ago
That's a bold statement! Of course I know the difference: in one case you are learning from correct/wrong answers, and in the other from a probability distribution. But in both cases you are using some X to move the weights. We can get into the nitty-gritty of KL divergence vs cross-entropy, but the whole topic is about "theft", which is perhaps in the eye of the beholder.
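For the record, here's a minimal sketch of the contrast I mean (plain NumPy, made-up numbers, not anyone's actual training code): hard-label training scores the student against a single "correct answer", distillation scores it against the teacher's full distribution. Both just produce a loss you backprop to move the weights.

    import numpy as np

    def cross_entropy_hard(student_probs, correct_idx):
        # supervised learning: only the one correct label contributes
        return -np.log(student_probs[correct_idx])

    def kl_divergence(teacher_probs, student_probs):
        # distillation: match the teacher's whole probability distribution
        return np.sum(teacher_probs * np.log(teacher_probs / student_probs))

    student = np.array([0.2, 0.5, 0.3])   # student's predicted distribution
    teacher = np.array([0.1, 0.7, 0.2])   # teacher's soft targets

    print(cross_entropy_hard(student, correct_idx=1))  # ~0.693, signal from one label
    print(kl_divergence(teacher, student))             # ~0.085, signal from every class

And since the teacher's probabilities are fixed, minimizing that KL is the same as minimizing cross-entropy against the soft targets up to a constant (the teacher's entropy), which is why I say the distinction is softer than it sounds.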