dongecko | 1 year ago

Boltzmann machines were there in the very early days of deep learning. They were a clever hack to train deep nets layer-wise and work with limited resources.

Each layer was trained similarly to the encoder part of an autoencoder. That way the layer-wise transformations were not random but roughly preserved some of the original data's properties. Up to this point, training was done without any labelled data. Once this pretraining stage was done, you had a very good initialization for your network and could train it end to end according to your task and target labels.
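Roughly, the recipe looks like the following. This is only a minimal NumPy sketch of the layer-wise autoencoder idea; the layer sizes, learning rate, and toy data are made up for illustration and are not the original recipe:

```python
# Greedy layer-wise pretraining with tiny tied-weight autoencoders (sketch).
# All hyperparameters and the toy data below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_layer(X, n_hidden, lr=0.1, epochs=50):
    """Train one layer as the encoder of a small autoencoder and
    return its weights plus the encoded data for the next layer."""
    n_visible = X.shape[1]
    W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(n_visible)
    for _ in range(epochs):
        H = sigmoid(X @ W + b_h)            # encode
        X_rec = sigmoid(H @ W.T + b_v)      # decode with tied weights
        err = X_rec - X                     # reconstruction error
        # gradient of squared error through the decoder, then the encoder
        d_rec = err * X_rec * (1 - X_rec)
        d_hid = (d_rec @ W) * H * (1 - H)
        W -= lr * (H.T @ d_rec).T / len(X)  # decoder part of the tied gradient
        W -= lr * (X.T @ d_hid) / len(X)    # encoder part of the tied gradient
        b_v -= lr * d_rec.mean(axis=0)
        b_h -= lr * d_hid.mean(axis=0)
    return W, b_h, sigmoid(X @ W + b_h)

# Stack two pretrained layers on toy data; the resulting weights would then
# serve as the initialization for supervised end-to-end training.
X = rng.random((256, 20))
W1, b1, H1 = pretrain_layer(X, 10)
W2, b2, H2 = pretrain_layer(H1, 5)
```

Each pretrained layer's weights are then stacked into one network and fine-tuned end to end on the actual task.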

If I recall correctly, the neural layers' output was probabilistic. Because of that you couldn't simply use backpropagation to learn the weights. Maybe this is the connection to John Hopfield's work, but here my memory is a bit fuzzy.
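For illustration, the "probabilistic output" part looks roughly like this: the layer computes a firing probability, and the actual activation is a random 0/1 sample of it; that sampling step has no useful gradient, which is why plain backprop doesn't apply directly. A tiny NumPy sketch, with made-up sizes:

```python
# Stochastic binary units (sketch): probability out, Bernoulli sample as state.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = rng.normal(size=(4, 8))                  # a small batch of inputs
W = rng.normal(0, 0.1, size=(8, 3))          # illustrative weights
p_hidden = sigmoid(x @ W)                    # probability that each unit fires
h = (rng.random(p_hidden.shape) < p_hidden).astype(float)  # sampled 0/1 state
print(p_hidden.round(2))
print(h)
```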

etiam | 1 year ago

Boltzmann machines were already there in the 1980s, created on the basis of Hopfield nets and augmented with statistical-physics techniques, among other reasons to better navigate the energy landscape without getting stuck in local optima so much.

From the people dissing the award here, it seems like even a particularly benign internet community like HN has little notion of ML with ANNs before Silicon Valley bought in for big money circa 2012. And media reporting since then hasn't exactly helped.

ANNs go back a good deal further still (as the updated post does point out), but the works cited for this award really are foundational for the modern form in a lot of ways.

As for DL and backpropagation: maybe things could have been otherwise, but in the reality we actually got, optimizing deep networks with backpropagation alone never got off the ground. Around 2006 Hinton started getting it to work by building networks up layer-wise out of optimized Restricted Boltzmann Machines (the full Boltzmann machine with the lateral connections within a layer removed), resulting in what was termed a Deep Belief Net, which basically did its job already but could then be fine-tuned with backprop for performance once it had been initialized with the stack of RBMs. An alternative approach with layer-wise autoencoders (also a technique essentially created by Hinton) soon followed.
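To make the RBM part concrete, here is a minimal NumPy sketch of one-step contrastive divergence (CD-1) and a greedy two-layer stack. The sizes, learning rate, and toy binary data are illustrative assumptions, not the exact 2006 recipe:

```python
# Greedy stack of RBMs trained with CD-1 (sketch, illustrative hyperparameters).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(V, n_hidden, lr=0.05, epochs=30):
    n_visible = V.shape[1]
    W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
    a = np.zeros(n_visible)   # visible biases
    b = np.zeros(n_hidden)    # hidden biases
    for _ in range(epochs):
        # positive phase: hidden probabilities and a sample given the data
        ph = sigmoid(V @ W + b)
        h = (rng.random(ph.shape) < ph).astype(float)
        # negative phase: one Gibbs step back to the visibles and up again
        pv = sigmoid(h @ W.T + a)
        ph2 = sigmoid(pv @ W + b)
        # CD-1 update: data-driven minus model-driven statistics
        W += lr * (V.T @ ph - pv.T @ ph2) / len(V)
        a += lr * (V - pv).mean(axis=0)
        b += lr * (ph - ph2).mean(axis=0)
    return W, b, sigmoid(V @ W + b)

# Greedy stack: each RBM is trained on the hidden activities of the one below.
V = (rng.random((256, 30)) > 0.5).astype(float)
W1, b1, H1 = train_rbm(V, 15)
W2, b2, H2 = train_rbm(H1, 8)
```

The weights of each trained RBM would then initialize the corresponding layer of the deep net before the backprop fine-tuning described above.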

Once these approaches had shown that deep ANNs could work, though, analysis fairly soon showed that the random weight initializations used back then (especially when combined with the historically popular sigmoid activation function) resulted in very poor scaling of the gradients for deep nets, which all but eliminated the flow of feedback. It might have optimized eventually, but only after a far longer wait than was feasible on the computers of the time. Once the problem was understood, people made tweaks to the weight initialization, the activation function, and the optimization more generally, and then in many cases it did work to go directly to supervised backprop.

I'm sure those tweaks are usually taken for granted to the point of being forgotten today, when one's favourite highly-optimized dedicated Deep Learning library will silently apply the basic ones without so much as being asked, but take away the normalizations and the Glorot or whatever initialization, and it could easily mean a trip back to rough times getting your train-from-scratch deep ANN to start showing results.
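The initialization point is easy to see numerically. A small sketch (depth, widths, and the "naive" weight scale are made-up assumptions) compares a naive sigmoid stack against a Glorot-style fan-based scale by tracking the average sigmoid slope per layer, a rough proxy for how strongly the backpropagated signal gets attenuated:

```python
# Why initialization matters for deep sigmoid nets (sketch, illustrative sizes).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def slope_product(weight_std, depth=10, width=100):
    """Product over layers of the average sigmoid slope at the pre-activations:
    a rough proxy for how much gradient survives the trip back through depth."""
    x = rng.normal(size=(64, width))
    factor = 1.0
    for _ in range(depth):
        W = rng.normal(0.0, weight_std, size=(width, width))
        s = sigmoid(x @ W)
        factor *= float((s * (1.0 - s)).mean())  # local sigmoid slope this layer
        x = s
    return factor

naive_std = 1.0                               # "just pick something" scale
glorot_std = np.sqrt(2.0 / (100 + 100))       # Glorot/Xavier fan-based scale
print("naive init :", slope_product(naive_std))
print("glorot init:", slope_product(glorot_std))
```

With the naive scale the pre-activations saturate, and the product collapses many orders of magnitude faster than with the fan-based scale, which is exactly the "eliminated flow of feedback" problem.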

I didn't expect this award, but I think it's great to see Hinton recognized again, and precisely because almost all modern coverage is too lazy to track down history earlier than the 2010s, not least Hopfield's foundational contribution, I think it is all the more important that the Nobel foundation did.

So going back to the original question above: there are so many bad, confused versions of neural network history going around that whether or not this one is widely accepted isn't a good measure of quality. For what it's worth, to me it seems a good deal more complete and veridical than most encountered today.