2sk21|1 year ago
I'm surprised that the article doesn't mention that one of the key factors enabling deep learning was the adoption of ReLU as the activation function in the early 2010s. ReLU behaves much better than the logistic sigmoid we had used until then.
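A quick numeric sketch of the point above (plain NumPy; the function names are my own illustration): the logistic sigmoid's derivative is at most 0.25, so stacking many sigmoid layers shrinks the backpropagated gradient geometrically, while ReLU passes the gradient through unchanged for positive inputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the logistic sigmoid: s * (1 - s), peaking at 0.25.
    s = sigmoid(x)
    return s * (1 - s)

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return np.where(np.asarray(x) > 0, 1.0, 0.0)

print(sigmoid_grad(0.0))  # 0.25, the sigmoid's maximum gradient
print(relu_grad(3.0))     # 1.0, regardless of how large x is
print(0.25 ** 10)         # best case after 10 sigmoid layers: ~9.5e-07
```

Even in the best case, ten sigmoid layers attenuate the gradient by roughly a factor of a million, which is one common explanation for why deep sigmoid networks were so hard to train.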
HarHarVeryFunny|1 year ago
- nets too small (not enough layers)
- gradients not flowing (fixed by residual connections)
- layer outputs not normalized (batch/layer norm)
- training algorithms and procedures not optimal (Adam, warm-up, etc.)
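Two of the fixes listed above, residual connections and output normalization, can be sketched in a few lines of NumPy (a minimal illustration with names and shapes of my own choosing, not anyone's actual model code):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each sample's features to zero mean and unit variance,
    # addressing the "layer outputs not normalized" problem.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def residual_block(x, W):
    # y = x + f(x): the identity path lets gradients flow straight
    # through the block, which is what residual connections fix.
    return x + np.maximum(0.0, layer_norm(x) @ W)

x = np.random.randn(4, 8)   # batch of 4 samples, 8 features each
W = np.random.randn(8, 8) * 0.1
y = residual_block(x, W)    # same shape as x, so blocks stack freely
```

Because the block's output has the same shape as its input, dozens of such blocks can be stacked while the skip path still carries the gradient directly back to the early layers.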