mlthoughts2018 | 4 years ago
Neural nets typically don't benefit much from robust loss functions, because batch normalization, dropout, and well-chosen activation functions can achieve similar results: the network learns a diminished sensitivity to outliers, which end up producing neurons that saturate the low end of an activation function.
This is preferable because many robust potential functions involve absolute values, order statistics, and other non-differentiable quantities that are awkward to plug into backpropagation-based optimizers. You would almost always need to relax the loss to something that trades smoothness against outlier robustness, and convergence gets slower and slower as you push that trade-off toward robustness.
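The smoothness-vs-robustness trade-off described above can be made concrete with the Huber loss, a standard smooth relaxation of absolute error (the comment doesn't name it, so this is an illustrative sketch, not the author's example). The `delta` parameter is the knob: small `delta` behaves more like the robust absolute-error loss, large `delta` more like the smooth squared-error loss. Crucially, the gradient is clipped at `±delta`, so a single outlier cannot dominate an update the way it does under squared error.

```python
import numpy as np

def huber_loss(r, delta=1.0):
    # Quadratic near zero (smooth, fast convergence),
    # linear in the tails (outlier-robust).
    quad = 0.5 * r**2
    lin = delta * (np.abs(r) - 0.5 * delta)
    return np.where(np.abs(r) <= delta, quad, lin)

def huber_grad(r, delta=1.0):
    # The gradient is the residual clipped at +/- delta,
    # so one huge residual contributes at most delta to the update.
    # Under squared error the gradient would be the raw residual.
    return np.clip(r, -delta, delta)

residuals = np.array([0.1, -0.3, 10.0])  # last entry is an outlier
print(huber_grad(residuals, delta=1.0))  # outlier's gradient capped at 1.0
```

Shrinking `delta` toward zero recovers the absolute-error loss and its robustness, but the gradient then carries almost no magnitude information near the optimum, which is one way to see why convergence slows as the trade-off is pushed toward robustness.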