HighFreqAsuka | 2 years ago

Yes, in fact I made a very good point. But the comment above deliberately chooses not to engage with it, because it's easier to ignore it entirely and argue something else.

p1esk | 2 years ago

The problem is you presented something as a fact while it’s just a guess. Some people guess it’s an improved gradient flow, others guess it’s a smoother loss surface, someone else guesses it’s a shortcut for early layer information to reach later layers, etc. We don’t actually know why resnets work so well.

HighFreqAsuka | 2 years ago

The point of that comment doesn't have anything to do with how ResNets actually work. You missed the actual point.

> We don’t actually know why resnets work so well.

Yes, actually, we do. We know from the literature that very deep neural networks suffered from vanishing gradients in their early layers, in the same way traditional RNNs did. We know that was the motivation for introducing skip connections, which gives us a hypothesis we can test: using the test I described, we can measure the difference in gradient magnitudes in the early layers with and without skip connections, and we can do this across many different problems for additional statistical power. We can also analyze the linear case and see that repeated matmuls should lead to small gradients when the singular values are small. To ignore all of this and say "well, we don't have a general proof that would satisfy a mathematician, so I guess we just don't know" is silly.
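The linear case is easy to check numerically. Here's a minimal sketch (numpy, a toy setup of my own, not something from the thread): backpropagate a unit gradient through a stack of random linear layers whose singular values sit mostly below 1, once as plain matmuls and once in residual form (I + W). The depth, width, and 0.5 scale factor are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
depth, dim = 50, 64

# Random weight matrices scaled so their singular values are mostly below 1.
Ws = [0.5 * rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(depth)]

def early_layer_grad_norm(mats):
    """Norm of a unit gradient backpropagated through a stack of linear maps.

    For a linear network, the gradient reaching the earliest layer is the
    product of the transposed weight matrices applied to the output gradient.
    """
    g = rng.standard_normal(dim)
    g /= np.linalg.norm(g)
    for W in reversed(mats):
        g = W.T @ g
    return np.linalg.norm(g)

# Plain stack: each matmul contracts the gradient, so it decays
# roughly geometrically with depth.
plain = early_layer_grad_norm(Ws)

# Residual stack: the identity term in (I + W) gives the gradient a
# direct path back, so it stays at a usable magnitude.
residual = early_layer_grad_norm([np.eye(dim) + W for W in Ws])

print(f"plain:    {plain:.2e}")
print(f"residual: {residual:.2e}")
```

Running this shows the plain product driving the early-layer gradient toward numerical zero while the residual form keeps it alive, which is exactly the measurable difference the comment is describing.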