cs702|3 years ago
The double descent phenomenon is what happens after interpolation.
--
RESPONDING TO YOUR LAST COMMENT (after reaching thread depth limit):
Think of it this way: why and how does the model's performance continue to improve on previously unseen samples after the model has fully overfit (interpolated) all training samples? Interpolation is not the end-point of training, but a temporary threshold after which models learn to generalize better, improving beyond interpolation. How is it that these models improve beyond interpolation?
jaschasd|3 years ago
I can't reply directly -- is there a maximum thread depth, or a maximum conversation depth?
Anyway -- I wanted to apologize for misreading -- I missed the parenthetical "interpolation" in your comment. I think we are both interpreting the plot the same way.
In terms of your comment about anecdotal evidence -- are you talking about the case where data and model size are increased jointly? If so, I agree, though I don't think that is cleanly to do with double descent/overparameterization any longer.