top | item 33563993

jaschasd | 3 years ago

(the plot shows extreme overfitting with a 10 parameter model, and interpolation with a 10,000 parameter model)
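The behaviour the plot describes can be reproduced in miniature with minimum-norm least squares on random features. The sketch below is my own construction (the data, widths, and ReLU feature map are all illustrative assumptions, not the experiment from the post): test error peaks near the interpolation threshold, where the number of features equals the number of training samples, and descends again as the model grows past it.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, d=5, noise=0.1):
    # Noisy linear target in d dimensions (an illustrative toy task).
    X = rng.normal(size=(n, d))
    y = X @ np.ones(d) + noise * rng.normal(size=n)
    return X, y

def avg_test_error(n_features, X_tr, y_tr, X_te, y_te, n_trials=5):
    errs = []
    for _ in range(n_trials):
        # Random ReLU feature map, shared between train and test.
        W = rng.normal(size=(X_tr.shape[1], n_features))
        phi = lambda X: np.maximum(X @ W, 0.0)
        # lstsq returns the minimum-norm solution when underdetermined,
        # which is what makes the overparameterized regime well-behaved.
        coef, *_ = np.linalg.lstsq(phi(X_tr), y_tr, rcond=None)
        errs.append(np.mean((phi(X_te) @ coef - y_te) ** 2))
    return float(np.mean(errs))

X_tr, y_tr = make_data(40)
X_te, y_te = make_data(2000)
# Widths below, at, and well past the interpolation threshold (n_train = 40).
errs = {k: avg_test_error(k, X_tr, y_tr, X_te, y_te) for k in (10, 40, 1000)}
print(errs)  # test error should peak at width 40, the interpolation threshold
```

The peak at width 40 is the first "descent's" bad endpoint; the improvement at width 1000 past it is the second descent the thread is discussing.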


cs702 | 3 years ago

Interpolation == extreme overfitting.

The double descent phenomenon is what happens after interpolation.

--

RESPONDING TO YOUR LAST COMMENT (after reaching thread depth limit):

Think of it this way: Why and how does the model's performance continue to improve on previously unseen samples after the model has fully overfit (interpolated between) all training samples? Interpolation is not the end-point in training, but a temporary threshold after which models learn to generalize better, improving on interpolation. How is it that these models improve on interpolation?

jaschasd | 3 years ago

I can't reply directly -- is there a maximum thread depth, or a maximum conversation depth?

Anyway -- I wanted to apologize for misreading -- I missed the parenthetical "interpolation" in your comment. I think we are both interpreting the plot the same way.

In terms of your comment about anecdotal evidence -- are you talking about the case where data and model size are increased jointly? If so, I agree, though I don't think that is cleanly related to double descent/overparameterization any longer.