Hard disagree. Your link relies on gradient descent as an explanation, whereas the OP explains why optimization is not needed to understand DL generalization. PAC-Bayes and the other countable-hypothesis bounds in the OP are also quite different from VC dimension. The whole point of the OP seems to be that these frameworks, unlike VC dimension, can explain generalization even with an arbitrarily flexible hypothesis space.
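To make the countable-hypothesis point concrete, here's a toy sketch (my own, not from the OP; the geometric prior and the numbers are made up for illustration). The Occam-style bound charges each hypothesis log(1/P(h)) rather than a capacity measure of the whole class, so an infinite hypothesis list still yields a finite bound for each individual hypothesis:

```python
import math

def occam_gap(prior_mass, n, delta=0.05):
    """Generalization-gap term of the countable-hypothesis (Occam) bound:
    sqrt((ln(1/P(h)) + ln(1/delta)) / (2n))."""
    return math.sqrt((math.log(1 / prior_mass) + math.log(1 / delta)) / (2 * n))

# Geometric prior P(h_k) = 2^-(k+1) over an infinite hypothesis list:
# the class is infinitely large (infinite VC dimension is possible),
# yet each h_k individually gets a finite, computable bound.
n = 10_000
for k in [1, 10, 100]:
    print(f"h_{k}: gap term = {occam_gap(2.0 ** -(k + 1), n):.4f}")
```

The price is that hypotheses deep in the list (low prior mass) get looser bounds, which is exactly the "simpler hypotheses generalize better" story, with no reference to the size of the class as a whole.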
cgdl|11 months ago
But once you are ready to do that, algorithmic stability is enough. You then don't need to think about Bayesian ensembles or other proxies/simplifications, and can focus on just the specific learning setup you have. BTW, algorithmic stability is not a new idea. An early version showed up within a few years of VC theory, in the 80s, to understand why nearest neighbors generalizes (though it wasn't called algorithmic stability then).
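The nearest-neighbor case is easy to see empirically. Here's a toy sketch (my own setup, not from any of the linked papers): measure how often a 1-NN predictor's outputs change when a single training example is deleted. That flip rate is the leave-one-out perturbation that stability arguments bound:

```python
import random

def nn_predict(train, x):
    # 1-nearest-neighbor on the real line: label of the closest training point.
    return min(train, key=lambda p: abs(p[0] - x))[1]

random.seed(0)
train = [(random.random(), int(random.random() > 0.5)) for _ in range(200)]
test_xs = [random.random() for _ in range(500)]

# Hypothesis stability, estimated directly: fraction of test predictions
# that flip when one training example is deleted, averaged over deletions.
flips = 0
trials = 50
for i in random.sample(range(len(train)), trials):
    loo = train[:i] + train[i + 1:]
    flips += sum(nn_predict(train, x) != nn_predict(loo, x) for x in test_xs)
rate = flips / (trials * len(test_xs))
print(f"average flip rate: {rate:.4f}")
```

Deleting one point only affects test points whose nearest neighbor was that point, so the flip rate shrinks like 1/n, and that decay (not any capacity measure of the 1-NN "hypothesis class", which is enormous) is what drives the generalization argument.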
If you are interested in this, I also recommend [3].
[2] https://arxiv.org/abs/1611.03530
[3] https://arxiv.org/abs/1902.04742
singulargalaxy|11 months ago
"and that already requires studying the specific optimization algorithm in order to understand why it picks certain hypothesis over others in the space." But the OP paper explains how even "guess and check" can generalize similarly to SGD. It's becoming more well understood that the role of the optimizer may have been historically overstated for understanding DL generalization. It seems to be more about loss landscapes.
Don't get me wrong, the references you're linking are super interesting. But they don't take away from the OP paper, which adds something quite valuable to the discussion.