Hi everyone, I wrote the article. I do consider this overfitting because we are training on these frames many more times than would normally be advised for a training set of this size, to the point that the error is essentially zero on these frames. The model performs well "out-of-sample" here, but only on out-of-sample data that is semantically close to the original training set. Besides, overfitting is defined procedurally, not by how well the model performs: you could have an overfit model that happens to perform well on some data it was never trained on, and that wouldn't change the fact that the model was overfit.
thaumasiotes|4 years ago
Huh? That's the opposite of the truth.
Compare https://en.wikipedia.org/wiki/Overfitting :
> In statistics, overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably".
> The essence of overfitting is to have unknowingly extracted some of the residual variation (i.e. the noise) as if that variation represented underlying model structure.
Procedural concerns are not part of the concept. Conceptually, overfitting means including information in your model that isn't relevant to the prediction you're making, but that is helpful, by coincidence, in the data you're fitting the model to.
But since that can't be measured directly, in practice you measure overfitting through performance.
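To make the "measure it through performance" point concrete, here is a toy sketch (my own example, not from the article): a model that memorizes its training labels gets zero training error, but the gap between training error and held-out error is what reveals the overfitting. The data, the `memorizer` model, and the `linear` baseline are all invented for illustration.

```python
# Underlying signal is roughly y = 2x; the training labels carry noise.
train = [(1, 2.3), (2, 3.8), (3, 6.4), (4, 7.7)]   # noisy samples of y = 2x
test  = [(1.5, 3.0), (2.5, 5.0), (3.5, 7.0)]       # clean held-out points

def memorizer(x, data=train):
    # "High-capacity" model: just return the label of the nearest memorized x.
    return min(data, key=lambda p: abs(p[0] - x))[1]

def linear(x):
    # "Low-capacity" model: least-squares slope through the origin.
    num = sum(xi * yi for xi, yi in train)
    den = sum(xi * xi for xi, _ in train)
    return (num / den) * x

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

print(mse(memorizer, train), mse(memorizer, test))  # zero train error, larger test error
print(mse(linear, train), mse(linear, test))        # nonzero train error, small test error
```

The memorizer looks perfect if you only score it on the data it was fit to; the train/test gap is the measurable symptom of the conceptual problem described above.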
elandau25|4 years ago
An overfitted model is a statistical model that contains more parameters than can be justified by the data.
Another definition from https://www.ibm.com/cloud/learn/overfitting#:~:text=Overfitt...:
Overfitting is a concept in data science, which occurs when a statistical model fits exactly against its training data.
We can argue over the precise definition of overfitting, but when you fit a high-capacity model exactly to the training data, that is a procedural question, and I would argue it falls under the overfitting umbrella.
ALittleLight|4 years ago
Is that happening when you train these micro-models? If not, I have a hard time seeing how it's overfitting, because the model still performs well on the data you train it on and use it on. If it is happening, then I don't see the benefit of it: a model that wasn't overfit would simply do better at the task of collecting additional training data.
I think the approach you're talking about makes sense - create a simple model rapidly and leverage it to get more training data which you can then use to refine the model to be better still. I just don't think the term "overfitting" describes that process well - unless I'm misunderstanding something.
elanning|4 years ago
elandau25|4 years ago
My argument is that, compared to models as most people use them, micro-models are low bias and high variance, and thus overfit. That's why I draw a distinction between a Batman model and a Batman micro-model.
MillenialMan|4 years ago
Couldn't you use this logic to say that AlphaGo is overfit because it can only play Go, not chess?