shawntan | 4 months ago
You can still game those benchmarks (tune your hyperparameters after looking at test results), but that setting measures generalisation on the test set _given_ the specified training set. Using any additional data goes against the benchmark rules, and results obtained that way should not be compared on the same lines.
YeGoblynQueenne | 4 months ago
And this is standard practice: everyone does it all the time, and I believe a sizeable majority of researchers don't even understand that what they're doing is pointless, because that's what they've been taught to do, from looking at each other's work, from what their supervisors tell them to do, and so on.
Btw, we don't really care about generalisation on the test set, per se. The point of testing on a held-out test set is that it's supposed to give you an estimate of a model's generalisation on truly unseen data, i.e. data that was not available to the researchers during training. That's the generalisation we're really interested in. And the reason we're interested in that is that if we deploy a model in a real-world situation (rare as that may be) it will have to deal with unseen data, not with the training data, nor with the test data.
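A minimal sketch of the protocol the comment is describing, using a toy dataset and a trivial threshold classifier (all names and data here are hypothetical, not from any real benchmark): hyperparameters are tuned on a validation split, and the held-out test set is consulted exactly once, so its score can still stand in for performance on truly unseen data.

```python
# Hypothetical illustration: tune on a validation split, touch the test set once.
import random

random.seed(0)

# Toy dataset: label is 1 when x > 0.5, with ~10% label noise.
xs = [random.random() for _ in range(300)]
data = [(x, int(x > 0.5) if random.random() > 0.1 else int(x <= 0.5)) for x in xs]

train, val, test = data[:200], data[200:250], data[250:]

def accuracy(threshold, split):
    # Classifier: predict 1 iff x exceeds the threshold.
    return sum(int(x > threshold) == y for x, y in split) / len(split)

# Tune the only "hyperparameter" (the threshold) on the validation split...
best = max((t / 100 for t in range(100)), key=lambda t: accuracy(t, val))

# ...then report the test score once. Selecting `best` by peeking at `test`
# instead would invalidate the test score as an estimate of generalisation.
print(f"chosen threshold={best:.2f}, test accuracy={accuracy(best, test):.2f}")
```

The moment the test set influences any modelling decision, it stops being "unseen" and its score stops estimating real-world performance, which is the point the comment above makes.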