(no title)
starspangled | 9 months ago
It's just semantics, but the point is that the article wasn't saying the same thing OP was worried about. There's nothing wrong with testing B, B', B'', etc. until you find a significant performance improvement. You just wouldn't test B several times and take the last set of data when it looks good. Almost goes without saying really.
LegionMammal978|9 months ago
There is, in fact, "something wrong" with this, which is what GP was pointing out. It's literally covered under "Playing with multiple comparisons" in TFA.
(Personally, to combat this, I've ignored the fancy p-values and resorted to the eyeball test of whether it very consistently produces a noticable speedup.)