I agree with your observation about the exactly noise-free nature of the problem. It lets them formulate the problem as "minimize complexity subject to memorizing the X-y relationship exactly." To generalize to the noisy case, you'd replace exact memorization with a prescribed error budget. But then that error budget looks like an effective complexity hyperparameter, doesn't it, and we're back to square one: cross-validation.
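A minimal sketch of that tuning problem (my own toy construction, not from the thread): take a noisy linear rule, measure "complexity" as bits per quantized coefficient, and ask for the simplest rule that fits within an error budget `eps`. The function names and the quantization scheme are invented for illustration.

```python
import random

random.seed(0)
xs = [i / 49 for i in range(50)]
ys = [2 * x + random.gauss(0, 0.1) for x in xs]  # noisy linear "rule"

def fits(a, b, eps):
    """Does y = a*x + b match every sample to within eps?"""
    return all(abs(a * x + b - y) <= eps for x, y in zip(xs, ys))

def min_bits_within_budget(eps, max_bits=8):
    # Complexity = bits per coefficient; a coarser grid of candidate
    # coefficients is a cheaper (shorter) description of the rule.
    for bits in range(1, max_bits + 1):
        step = 4 / (2 ** bits)  # quantize coefficients in [-2, 2]
        levels = [-2 + k * step for k in range(2 ** bits + 1)]
        if any(fits(a, b, eps) for a in levels for b in levels):
            return bits
    return None  # no rule this simple meets the budget

loose = min_bits_within_budget(eps=0.5)   # generous budget
tight = min_bits_within_budget(eps=0.15)  # stingy budget
```

The point is that `eps` does all the work: loosening it licenses a simpler rule, tightening it demands a more complex one (or none at all), so you end up sweeping `eps` exactly the way you'd sweep any regularization hyperparameter.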
ealexhudson | 1 year ago
But because this is clean data, I wonder if there's a big gap here: the codec that encodes the "correct rule" achieves a step-change lower description length than similar-looking solutions. The most elegant ruleset, at least in this set of puzzles, always compresses markedly better. So you can effectively brute-force the correct rule: try lots of encoding strategies, and identify which one gets you that step-change compression benefit.
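The brute-force idea above can be sketched in a few lines (encodings and names invented for illustration): serialize the same grid under several candidate strategies and look for the one whose length drops far below the rest.

```python
import zlib

grid = [[(r + c) % 2 for c in range(30)] for r in range(30)]  # checkerboard

raw = bytes(cell for row in grid for cell in row)
encodings = {
    "raw": raw,                         # dump every cell verbatim
    "zlib_raw": zlib.compress(raw, 9),  # generic compressor, no rule
    "rule": b"chk30",                   # hypothetical "correct rule" codec:
                                        # 'checkerboard, side 30'
}
lengths = {name: len(data) for name, data in encodings.items()}
best = min(lengths, key=lengths.get)
```

A generic compressor already beats the raw dump, but the codec that captures the generating rule lands an order of magnitude below it; on clean data that outlier drop is the signature you'd search for.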