To me, intellect has two parts: "creativity" and "correctness". From this perspective, a random sampler is infinitely "creative": given (infinite) time, it can come up with an answer to any given problem. So it feels natural that base models are more "creative" (because that's what is being measured in the paper), while RL models are more "correct" (that's the slope of the curve from the paper).
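A rough sketch of the random-sampler point, assuming independent samples with a fixed per-sample success probability `p` on a single problem. The function name `pass_at_k` and the three probabilities are my own illustrative assumptions, not numbers from the paper:

```python
def pass_at_k(p: float, k: int) -> float:
    """Chance that at least one of k independent samples is correct,
    given a per-sample success probability p."""
    return 1.0 - (1.0 - p) ** k


# Hypothetical per-sample success rates, chosen only for illustration.
samplers = {"random": 1e-6, "base": 0.05, "rl": 0.5}

for k in (1, 10, 1_000, 10_000_000):
    row = {name: round(pass_at_k(p, k), 4) for name, p in samplers.items()}
    print(k, row)
```

Under these assumptions, any sampler with nonzero `p` eventually gets there as `k` grows, which is the "creative" part. How high the curve already is at small `k` is closer to what I mean by "correct".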