I’m surprised by the argument. It’s not wrong that you need more data, but that presumes the task is to pre-train on data. Additional compute is also useful for unearthing tacit capabilities in the models. That requires inference-time scaling and post-training, usually on specific downstream tasks using RL. Sure, that generates data, but it’s not the same as Internet data, and it can be scaled.