AlphaGo (the Zero version, at least) seems more like an automated process to me, because you can start from nothing except the algorithm and the rules. Since a Go game almost always ends in a win or a loss, and the model can play against itself, it is all but guaranteed to get some learning signal from self-play.

In the LLM case you need an already capable model before RL can work. I also feel the problem selection step matters, to make sure the problems aren't too hard for the model. So there's still a lot of manual labor involved.
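
To make the problem-selection point concrete, here is a rough sketch of one way such a filter could look. It assumes you already have an estimated pass rate per problem (say, the fraction of sampled answers a checker accepted). The function name, the thresholds, and the sample data are all made up for illustration, not taken from any real training pipeline.

```python
def select_problems(pass_rates, low=0.1, high=0.9):
    """Keep problems whose estimated pass rate lies in [low, high].

    Problems the model always fails (rate near 0) or always solves
    (rate near 1) give little reward signal, so they are dropped.
    The thresholds are arbitrary assumptions.
    """
    return [pid for pid, rate in pass_rates.items() if low <= rate <= high]


# Hypothetical pass rates for four problems.
rates = {"easy": 1.0, "medium": 0.5, "hard": 0.0, "borderline": 0.1}
selected = select_problems(rates)
print(selected)
```

Someone still has to write the problems and the checker, which is where the labor comes in.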


fenomas|1 year ago

Yes, IIUC those points are correct - you need relatively capable models, and well-crafted questions. The comparison with AlphaGo is that the processes are analogous, not identical - the key point being that in both cases the model is choosing its own path towards a goal, not just imitating the path that a human labeler took.