quantumspandex | 1 year ago
Also, how can the training of LLMs be parallelized when parameter updates are sequential? Sure, we can train on several samples simultaneously, but then all the parameter updates are computed with respect to the same initial parameters.
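The usual answer is data parallelism: each worker computes a gradient against the same parameter snapshot, the per-worker gradients are averaged, and only then is one synchronized update applied. A minimal NumPy sketch (the toy linear model, loss, and learning rate are illustrative assumptions, not anything from this thread):

```python
import numpy as np

def grad(theta, x, y):
    # Gradient of a squared-error loss for a toy linear model y ~ theta * x.
    return 2 * (theta * x - y) * x

def data_parallel_step(theta, batches, lr=0.01):
    # Every "worker" sees the SAME theta snapshot, so the per-batch
    # gradients can be computed simultaneously; the sequential dependency
    # only reappears at the single averaged update below.
    grads = [np.mean([grad(theta, x, y) for x, y in batch]) for batch in batches]
    return theta - lr * np.mean(grads)

theta = 0.0
# Two "workers", each holding two (x, y) samples drawn from y = 2x.
batches = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
for _ in range(200):
    theta = data_parallel_step(theta, batches)
print(round(float(theta), 2))  # → 2.0
```

So the steps are still sequential with respect to each other, but within one step the gradient computations for different batches are independent and can run on separate devices.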
fenomas | 1 year ago
(Hence the analogy to training AlphaGo, wherein you take a model that sometimes wins games, and then play a bunch of games while reinforcing the cases where it won, so that it evolves its own ways of winning more often.)
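That "reinforce the cases where it won" loop is essentially the REINFORCE policy gradient. A toy sketch on a two-armed bandit (the bandit environment, function names, and hyperparameters are all assumptions for illustration, not AlphaGo's actual training setup):

```python
import math
import random

def pull(arm):
    # Toy "game": arm 0 wins (reward 1), arm 1 loses (reward 0).
    return 1.0 if arm == 0 else 0.0

def train(logit=0.0, lr=0.5, episodes=500, seed=0):
    random.seed(seed)
    for _ in range(episodes):
        p = 1 / (1 + math.exp(-logit))      # policy: probability of playing arm 0
        arm = 0 if random.random() < p else 1
        reward = pull(arm)
        # REINFORCE: scale the log-prob gradient of the action actually taken
        # by the reward it earned, so winning moves become more likely.
        grad_logp = (1 - p) if arm == 0 else -p
        logit += lr * reward * grad_logp
    return 1 / (1 + math.exp(-logit))

p_final = train()  # the policy shifts strongly toward the winning arm
```

No one labels the right move; the model just plays, and moves that led to wins get upweighted.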
quantumspandex | 1 year ago
In the LLM case you need an already-capable model before you can do RL. Also, I feel like the problem-selection part is important, to make sure the problems aren't too hard. So there's still a lot of labor involved.
mtkd | 1 year ago
https://medium.com/@sahin.samia/the-math-behind-deepseek-a-d...