eachro|11 months ago
During the OpenAI Gym era of RL, one of the great selling points was that RL was very approachable for a newcomer: the Gym environments were small and tractable enough that a hobbyist could learn a little RL, try it out on CartPole, and see how it performed. Are there similarly tractable RL tasks or learning environments for LLMs? From the outside, my impression is that you need insane GPU access to even start messing around with these models. Is there something one can do on a normal MacBook Air, for instance, in this LLM x RL domain?
al_th|11 months ago
I'm absolutely not versed in RL, but I wanted to understand GRPO, the RL algorithm behind DeepSeek's latest model.
I started from a very simple LLM, inspired by Andrej Karpathy's "GPT from scratch" video (https://www.youtube.com/watch?v=kCc8FmEb1nY). Then I added the GRPO algorithm on top of it, which is itself very simple.
I made a GitHub repo if you want to try it out: https://github.com/Al-th/grpo_experiment
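To give a sense of why GRPO is "very simple": instead of training a separate value network as PPO does, it samples a group of completions for the same prompt and uses each completion's reward relative to the group (mean and standard deviation) as its advantage. Below is a minimal, dependency-free sketch of that idea. It is not the code from the linked repo. It makes several assumptions: it works at the sequence level (summed log-probabilities per completion) rather than per token, and `clip_eps=0.2` and `beta=0.04` are illustrative values. The helper names `group_relative_advantages` and `grpo_loss` are hypothetical.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group: A_i = (r_i - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def grpo_loss(logp_new, logp_old, logp_ref, rewards, clip_eps=0.2, beta=0.04):
    """Sequence-level GRPO-style surrogate loss for ONE group of completions (minimize it).

    Simplifying assumption: each logp_* entry is the summed log-probability of one
    completion under the current policy, the sampling (old) policy, and a frozen
    reference model. The original formulation applies this per token.
    """
    advantages = group_relative_advantages(rewards)
    per_sample = []
    for lp_new, lp_old, lp_ref, adv in zip(logp_new, logp_old, logp_ref, advantages):
        # PPO-style probability ratio between current and sampling policy
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
        surrogate = min(ratio * adv, clipped * adv)
        # Non-negative KL estimator penalizing drift from the reference model
        log_ratio_ref = lp_ref - lp_new
        kl = math.exp(log_ratio_ref) - log_ratio_ref - 1
        per_sample.append(surrogate - beta * kl)
    # The objective is maximized, so return its negation as a loss
    return -mean(per_sample)


if __name__ == "__main__":
    # Four sampled answers to one prompt; a verifier marked two of them correct
    rewards = [1.0, 0.0, 0.0, 1.0]
    print(group_relative_advantages(rewards))
    logp = [-3.0, -2.5, -4.0, -1.0]
    print(grpo_loss(logp, logp, logp, rewards))
```

Because the baseline comes from the group itself, the only "model" you need besides the policy is a reward function (for example, checking whether an answer is correct), which keeps the setup small enough to experiment with on a tiny GPT-style model.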
363849473754|11 months ago