top | item 43329634

(no title)

eachro | 11 months ago

During the openai gym era of RL, one of the great selling pts was that RL was very approachable for a new comer as the gym environments were small and tractable that a hobbyist could learn a little bit of RL, try it out on cartpole and see how it'd perform. Are there similarly tractable RL tasks/learning environments with LLMs? From the outside, my impression is that you need some insane GPU access to even start to mess around with these models. Is there something one can do on a normal MacBook air for instance in this LLM x RL domain?

discuss

al_th|11 months ago

This is entirely doable.

I'm absolutely not versed in RL, but I wanted to understand GRPO, the RL algorithm behind Deepseek's latest model.

I started from a very simple LLM, inspired from Andrej Karpathy's "GPT from scratch" video (https://www.youtube.com/watch?v=kCc8FmEb1nY). Then, I added onto that the GRPO algorithm, which in itself is very simple.

I made a GitHub repo if you want to try it out : https://github.com/Al-th/grpo_experiment

363849473754|11 months ago

GRPO project is neat. Would you be willing to do a Karpathy-style explainer, breaking down the algorithm from scratch? It’s hard to understand on its own without prior background knowledge.