
I used RL fine-tuning to make an LLM generate ugly and unpythonic FizzBuzz code

4 points | seanrrr | 2 months ago | seantey.github.io

1 comment


seanrrr | 2 months ago

I wrote a blog post about a hackathon project in which I used RL fine-tuning to make an LLM generate intentionally ugly and unpythonic FizzBuzz code. The post covers what I learned about reward shaping and GRPO. Feedback on the writing or content is welcome!
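For readers new to GRPO, here is a minimal sketch of how the two pieces mentioned above might fit together. This is not the author's code. The "ugliness" reward and its regex patterns are hypothetical stand-ins for whatever reward shaping the blog post describes. The advantage function follows the standard GRPO idea: sample a group of completions for the same prompt, score each one, and normalize every reward against the group's mean and standard deviation. This removes the need for a learned value function.

```python
import re
from statistics import mean, pstdev

# HYPOTHETICAL reward shaping: +1 per "unpythonic" pattern found.
# These patterns are illustrative assumptions, not the author's actual reward.
UNPYTHONIC_PATTERNS = [
    r"range\(len\(",   # index-based loop instead of iterating directly
    r"==\s*True",      # explicit comparison against True
    r";[ \t]*$",       # trailing semicolon at the end of a line
]


def ugliness_reward(code: str) -> float:
    """Count matches of each unpythonic pattern in a generated completion."""
    score = 0.0
    for pattern in UNPYTHONIC_PATTERNS:
        score += len(re.findall(pattern, code, flags=re.MULTILINE))
    return score


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: each reward minus the group mean, divided by the group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One "group" of sampled completions for the same prompt.
completions = [
    "for i in range(len(xs)):\n    print(xs[i]);",
    "for x in xs:\n    print(x)",
    "if flag == True:\n    pass;",
]
rewards = [ugliness_reward(c) for c in completions]
advantages = group_relative_advantages(rewards)
print(rewards)     # [2.0, 0.0, 2.0]
print(advantages)  # ugly completions get positive advantage, the clean one negative
```

Because advantages are relative within a group, the clean completion is pushed down even though its raw reward is simply zero. A real setup would likely also gate the reward on the FizzBuzz output being correct, so the model cannot score well by emitting broken code.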