iFire | 7 days ago
I thought the latest advance in computing (spring 2025, i.e. last year) was self-play / reinforcement learning. As in, we ran out of training data a few years ago.
https://github.com/OpenPipe/ART
Reinforcement learning here means having the large language model devise puzzles that it then solves, with the attempts scored via LLM-as-judge.
The definition of LLM-as-judge is that your LLM generates 8-12 trajectories and a different LLM judges the results. For the problem of ISA-assembly creation, I'd use an oracle instead, such as actual execution on a Windows or Linux operating system.
The winning entries are used to train the large language model.
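A rough sketch of that loop, in case it helps. This is not ART's API; every name here (`toy_policy`, `execution_oracle`, `collect_winners`, the canned `CANDIDATES`) is made up for illustration. The "policy" is a stand-in that cycles through fixed programs instead of calling an LLM, and the oracle runs each candidate in a subprocess of the local Python interpreter, standing in for real OS-level execution of generated assembly.

```python
import subprocess
import sys
from itertools import cycle
from typing import Callable, List

# Hypothetical canned candidates; a real setup would sample these from an LLM.
CANDIDATES = [
    "print(6 * 7)",   # correct
    "print(6 + 7)",   # runs, wrong answer
    "print(40 + 2)",  # correct
    "print('42'",     # syntax error, fails to run
]


def toy_policy() -> Callable[[str], str]:
    """Stand-in for an LLM: ignores the prompt and cycles through CANDIDATES."""
    pool = cycle(CANDIDATES)
    return lambda prompt: next(pool)


def execution_oracle(program: str, expected: str) -> float:
    """Reward 1.0 if the program runs cleanly and prints the expected output."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            text=True,
            timeout=5,
        )
    except subprocess.TimeoutExpired:
        return 0.0
    ok = result.returncode == 0 and result.stdout.strip() == expected
    return 1.0 if ok else 0.0


def collect_winners(
    policy: Callable[[str], str], prompt: str, expected: str, n: int = 8
) -> List[str]:
    """Sample n trajectories and keep the ones the oracle accepts."""
    trajectories = [policy(prompt) for _ in range(n)]
    return [t for t in trajectories if execution_oracle(t, expected) == 1.0]


# The winners would become the training data for the next round.
winners = collect_winners(toy_policy(), "Print the answer 42.", "42", n=8)
```

The point of the oracle over a judge model is that the reward comes from what actually happened when the code ran, so there is no second model to fool.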