(no title)
aimanbenbaha | 1 year ago
Importantly the barrier is that open domains are too complex and too undefined to have a clear reward function. But if someone cracks that — meaning they create a way for AI to self-optimize in these messy, subjective spaces — it'll completely revolutionize LLMs through pure RL.
Here's the link of the tweet: https://x.com/karpathy/status/1821277264996352246
No comments yet.