(no title)
charleshn | 6 months ago
Of course we can; this is a non-issue.
See e.g. AlphaZero [0], which is 8 years old at this point, and any modern RL training on synthetic data, e.g. DeepSeek-R1-Zero [1].
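A rough, toy sketch of the self-play idea (illustrative names only, nothing from the cited papers): two copies of the same policy, here just a random one, play a trivial game against each other, and every visited position is labelled with the eventual outcome, so the training data is generated entirely by the system itself.

    # Toy self-play sketch (not AlphaZero itself): synthetic (state, outcome)
    # pairs are produced with no human data in the loop.
    import random

    def play_nim_selfplay(stones=7):
        """Two copies of one random policy play Nim (take 1-3 stones, last
        stone wins); every position gets labelled with the eventual winner."""
        history, player = [], 0
        while stones > 0:
            take = random.randint(1, min(3, stones))  # stand-in for a learned policy
            history.append((stones, player, take))
            stones -= take
            player ^= 1
        winner = player ^ 1  # whoever took the last stone wins
        # Label each visited position +1/-1 from the mover's perspective.
        return [((s, p), 1 if p == winner else -1) for s, p, _ in history]

    dataset = [pair for _ in range(1000) for pair in play_nim_selfplay()]
    print(len(dataset), dataset[:3])

Swap the random move for a learned policy plus search, retrain on the labelled positions, and repeat: that is the AlphaZero loop in miniature.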
jeremyjh|6 months ago
Yes, distillation is a thing but that is more about compression and filtering. Distillation does not produce new data in the same way that chess games produce new positions.
charleshn|6 months ago
But generally the idea is that you need some notion of reward, verifiers, etc.
It works really well for maths, algorithms, and many other things, actually.
See also this very short essay/introduction: https://www.jasonwei.net/blog/asymmetry-of-verification-and-...
That's why we have IMO-gold-level models now, and I'm pretty confident we'll have superhuman maths and algorithms models before long.
Now, domains that are very hard to verify (think e.g. theoretical physics) are another story. A toy sketch of a verifier-style reward is below.
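As a concrete toy illustration of that asymmetry (not DeepSeek's or anyone's actual reward function): producing a factorisation is hard, but verifying one is a single multiplication, so the check itself can serve as the RL reward.

    # Toy verifiable reward: cheap to check, even if the answer is hard to produce.
    # `model_answer` is a hypothetical stand-in for an LLM's parsed output.
    def reward_factorisation(n: int, model_answer: list[int]) -> float:
        """Return 1.0 iff every proposed factor is > 1 and they multiply back to n."""
        product = 1
        for f in model_answer:
            if f <= 1:
                return 0.0
            product *= f
        return 1.0 if product == n else 0.0

    print(reward_factorisation(91, [7, 13]))  # 1.0 -> positive RL signal
    print(reward_factorisation(91, [3, 31]))  # 0.0 -> no reward

The same pattern, a cheap programmatic check over an expensive-to-produce answer, covers unit tests for code, proof checkers for maths, and so on.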
scotty79|6 months ago
You make models talk to each other, create puzzles for each other to solve, and ask each other to make cases and then evaluate how well those cases were made (rough sketch of the loop below).
Will some of it look like the ramblings of pre-scientific philosophers? (Or modern ones, since philosophy never progressed after science left it in the dust.)
Sure! But human culture was once there too, and we pulled ourselves out of that nonsense by our own bootstraps. We didn't need to be exposed to three alien internets full of higher truth.
It's really a miracle that AIs got as much as they did from the purely human-generated, mostly garbage text we cared to write down.
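A minimal sketch of that proposer/solver loop, with plain Python functions standing in for the two models (the real thing would be two LLMs plus a trained judge, none of which is shown here; everything below is illustrative):

    # Hypothetical proposer/solver loop: one "model" invents a checkable puzzle,
    # the other answers it, and the reward comes from the check, not from humans.
    import random

    def proposer():
        """Stand-in for a model that emits a puzzle plus a ground-truth checker."""
        a, b = random.randint(10, 99), random.randint(10, 99)
        return f"What is {a} * {b}?", lambda ans: ans == a * b

    def solver(question: str) -> int:
        """Stand-in for a second model; here it just parses the numbers and multiplies."""
        nums = [int(tok.strip('?')) for tok in question.split() if tok.strip('?').isdigit()]
        a, b = nums
        return a * b

    for _ in range(3):
        q, check = proposer()
        answer = solver(q)
        print(q, answer, "reward:", 1.0 if check(answer) else 0.0)

The point is only that the reward can come from the puzzle's own checkability rather than from human-labelled data.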
cs702|6 months ago
That's option (2) in the parent comment: synthetic data.