ogrisel | 1 year ago
Algorithmic puzzles, on the other hand, both require reasoning and are easy to verify.
There are other things in coding that are both useful and easy to verify: checking that generated code follows formatting standards, that generated outputs conform to a specific data schema, and so on.
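For the schema case, the verifier can be a few lines of code that turn the check into a pass/fail reward. Here is a minimal Python sketch. It assumes the model is asked to emit a JSON object. `schema_reward` and the `EXPECTED` key-to-type mapping are hypothetical names for illustration, and a real setup would more likely use a full JSON Schema validator.

```python
import json

# Hypothetical "schema": required key -> exact Python type expected after json.loads.
EXPECTED = {"name": str, "age": int, "tags": list}


def schema_reward(output: str, schema: dict) -> float:
    """Return 1.0 if output is a JSON object matching the schema, else 0.0."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return 0.0  # not valid JSON at all
    if not isinstance(data, dict):
        return 0.0  # valid JSON, but not an object
    for key, expected_type in schema.items():
        # Strict type check so that JSON true/false (Python bool) does not pass as int.
        if key not in data or type(data[key]) is not expected_type:
            return 0.0
    return 1.0


print(schema_reward('{"name": "Ada", "age": 36, "tags": []}', EXPECTED))
print(schema_reward('{"name": "Ada", "age": "36", "tags": []}', EXPECTED))
```

The check is cheap and unambiguous, which is what makes it usable as a training signal: the first call returns 1.0, while the second returns 0.0 because `age` is a string.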
godelski | 1 year ago
FieryTransition | 1 year ago
Another issue is how much data you can synthesize in such a way that you construct both the problem and the solution, so that you know the answer before using it as a sample.
I.e., some problems are easy to create when you construct them yourself, but would be hard to solve with no prior knowledge. Could those be used as a scoring signal?
I.e., you are the oracle: the model being trained doesn't know the answer, only whether it is right or wrong. But I don't know whether the reward function must be binary or can be on a scale.
Does that make sense, or is it wrong?
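One way to picture the "you are the oracle" idea is a generator that picks the answer first and derives the problem from it. Multiplying two primes is easy, but recovering them from the product is much harder. Below is a minimal Python sketch of that setup, with one binary and one graded reward. The factoring task, the function names, and the partial-credit rule in `graded_reward` are illustrative assumptions, not anything proposed in the thread. At these sizes a computer factors trivially; the point is only the asymmetry between constructing and solving.

```python
import random


def is_prime(n: int) -> bool:
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def make_problem(rng: random.Random, lo: int = 1000, hi: int = 5000):
    """Construct the answer first (two distinct primes), then the problem (their product)."""
    primes = [p for p in range(lo, hi) if is_prime(p)]
    p, q = sorted(rng.sample(primes, 2))
    return p * q, (p, q)


def binary_reward(answer, guess) -> float:
    """1.0 only for the exact factorization (order-insensitive), else 0.0."""
    return 1.0 if tuple(sorted(guess)) == answer else 0.0


def graded_reward(answer, guess) -> float:
    """Hypothetical partial credit: true factors found, divided by the larger of the
    answer size and guess size, so listing many candidate numbers is penalized."""
    hits = sum(1 for f in answer if f in guess)
    return hits / max(len(answer), len(guess))


rng = random.Random(0)
n, answer = make_problem(rng)
p, q = answer
print(n, binary_reward(answer, [q, p]), graded_reward(answer, [p, 1]))
```

A graded reward gives a denser signal (here a guess with one correct factor earns 0.5 instead of 0.0), but it also opens room for reward hacking, which is why the sketch divides by the guess length instead of rewarding any list that happens to contain the factors.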
voxic11 | 1 year ago