top | item 39978061

(no title)

willj | 1 year ago

I think that’s different. AlphaGo is using reinforcement learning in a context in which there is a clear evaluation function— did a strategy lead to a win or loss.

discuss

Tenoke|1 year ago

I said it's not as simple but there are ways - e.g. you can generate more of your best quality of data, you can try to model the direction of quality or an objective, you can have minor human input at some points to judge which direction performs better, you can objectively verify some of your input to use as a partial objective - code, math, logic etc.