didroe | 5 months ago
How does reinforcement learning force the weights to be logically consistent? Isn't it just training against a coarser, fuzzier fitness signal?
More generally, is the model really solving the task if it's given a large number of attempts and an oracle that says whether each one is correct? Humans can answer the questions in one shot and self-check their answers, whereas this is like trial and error with an external expert who tells you to try again.