rar00 | 1 month ago
> “It’s counterintuitive,” Miller said. “You’d think neurons that signal the wrong pathway would go away with learning.”
Except they don't tweak weights when the model is incorrect, so I'm puzzled why he's making that claim. In equation 32 they show weights are adjusted as `dW = K * S * (Wmax - W) * F(A)`, where F(A) is the feedback given the chosen action A (i.e. reward). K is positive and S is non-negative, so weights can only be nudged toward a maximal value Wmax, in the direction of F(A).
However, they set `F(A) = 1 if A is correct, 0 if incorrect`. That means weights don't change at all when the model is wrong, and on correct trials they can only be potentiated (`sign(dW) == sign(F(A)) == +`).
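A minimal sketch of my reading of the rule, with illustrative values for K, S, and Wmax (not the paper's):

```python
K, Wmax = 0.1, 1.0  # illustrative constants, not from the paper

def update(W, S, correct):
    # dW = K * S * (Wmax - W) * F(A), with F(A) = 1 if correct else 0
    F = 1.0 if correct else 0.0
    return W + K * S * (Wmax - W) * F

W = 0.5
W = update(W, S=1.0, correct=False)  # incorrect trial: F(A) = 0, no change
print(W)  # 0.5
W = update(W, S=1.0, correct=True)   # correct trial: potentiation toward Wmax
print(W)  # 0.55
```

Since (Wmax - W) and all the other factors are non-negative, there is no term that could ever drive dW negative, i.e. no depression step when the model is wrong.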