
zingelshuher | 1 year ago

"the bigger you make epsilon" ... "thus slower the training progress will be"

Sounds like a variable epsilon would be optimal, either instead of the learning rate or scheduled together with it. It would be nice if this could somehow be regulated algorithmically in a generic way.
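To make the quoted trade-off concrete, here is a minimal sketch (not any particular library's API) of a single Adam-style update, showing where epsilon sits in the denominator. The function name and parameter defaults are illustrative assumptions; the point is just that a larger epsilon shrinks the effective step size, which is why training slows down.

```python
import numpy as np

def adam_step(g, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, t=1):
    """One Adam-style update step (illustrative sketch, not a library API)."""
    m = b1 * m + (1 - b1) * g                   # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g * g               # second-moment estimate
    m_hat = m / (1 - b1 ** t)                   # bias correction
    v_hat = v / (1 - b2 ** t)
    step = lr * m_hat / (np.sqrt(v_hat) + eps)  # eps lives in the denominator
    return step, m, v

g = np.array([0.01])  # a small gradient
for eps in (1e-8, 1e-3, 1e-1):
    step, _, _ = adam_step(g, np.zeros(1), np.zeros(1), eps=eps)
    print(eps, step)  # larger eps -> smaller effective step
```

Once epsilon grows past the typical magnitude of `sqrt(v_hat)`, it dominates the denominator and the update degenerates toward plain SGD with a much smaller effective learning rate, which is one reason people consider co-tuning epsilon and the learning rate.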

sdenton4 | 1 year ago

The training slowdown is not really a problem... There's a pretty wide range of robust, good-enough values that don't slow things down much at all. As with all optimizer cruft, the 'optimal' value is going to be problem-dependent and a pain in the butt to actually find. So it's best to find a good-enough value that works in most contexts and not worry about it.