Yep, and I'm not saying it's a bad approach! Just trying to answer "why is that any worse than, say, starting with randomly initialized weights in general?" with respect to gradient passing.
I'm not sure I'd agree with the "noisy" characterization, which to me implies stochasticity, whereas this is just blocking off the flow of gradient information to save memory.
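For concreteness, the kind of deterministic gradient blocking being described can be sketched like this (PyTorch and `.detach()` are my assumption here; the thread doesn't name a framework):

```python
import torch

# Two branches of the same computation; one has its gradient flow blocked.
x = torch.tensor([1.0, 2.0], requires_grad=True)

blocked = (x ** 2).detach()  # treated as a constant: no gradient flows back through it
normal = x ** 2              # gradients flow as usual

loss = (blocked + normal).sum()
loss.backward()

# Only the `normal` branch contributes to the gradient: d/dx x^2 = 2x
print(x.grad)  # tensor([2., 4.])
```

Nothing stochastic happens here: the `blocked` branch's contribution to the gradient is dropped deterministically, and the backward graph for that branch never has to be stored, which is where the memory saving comes from.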
MiroF|6 years ago
> I'm not sure I'd agree with the "noisy" characterization, which to me implies stochasticity, whereas this is just blocking off the flow of gradient information to save memory.