cttet | 10 months ago

In all their experiments, backprop is still used for most of their parameters, though...

hansvm | 10 months ago

There is a meaningful distinction. They only use backprop one layer at a time, requiring additional space proportional to that layer. Full backprop requires additional space proportional to the whole network.
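
As a back-of-the-envelope sketch of that difference (not anything taken from the paper): the code below counts only the activations that have to stay alive for the backward pass, and ignores weights, gradients, and optimizer state. The function names, layer widths, and batch size are all made up for illustration.

```python
def peak_extra_floats_full(widths, batch):
    """Full backprop: every layer's output is kept until the backward pass,
    so the extra storage grows with the sum over all layers."""
    return batch * sum(widths)


def peak_extra_floats_layerwise(widths, batch):
    """Layer-at-a-time backprop: only the layer currently being trained
    needs its output kept, so the peak is set by the widest layer."""
    return batch * max(widths)


# Hypothetical network: four hidden layers of equal width.
widths = [512, 512, 512, 512]
batch = 32

full = peak_extra_floats_full(widths, batch)
local = peak_extra_floats_layerwise(widths, batch)
print(f"full backprop:      {full} floats")
print(f"one layer at a time: {local} floats")
print(f"ratio: {full / local:.1f}x")
```

With equal-width layers the ratio is just the depth, which is why the gap matters more for deep networks.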

It's also a bit interesting as an experimental result, since the core idea doesn't require backprop. Because backprop is just an implementation detail here, you could in theory swap in other layer types or solvers.