quadrature | 4 months ago

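I'm not very well versed, but I believe training requires more memory because it has to keep the intermediate computations (the activations from the forward pass) around so that gradients can be calculated for each layer during backpropagation. Inference can discard each layer's activations as soon as the next layer has consumed them.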
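
Here is a rough sketch of what I mean, assuming a toy network where every layer is just `tanh(w * x)` on a single scalar. The function names and the weights are made up for illustration and don't come from any real framework; the point is only that the training forward pass has to save one entry per layer, while the inference pass keeps a single value.

```python
import math

def forward_inference(weights, x):
    # Inference: each layer's output overwrites the previous one,
    # so only one activation is alive at any time.
    for w in weights:
        x = math.tanh(w * x)
    return x

def forward_training(weights, x):
    # Training: save every layer's (input, output) pair, because the
    # backward pass needs them to compute that layer's gradient.
    saved = []
    for w in weights:
        y = math.tanh(w * x)
        saved.append((x, y))
        x = y
    return x, saved

def backward(weights, saved, grad_out=1.0):
    # Walk the layers in reverse, reading the saved activations.
    grads = [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        x, y = saved[i]
        dpre = grad_out * (1.0 - y * y)  # derivative of tanh at this layer
        grads[i] = dpre * x              # gradient w.r.t. this layer's weight
        grad_out = dpre * weights[i]     # gradient passed to the layer below
    return grads

# Hypothetical weights for a 3-layer toy network.
weights = [0.5, -1.2, 0.8]
out, saved = forward_training(weights, 1.0)
grads = backward(weights, saved)
```

The `saved` list grows with the number of layers (and, in a real network, with batch size and layer width), which is the extra memory cost of training. Real frameworks also hold gradients and optimizer state, which this sketch leaves out.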
