patresh | 3 years ago

If you need larger batch sizes but don't have the VRAM for it, have a look at gradient accumulation (https://kozodoi.me/python/deep%20learning/pytorch/tutorial/2...).

You accumulate the gradients of several smaller batches before doing a single weight update step. The effective batch size is then the per-step batch size times the number of accumulation steps, so you can train with much larger batches than would fit in GPU memory at once.
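To make the idea concrete, here is a minimal sketch in plain Python (no framework) on a toy 1-D linear model with a mean-squared-error loss. The function names, data, and learning rate are all made up for illustration, and it assumes equal-sized micro-batches. It only shows that summing scaled micro-batch gradients gives the same update as one large batch.

```python
# Toy model: y = w * x, loss = mean((w*x - y)^2). All names are illustrative.

def grad_mse(w, xs, ys):
    """Gradient of the mean squared error with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate gradients over equal-sized micro-batches.

    Each micro-batch gradient is divided by the number of micro-batches so
    the running sum matches the gradient of the mean loss over all samples.
    """
    assert len(xs) % micro_batch_size == 0, "sketch assumes equal-sized micro-batches"
    num_micro = len(xs) // micro_batch_size
    total = 0.0
    for i in range(num_micro):
        lo, hi = i * micro_batch_size, (i + 1) * micro_batch_size
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / num_micro
    return total

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
w, lr = 0.5, 0.01

full = grad_mse(w, xs, ys)               # one batch of 8 samples
accum = accumulated_grad(w, xs, ys, 2)   # four micro-batches of 2 samples

# In both cases there is exactly one weight update per 8 samples.
w_full = w - lr * full
w_accum = w - lr * accum
```

In PyTorch the same pattern usually amounts to dividing each micro-batch loss by the number of accumulation steps, calling `loss.backward()` on every micro-batch, and calling `optimizer.step()` followed by `optimizer.zero_grad()` only every N micro-batches. One caveat: layers that compute per-batch statistics, such as batch norm, still see the small micro-batch, so the result is not always identical to a true large batch.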

stephanst|3 years ago

Yep, this is a very valid point and I need to look more into this... which means rebuilding a lot of my toolchain but I think it would ultimately be worth the time investment!