jsharf | 5 months ago
When you take a batch and calculate gradients, you’re effectively calculating the direction the weights should move in, and then taking a step in that direction. You can take more steps at once by doing what you describe, but they might not all point in exactly the right direction, so the overall efficiency is hard to compare.
I am not an expert, but if I understand correctly I think this is the answer.
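To make that concrete, here is a rough sketch in plain Python. It assumes a toy one-weight model y = w * x with a squared-error loss; the names `gradient` and `sgd`, the data, and the learning rate are all made up for illustration and are not from any particular library.

```python
import random

def gradient(w, batch):
    """Gradient of the mean squared error of y = w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def sgd(w, data, batch_size, lr=0.01):
    """One pass over the data, taking one weight update per batch."""
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        w -= lr * gradient(w, batch)  # step against the batch's gradient
    return w

random.seed(0)
# Hypothetical toy data: y = 3x plus some noise.
xs = [random.uniform(-1, 1) for _ in range(64)]
data = [(x, 3.0 * x + random.gauss(0, 0.5)) for x in xs]

w_one_step = sgd(0.0, data, batch_size=64)   # one step from the full batch
w_many_steps = sgd(0.0, data, batch_size=8)  # eight steps from small batches
```

Both runs see the same 64 examples. The first takes a single step along the gradient averaged over all of them; the second takes eight steps, each along a noisier gradient estimated from only 8 examples, so where it ends up depends on how well those individual directions agree.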
immibis | 5 months ago