item 37641492

mfkasim1 | 2 years ago

Yes, there is no convergence guarantee, but what we found is that for typical untrained RNN units (e.g., GRU, LSTM, or a simple MLP for a NeuralODE), the iteration converges within 3-5 steps, which gives a huge speed-up over the sequential method. Non-convergence typically only appears after many thousands of training steps, and it can be addressed by saving the RNN output from the previous training step and using it as the initial guess for the next one.
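To make the idea concrete, here is a toy sketch of evaluating an RNN as a fixed-point problem over the whole sequence. Note the paper itself uses a Newton-type iteration; this plain Jacobi-style sweep (update every time step at once from the previous iterate) is a simplified stand-in, and the cell, sizes, and weight scaling below are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 50, 8  # hypothetical sequence length and hidden size

# Toy RNN cell h_t = tanh(W h_{t-1} + U x_t); the small scale on W keeps
# the map contractive so the fixed-point iteration converges quickly.
W = 0.2 * rng.standard_normal((d, d)) / np.sqrt(d)
U = rng.standard_normal((d, d)) / np.sqrt(d)
x = rng.standard_normal((T, d))
h0 = np.zeros(d)

def sequential(x, h0):
    # Ordinary step-by-step evaluation (the baseline being replaced).
    h, out = h0, []
    for t in range(T):
        h = np.tanh(W @ h + U @ x[t])
        out.append(h)
    return np.stack(out)

def parallel_sweeps(x, h0, guess, n_iter):
    # Each sweep updates all T time steps simultaneously from the
    # previous iterate, so a sweep is one parallel pass over t.
    h = guess
    for _ in range(n_iter):
        prev = np.concatenate([h0[None], h[:-1]], axis=0)
        h = np.tanh(prev @ W.T + x @ U.T)
    return h

ref = sequential(x, h0)

def err(n_iter):
    h = parallel_sweeps(x, h0, np.zeros((T, d)), n_iter)
    return np.abs(h - ref).max()
```

The warm-start trick from the comment corresponds to passing the previous training step's hidden states as `guess` instead of zeros, which starts the iteration much closer to the fixed point.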

Adding more context from the paper: although there is no convergence guarantee for the forward computation, the gradient computation requires only one iteration and always converges (see Section 3.1.1). So even if the forward computation falls back to the sequential method, our method can still accelerate the backward computation.
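The reason one iteration suffices is that the backward (adjoint) recursion is linear in the unknown, and a linear recurrence can be solved exactly in a single parallel pass via an associative scan over affine maps. This scalar sketch is not the paper's derivation, just an illustration of why linearity removes the convergence issue (the recursion and its coefficients here are invented for the example):

```python
import numpy as np

# Toy scalar linear recursion y_t = a_t * y_{t-1} + b_t with y_0 given.
# The adjoint pass of an RNN has this affine structure in the adjoint.
rng = np.random.default_rng(1)
T = 16
a = rng.uniform(0.5, 1.5, T)
b = rng.standard_normal(T)
y0 = 0.7

def sequential_scan(a, b, y0):
    # Step-by-step reference solution.
    y, out = y0, []
    for t in range(T):
        y = a[t] * y + b[t]
        out.append(y)
    return np.array(out)

def combine(l, r):
    # Composition of affine maps x -> a*x + b: applying l then r gives
    # x -> (a_l * a_r) * x + (a_r * b_l + b_r). This is associative,
    # which is what makes a log-depth parallel scan possible.
    (a1, b1), (a2, b2) = l, r
    return (a1 * a2, a2 * b1 + b2)

def affine_scan(a, b, y0):
    # Inclusive scan over the affine maps; a real implementation would
    # combine them in a log-depth tree, this loop just shows the algebra.
    acc, out = (1.0, 0.0), []
    for t in range(T):
        acc = combine(acc, (a[t], b[t]))
        out.append(acc[0] * y0 + acc[1])
    return np.array(out)
```

Because `combine` is associative, the scan yields the exact solution in one pass, with no fixed-point iteration needed at all.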
