The short answer is that there are dozens of ways to use that GEMM to do convolution (convolution algorithms), and since each layer can use any one of them, there are NUM_ALGORITHMS ^ NUM_LAYERS ways to implement the network.
Our toolkit figures out which of those arrangements are the fast ones!
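
To get a feel for the size of that search space, here is a minimal sketch in Python. Everything in it is made up for illustration: the timings in `COSTS` are hypothetical numbers, the three algorithm names simply stand in for GEMM-based convolution variants, and the brute-force search is only there to show the idea, not to describe how the toolkit actually searches.

```python
from itertools import product

# Hypothetical per-layer timings in milliseconds (illustrative numbers only).
# COSTS[algorithm][layer] is the assumed cost of running that layer
# with that convolution algorithm.
COSTS = {
    "im2col": [4.0, 2.5, 3.0],
    "im2row": [6.0, 2.0, 4.5],
    "kn2row": [3.5, 3.0, 2.0],
}
NUM_LAYERS = 3


def all_assignments(algorithms, num_layers):
    """Every way to pick one algorithm per layer."""
    return list(product(algorithms, repeat=num_layers))


def total_cost(assignment):
    """Sum the assumed per-layer cost of a full network assignment."""
    return sum(COSTS[alg][layer] for layer, alg in enumerate(assignment))


assignments = all_assignments(sorted(COSTS), NUM_LAYERS)
best = min(assignments, key=total_cost)

print(len(assignments))        # 27 assignments for 3 algorithms and 3 layers
print(best, total_cost(best))  # ('kn2row', 'im2row', 'kn2row') 7.5
```

With only 3 algorithms and 3 layers, brute force has 27 candidates to check. With dozens of algorithms and a realistically deep network, enumerating everything like this stops being practical very quickly, which is why picking the fast arrangements is a job for a tool.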