If you look at the various Fortran implementations of BLAS `gemm` (general matrix multiplication), you'll see that the transposed-matrix cases are handled specially. In fact, IIRC, `gemm` takes a flag for each input matrix indicating whether it is transposed.
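Here is a minimal pure-Python sketch of the flag idea. It is modeled loosely on the reference BLAS `xGEMM` contract, C := alpha*op(A)*op(B) + beta*C, where the `TRANSA`/`TRANSB` character arguments select op(X) = X or X^T. The function name `gemm` and its list-of-lists signature are hypothetical, not a real library API, and only the `'N'` and `'T'` flags are covered (reference BLAS also accepts `'C'` for the conjugate transpose). The point is that a transposed operand is handled by swapping the index order, not by building a transposed copy.

```python
def gemm(transa, transb, alpha, a, b, beta, c):
    """Return alpha*op(a)*op(b) + beta*c, where op(x) is x ('N') or x^T ('T').

    Hypothetical sketch mimicking the TRANSA/TRANSB flags of reference BLAS
    xGEMM. Matrices are row-major lists of row lists; c is not modified.
    """
    def accessor(x, trans):
        # Read op(x)[i][k] without materializing a transposed copy.
        if trans == 'N':
            return lambda i, k: x[i][k]
        if trans == 'T':
            return lambda i, k: x[k][i]
        raise ValueError(f"trans flag must be 'N' or 'T', got {trans!r}")

    op_a = accessor(a, transa)
    op_b = accessor(b, transb)

    # Shapes of op(a) (m x k) and op(b) (k x n).
    m = len(a) if transa == 'N' else len(a[0])
    k = len(a[0]) if transa == 'N' else len(a)
    kb = len(b) if transb == 'N' else len(b[0])
    n = len(b[0]) if transb == 'N' else len(b)
    if k != kb:
        raise ValueError("inner dimensions of op(a) and op(b) differ")

    result = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += op_a(i, p) * op_b(p, j)
            row.append(alpha * acc + beta * c[i][j])
        result.append(row)
    return result
```

For example, `gemm('N', 'T', 1.0, A, B, 0.0, C)` computes A times the transpose of B while reading B in place.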
celrod|5 years ago
Just writing three nested loops and letting the compiler optimize them was much faster for `A * B'`, so whatever implementation is getting called must be pretty naive.
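The comment doesn't show the loops, so the following is an assumed structure: a plain three-loop computation of `A * B'` (A times the transpose of B), written in Python with a hypothetical function name `matmul_abt`. With row-major storage, row j of B is column j of B', so the innermost loop reads both operands contiguously and B' is never formed explicitly. In a compiled language, that is the kind of loop a compiler can vectorize well. Pure Python will not show any speedup; this only illustrates the loop structure.

```python
def matmul_abt(a, b):
    """Compute a * b^T with three plain loops (row-major lists of lists).

    Hypothetical sketch: a is m x k, b is n x k, result is m x n.
    """
    m, k = len(a), len(a[0])
    n = len(b)
    if len(b[0]) != k:
        raise ValueError("a and b must have the same number of columns")

    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        a_row = a[i]
        for j in range(n):
            b_row = b[j]  # row j of B is column j of B^T
            acc = 0.0
            for p in range(k):
                # Both a_row and b_row are walked sequentially along p.
                acc += a_row[p] * b_row[p]
            c[i][j] = acc
    return c
```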