Yes, but in practice, if you compute K=X.wk and Q=X.wq and then K.tQ, you do three matrix multiplications.
Wouldn't it be faster to compute W=wk.twq beforehand and then just X.W.tX, which would be only two matrix multiplications?
Is there something I am missing?
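To make the two routes concrete, here is a minimal pure-Python sketch that checks they give the same scores. The sizes n, d and d_k and the helpers matmul, transpose and rand_matrix are made up for illustration; integer entries are used so the comparison is exact.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def rand_matrix(rows, cols, rng):
    """Small random integer matrix, so equality checks are exact."""
    return [[rng.randint(-3, 3) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n, d, d_k = 5, 4, 2  # hypothetical sizes: tokens, model dim, head dim
X = rand_matrix(n, d, rng)
wk = rand_matrix(d, d_k, rng)
wq = rand_matrix(d, d_k, rng)

# Route 1: K = X.wk, Q = X.wq, then K.tQ
K = matmul(X, wk)
Q = matmul(X, wq)
scores_three = matmul(K, transpose(Q))

# Route 2: W = wk.twq computed once, then X.W.tX
W = matmul(wk, transpose(wq))
scores_two = matmul(matmul(X, W), transpose(X))

same = scores_three == scores_two
print(same)
```

This only checks that the results agree, not which route is faster. Note that W has shape d x d, while wk and wq each have shape d x d_k.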
yorwba|1 month ago