item 30081553 | kanaffa12345 | 4 years ago
You don't transpose it before the matmul; you always have it transposed. (I.e., when you print the weights of a linear layer in PyTorch, you're actually seeing (A^t)^t, and what's stored is A^t.)
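A minimal sketch of the convention the comment describes, written with numpy so it is self-contained; the `linear` helper here is illustrative, not a real API. It mirrors how PyTorch's `nn.Linear` stores its weight with shape `(out_features, in_features)` and computes `x @ W.T`, so the stored tensor is already the transposed matrix and the transpose at matmul time is just a view:

```python
import numpy as np

in_features, out_features = 4, 3
rng = np.random.default_rng(0)

# Stored weight: already "transposed" -- shape (out_features, in_features),
# like the tensor you see when you print a PyTorch linear layer's .weight.
W = rng.standard_normal((out_features, in_features))
b = rng.standard_normal(out_features)

def linear(x, W, b):
    # y = x @ W.T + b: the .T here is a zero-copy view, so the stored
    # weight is never physically re-transposed before the matmul.
    return x @ W.T + b

x = rng.standard_normal((2, in_features))  # batch of 2 inputs
y = linear(x, W, b)
print(y.shape)  # (2, 3)
```

Equivalently, each output row is `W @ x[i] + b`, which is the usual `y = Ax + b` with `A = W`.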