item 40373403

terafo | 1 year ago
The overwhelming majority of flops is indeed spent on matmuls, but softmax disproportionately uses memory bandwidth, so it generally takes much longer than you'd expect from just looking at flops.

tehsauce | 1 year ago
If CPU softmax were limited by memory bandwidth, then these vectorization optimizations wouldn't improve performance.

cgearhart | 1 year ago
Why does it disproportionately use bandwidth?

jacobn | 1 year ago
In transformers the attention matrix is N*N, so there are a lot of values to go over. That typically makes it memory-bandwidth bound, not compute bound.
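[Editor's note: a minimal sketch of the point jacobn is making, using a standard numerically stable softmax in NumPy. Each step streams over the full N*N matrix while doing only a few flops per element, which is why the routine tends to be bandwidth bound rather than compute bound. The function name and shapes are illustrative, not from the thread.]

```python
import numpy as np

def softmax_rows(x):
    # Numerically stable softmax along the last axis.
    # Each line below reads (or writes) every element of x, so the
    # routine makes multiple full passes over the N*N matrix while
    # performing only a handful of flops per element -- a low
    # arithmetic intensity that makes it memory-bandwidth bound.
    m = x.max(axis=-1, keepdims=True)         # pass 1: row maxima
    e = np.exp(x - m)                          # pass 2: shift + exponentiate
    return e / e.sum(axis=-1, keepdims=True)   # pass 3: reduce + normalize

# Example: a small mock attention-score matrix.
scores = np.random.default_rng(0).normal(size=(4, 4))
probs = softmax_rows(scores)
```

Each row of `probs` sums to 1; fused single-pass (and online) softmax implementations exist precisely to cut down the number of memory passes sketched here.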