
terafo | 1 year ago

The overwhelming majority of flops is indeed spent on matmuls, but softmax disproportionately uses memory bandwidth, so it generally takes much longer than you'd expect from flops alone.
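One way to see why: compare arithmetic intensity (flops per byte of memory traffic) for the two operations. A rough sketch with illustrative sizes, not numbers from the thread:

```python
# Back-of-envelope arithmetic intensity: flops per byte moved.
# N and the per-element op counts are illustrative assumptions.
N = 4096        # square matrix dimension
bytes_per = 4   # fp32

# Matmul: 2*N^3 flops over ~3*N^2 values read/written.
matmul_flops = 2 * N**3
matmul_bytes = 3 * N**2 * bytes_per
matmul_intensity = matmul_flops / matmul_bytes

# Row-wise softmax: ~5 ops per element (max, subtract, exp, sum, divide),
# but every element is read and written, so traffic is ~2*N^2 values.
softmax_flops = 5 * N**2
softmax_bytes = 2 * N**2 * bytes_per
softmax_intensity = softmax_flops / softmax_bytes

print(f"matmul:  {matmul_intensity:.1f} flops/byte")
print(f"softmax: {softmax_intensity:.3f} flops/byte")
```

With these assumptions the matmul does hundreds of flops per byte while softmax does less than one, which is why softmax tends to sit against the memory-bandwidth roofline while matmul sits against the compute roofline.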


tehsauce | 1 year ago

If CPU softmax were limited by memory bandwidth, these vectorization optimizations wouldn't improve performance.
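The contrast the comment alludes to can be sketched as the same softmax written two ways: a scalar loop versus vectorized array ops that libraries can map onto SIMD exp kernels. A minimal numpy sketch (numpy and the input size are illustrative assumptions, not from the thread):

```python
import numpy as np

def softmax_scalar(x):
    # Naive element-by-element loop; on CPU the exp calls dominate.
    m = max(x)
    exps = [np.exp(v - m) for v in x]
    s = sum(exps)
    return [e / s for e in exps]

def softmax_vectorized(x):
    # Same math as whole-array ops, which numpy dispatches to
    # vectorized (SIMD-friendly) exp implementations.
    m = np.max(x)
    e = np.exp(x - m)
    return e / e.sum()

x = np.random.default_rng(0).standard_normal(1 << 12).astype(np.float32)
a = np.array(softmax_scalar(x))
b = softmax_vectorized(x)
print("max abs diff:", np.abs(a - b).max())
```

Both read and write the same number of bytes, so any speedup of the vectorized form over the loop comes from compute (the exp evaluations), which is the commenter's point: a purely bandwidth-bound kernel wouldn't get faster from vectorization.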

cgearhart | 1 year ago

Why does it disproportionately use bandwidth?

jacobn | 1 year ago

In transformers the attention score matrix is N*N in the sequence length, so there are a lot of values to go over. That typically makes the softmax memory-bandwidth bound, not compute bound.
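A quick back-of-envelope for how large that N*N matrix gets; the sequence length, head count, and precision here are assumed for illustration:

```python
# Size of the per-layer attention score matrices (assumed sizes).
N = 8192          # sequence length (illustrative)
heads = 32        # attention heads (illustrative)
bytes_per = 2     # fp16

scores_bytes = heads * N * N * bytes_per
print(f"attention scores per layer: {scores_bytes / 2**30:.1f} GiB")
```

Under these assumptions that's 4 GiB of scores per layer that softmax must stream through while doing only a handful of ops per element, which is why it ends up bandwidth bound.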