item 40940425

ex3ndr | 1 year ago
I am wondering why flash attention is like 5x slower with variable masking than without it? Lack of good masking support almost zeros out the optimizations.

chillee | 1 year ago
Where are you seeing these benchmarks?