Bimos | 1 year ago

> FFMA SASS interleaving

> We observe a performance improvement in the CUTLASS FP8 kernel between NVCC 12.2 and 12.3. By comparing the compiled SASS, we discover that one bit in a series of FADD instructions is flipped in an interleaving pattern. After referencing some open-source CUDA assembler implementations, we identified that this bit controls yield, which may enhance warp-level parallelism (just a guess, yielding the current warp and let other warps work).

> To leverage this, we develop a similar script to modify the FFMA instructions in the compiled binary. Besides simply modifying the yield bit, we also flip the reuse bit (registers cannot be reused if the warp is yielded). This adjustment improves performance (10%+ in some cases) for fine-grained scaling FP8 GEMMs by creating more opportunities to overlap MMA instructions with promotion FFMA instructions.

I would say it is really mind-blowing.
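
The bit-flipping step described in the quote can be sketched roughly as follows. This is only an illustration, not the authors' script: the bit positions `YIELD_BIT` and `REUSE_BIT` are hypothetical placeholders (the quoted text does not give them), and the sketch assumes 128-bit instructions stored as two little-endian 64-bit words, with the control bits in the upper word. The helper names `flip_control_bits` and `interleaved` are made up for this example.

```python
import struct

# Hypothetical bit positions, chosen only for illustration. The real
# positions depend on the architecture's SASS encoding and are not
# stated in the quoted text.
YIELD_BIT = 1 << 45
REUSE_BIT = 1 << 59
FLIP_MASK = YIELD_BIT | REUSE_BIT


def flip_control_bits(code: bytes, word_offsets, mask: int = FLIP_MASK) -> bytes:
    """Return a copy of `code` with `mask` XOR-ed into the little-endian
    64-bit word found at each byte offset in `word_offsets`."""
    patched = bytearray(code)
    for offset in word_offsets:
        (word,) = struct.unpack_from("<Q", patched, offset)
        struct.pack_into("<Q", patched, offset, word ^ mask)
    return bytes(patched)


def interleaved(offsets):
    """Select every other offset, mimicking an interleaving pattern."""
    return offsets[1::2]


# Four fake 16-byte "instructions", all zero bits (not real SASS).
code = bytes(64)
# Byte offset of the upper 64-bit word of each fake instruction.
high_words = [8, 24, 40, 56]
patched = flip_control_bits(code, interleaved(high_words))
```

Because the patch is an XOR, applying it a second time restores the original bytes, which makes it easy to compare kernel performance with and without the flipped bits.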

blackeyeblitzar|1 year ago

From what I read elsewhere, this is the kind of performance optimization for matrix math you typically see when performance is critical. It just hasn't been applied to this specific problem by other AI players yet, since it wasn't a necessity for them. Eventually, everyone would probably end up here regardless.

mitthrowaway2|1 year ago

How many people does it take to implement this? A 10% gain in performance could pay for a lot of people's salaries when your company is spending hundreds of millions on GPU clusters.

Bimos|1 year ago

I think most AI players rely on high-performance GEMM. Most people would be satisfied with CUTLASS or cuBLAS; the others implement GEMM themselves, but don't necessarily use undocumented features?

Zacharias030|1 year ago

I’ve only seen it done by hedge funds so far. What were you referring to?

fracon|1 year ago

[deleted]

shaklee3|1 year ago

Scott Gray figured out this exact thing and more back in 2015 for Maxwell, and it's been written about many times since by other people.

ETH_start|1 year ago

[flagged]

tough|1 year ago

I think he might mean hyperbolically figuratively so

dang|1 year ago

Literally literally means not literally.

I love it when words turn into their opposites!

Bimos|1 year ago

I edited it.

kneegerman|1 year ago

orthogonally