top | item 43168032

(no title)

deyiao | 1 year ago

Is the PTX that everyone was looking forward to included this time?


find0x90|1 year ago

Yes, there's some in the csrc/kernels directory. Search for 'asm' to find uses of it.
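
For anyone who wants to do that search from a script rather than with grep, here is a minimal sketch. The function name `find_asm_uses` and the list of file suffixes are assumptions made for illustration, not anything taken from the repository.

```python
import re
from pathlib import Path

# Matches the `asm` keyword (and the `__asm__` spelling) as a whole word.
ASM_PATTERN = re.compile(r"\b(?:asm|__asm__)\b")


def find_asm_uses(root, suffixes=(".cu", ".cuh", ".h", ".hpp", ".cpp")):
    """Return (path, line_number, line_text) for each source line mentioning asm."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and files that are not C++/CUDA sources.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if ASM_PATTERN.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling `find_asm_uses("csrc/kernels")` from a checkout of the repository should list each file and line where inline assembly appears.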

swyx|1 year ago

> the PTX that everyone was looking forward to

Can someone explain, for the rest of us, why this is so important?

ta988|1 year ago

Parallel Thread Execution. Think of PTX instructions as opcodes for Nvidia GPUs. They are a bit more complex than traditional CPU opcodes (the lowest level of abstraction accessible to users), as you can specify cache parameters, memory barriers, etc.

There are documented combinations of parameters for those instructions, but if you fuzz (search for new combinations, randomly or systematically, hoping some will work the way you want) you can find new ones with unexpected effects or with advantages (such as not polluting caches, or speed).

That is the case, for example, for ld.global.nc.L1::no_allocate.L2::256B, which they use in DeepSeek: it provides significant acceleration while being reliable (although it does not work on all architectures, so they have ways to disable it).
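
To make the "search new combinations" idea concrete, here is a rough sketch that only enumerates candidate `ld.global` instruction strings. The modifier lists are an assumption based on my reading of the PTX ISA documentation for `ld` and are not exhaustive, and `candidate_loads` is a hypothetical helper name. Whether a given combination assembles, or behaves well, would still have to be checked with ptxas and on real hardware.

```python
from itertools import product

# Assumed modifier values for `ld` (a subset); "" means "omit this modifier".
COHERENCE = ["", ".nc"]
L1_EVICTION = ["", ".L1::evict_normal", ".L1::evict_first",
               ".L1::evict_last", ".L1::no_allocate"]
L2_PREFETCH = ["", ".L2::64B", ".L2::128B", ".L2::256B"]


def candidate_loads(value_type=".s32"):
    """Yield one `ld.global` instruction string per modifier combination."""
    for nc, l1, l2 in product(COHERENCE, L1_EVICTION, L2_PREFETCH):
        yield f"ld.global{nc}{l1}{l2}{value_type} %0, [%1];"


candidates = list(candidate_loads())
```

The instruction mentioned above, `ld.global.nc.L1::no_allocate.L2::256B`, is one of the strings this produces. A real fuzzer would wrap each string in an inline `asm` statement, compile it, and compare results and timings against a plain load.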

find0x90|1 year ago

Much of the hype around DeepSeek is due to their extraordinarily low training and inference costs. They achieved this by optimizing their training code, apparently using PTX in addition to CUDA. PTX is kind of an intermediate assembly language for NVIDIA GPUs and people are eager to see how it was used.