

andrekandre | 9 days ago

   The sgai_rsp_matmul_q4() stub is planned for RSP microcode:

     DMA Q4 weight tiles into DMEM (4KB at a time)
     VMULF/VMADH vector multiply-accumulate for 8-lane dot products
     Estimated 4-8× speedup over scalar VR4300 inference
----
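A rough host-side C sketch of the tiling idea in the quoted plan. It assumes a hypothetical Q4 layout (two 4-bit weights per byte, low nibble first, stored with a +8 offset) and hypothetical names (`matmul_q4_tiled`, `q4_get`), and it omits the per-block scale factors that common Q4 formats carry. It is not RSP microcode and does not use VMULF/VMADH: `memcpy` into a 4 KB buffer stands in for the DMA into DMEM, and an 8-element array of partial sums stands in for the 8 vector lanes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DMEM_TILE_BYTES 4096 /* size of the RSP's DMEM scratchpad */
#define LANES 8              /* RSP vector registers hold 8 x 16-bit lanes */

/* Hypothetical Q4 layout: two 4-bit weights per byte, low nibble first,
   each stored as (w + 8) so the signed range -8..7 maps to 0..15. */
static int q4_get(const uint8_t *packed, size_t i) {
    uint8_t b = packed[i / 2];
    int nib = (i % 2 == 0) ? (b & 0x0F) : (b >> 4);
    return nib - 8;
}

/* y[r] = sum over j of W[r][j] * x[j] for a row-major Q4 matrix W (rows x k).
   Rows are copied tile by tile into a 4 KB buffer standing in for DMEM.
   Returns -1 if one row does not fit in a tile. Assumes sums fit in int32. */
static int matmul_q4_tiled(const uint8_t *w_packed, const int16_t *x,
                           int32_t *y, size_t rows, size_t k) {
    static uint8_t dmem[DMEM_TILE_BYTES]; /* stand-in for DMEM */
    assert(k % 2 == 0);                   /* two weights per byte */
    size_t row_bytes = k / 2;
    if (row_bytes == 0 || row_bytes > DMEM_TILE_BYTES) return -1;
    size_t rows_per_tile = DMEM_TILE_BYTES / row_bytes;

    for (size_t r0 = 0; r0 < rows; r0 += rows_per_tile) {
        size_t n = (rows - r0 < rows_per_tile) ? rows - r0 : rows_per_tile;
        memcpy(dmem, w_packed + r0 * row_bytes, n * row_bytes); /* the "DMA" */
        for (size_t r = 0; r < n; r++) {
            const uint8_t *row = dmem + r * row_bytes;
            int32_t acc[LANES] = {0};       /* one partial sum per lane */
            for (size_t j = 0; j < k; j++)
                acc[j % LANES] += q4_get(row, j) * x[j];
            int32_t sum = 0;                /* horizontal reduction */
            for (int l = 0; l < LANES; l++) sum += acc[l];
            y[r0 + r] = sum;
        }
    }
    return 0;
}
```

`matmul_q4_tiled(w, x, y, rows, k)` fills `y` with one dot product per weight row. On real hardware the inner loop would be the vector multiply-accumulate and the copy would be an actual DMA, ideally overlapped with compute.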

RSP is the gift that keeps on giving; such a forward-looking architecture (shame about the Rambus latency tho)



AutoJanitor|9 days ago

We're going to use the GPU's 128-bit SIMD unit soon, but it only has 4 KB of addressable RAM, so the matmul has to be offloaded in small chunks!
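To see why the chunks end up small, here is a back-of-the-envelope budgeting helper in C. The DMEM layout and the function names (`q4_rows_per_tile`, `q4_tiles_needed`) are assumptions for illustration, not anything from the project; it assumes int16 activations, Q4 weights packed two per byte, and one int32 output per row all sharing the 4 KB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RSP_DMEM_BYTES 4096

/* Hypothetical per-tile DMEM layout:
     [ x slice: k int16 ][ weights: rows * k/2 bytes ][ outputs: rows int32 ]
   Ignores alignment padding, double-buffering and any DMEM the microcode
   reserves for itself, all of which shrink the real budget. */
static size_t q4_rows_per_tile(size_t k, size_t dmem_bytes) {
    assert(k % 2 == 0);                            /* Q4 packs two weights per byte */
    size_t x_bytes = k * sizeof(int16_t);
    if (k == 0 || x_bytes >= dmem_bytes) return 0; /* x alone fills DMEM */
    size_t per_row = k / 2 + sizeof(int32_t);      /* weights + one output */
    return (dmem_bytes - x_bytes) / per_row;
}

/* Number of DMA round trips for a rows x k matrix, or 0 if k is too wide
   and the K dimension would also have to be split. */
static size_t q4_tiles_needed(size_t rows, size_t k, size_t dmem_bytes) {
    size_t rpt = q4_rows_per_tile(k, dmem_bytes);
    if (rpt == 0) return 0;
    return (rows + rpt - 1) / rpt;
}
```

Under this layout, K = 512 leaves room for 11 rows per tile, so a 4096-row layer takes 373 round trips; at K = 2048 the activation slice alone fills DMEM and K itself would need splitting.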

andrekandre|9 days ago

that's really cool work; i wish i could get paid to do stuff like this. more power to you all ^^