andrekandre | 9 days ago
The sgai_rsp_matmul_q4() stub is planned for RSP microcode:
- DMA Q4 weight tiles into DMEM (4KB at a time)
- VMULF/VMADH vector multiply-accumulate for 8-lane dot products
- Estimated 4-8× speedup over scalar VR4300 inference
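For anyone who wants to picture the plan quoted above, here is a rough portable C sketch of the data flow, not the project's code and not RSP microcode. Everything in it is an assumption for illustration: the name matmul_q4_sketch, the nibble order (low nibble first), the -8 offset that maps a nibble to [-8, 7], and the omission of the per-block scales that real Q4 formats usually carry. memcpy stands in for the DMA and an 8-element accumulator array stands in for the 8 lanes of a vector register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DMEM_TILE_BYTES 4096 /* tile size from the quoted plan */
#define LANES 8              /* 8 lanes per dot-product step, as quoted */

/* Hypothetical Q4 layout: two weights per byte, low nibble first,
 * stored with a +8 bias so the decoded range is [-8, 7]. */
static inline int q4_at(const uint8_t *packed, size_t i)
{
    uint8_t b = packed[i >> 1];
    int nib = (i & 1) ? (b >> 4) : (b & 0x0F);
    return nib - 8;
}

/* y[r] = sum over c of W[r][c] * x[c], with W row-major packed Q4.
 * cols must be a multiple of LANES. Scales are left out on purpose. */
void matmul_q4_sketch(const uint8_t *w_q4, const int16_t *x,
                      int32_t *y, size_t rows, size_t cols)
{
    static uint8_t dmem[DMEM_TILE_BYTES]; /* stand-in for RSP DMEM */
    const size_t tile_weights = 2 * DMEM_TILE_BYTES;
    size_t total = rows * cols;
    int32_t acc[LANES] = {0};

    assert(cols % LANES == 0);

    for (size_t base = 0; base < total; base += tile_weights) {
        size_t n = total - base;
        if (n > tile_weights)
            n = tile_weights;
        memcpy(dmem, w_q4 + base / 2, n / 2); /* stand-in for the DMA */

        for (size_t i = 0; i < n; i += LANES) {
            size_t col = (base + i) % cols;

            /* One "vector" step: 8 multiply-accumulates side by side. */
            for (int l = 0; l < LANES; l++)
                acc[l] += q4_at(dmem, i + l) * x[col + l];

            /* End of a row: fold the lanes into one output value. */
            if (col + LANES == cols) {
                int32_t sum = 0;
                for (int l = 0; l < LANES; l++) {
                    sum += acc[l];
                    acc[l] = 0;
                }
                y[(base + i) / cols] = sum;
            }
        }
    }
}
```

On real hardware the inner lane loop would be what collapses into the vector multiply-accumulate instructions, and the memcpy would be a DMA that can overlap with compute, which is presumably where the estimated speedup comes from.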
----
RSP is the gift that keeps on giving; such a forward-looking architecture (shame about the Rambus latency tho)
AutoJanitor|9 days ago
andrekandre|9 days ago