abstractcontrol | 1 year ago
Even so, creating all the abstractions needed to implement just regular matrix multiplication in Spiral in a generic fashion took me two months, so I'd consider that a good enough exercise.
You could do it a lot faster by specializing for specific matrix sizes, as in the CUDA examples repo by NVIDIA, but then you'd miss the opportunity to do the tensor magic that I did in the playlist.
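To make the trade-off concrete, here is a minimal sketch in plain C++ (not Spiral or CUDA, and not the code from the playlist). The names `matmul_fixed` and `matmul_generic` are hypothetical and only illustrate the difference between baking the sizes in at compile time and taking them as runtime arguments.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical size-specialized version: M, K and N are compile-time
// constants, so every loop bound is fixed and no shape handling is needed.
template <std::size_t M, std::size_t K, std::size_t N>
std::array<float, M * N> matmul_fixed(const std::array<float, M * K>& a,
                                      const std::array<float, K * N>& b) {
    std::array<float, M * N> c{};  // zero-initialized, row-major
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t k = 0; k < K; ++k)
            for (std::size_t j = 0; j < N; ++j)
                c[i * N + j] += a[i * K + k] * b[k * N + j];
    return c;
}

// Hypothetical generic version: shapes arrive at runtime, so the same
// function serves any (m x k) * (k x n) product of row-major buffers.
std::vector<float> matmul_generic(const std::vector<float>& a,
                                  const std::vector<float>& b,
                                  std::size_t m, std::size_t k, std::size_t n) {
    std::vector<float> c(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t p = 0; p < k; ++p)
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * k + p] * b[p * n + j];
    return c;
}
```

Calling `matmul_fixed<2, 2, 2>(a, b)` fixes the shape in the type, while `matmul_generic(a, b, 2, 2, 2)` accepts any shape. A real GPU kernel would add tiling and memory-layout concerns on top of this, which is where the generic abstractions get expensive to build.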
selimthegrim|1 year ago
abstractcontrol|1 year ago