Ha made me chuckle. For those wondering seriously about this, it’s not a viable optimization because weights are not readily compressible via JPEG/DCT, and there are a limited number of these units on the chip which bottlenecks throughout, meaning speed is dwarfed by simply reading uncompressed weights from HBM.
jhoho|5 months ago
touisteur|5 months ago
moralestapia|5 months ago
I won an GPU hackathon back in 2019 doing something very similar to this; although the other way around, I was compressing weights using hardware modules.
heavyset_go|5 months ago