In fact, it can be slower because hardware is probably not optimized for the 1-bit case. That leaves a lot of low-hanging fruit for hardware designers, and we may see improvements in the next generation of hardware.
FPGAs could be highly competitive for models with unusual but small bit lengths, especially single bits, since their optimizers will handle that easily.
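To illustrate why the single-bit case reduces to plain logic, here is a minimal sketch (my own illustration, not taken from the paper). It assumes weights and activations are restricted to +1/-1 and packed one per bit; the helper names `pack_signs` and `binary_dot` are hypothetical. Under that assumption a dot product becomes an XOR followed by a population count, which is the kind of operation that maps directly onto lookup tables.

```python
def pack_signs(values):
    """Pack a list of +1/-1 values into an int: bit i is set when values[i] == +1."""
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n +1/-1 vectors given their packed bits.

    Matching positions contribute +1 and mismatching positions contribute -1,
    so the result is n - 2 * (number of mismatches).
    """
    mismatches = bin(a_bits ^ b_bits).count("1")  # XOR, then popcount
    return n - 2 * mismatches


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]

# Check the packed version against the ordinary multiply-and-add version.
reference = sum(x * y for x, y in zip(a, b))
assert binary_dot(pack_signs(a), pack_signs(b), len(a)) == reference
```

The multiply-and-add loop is what a float or integer unit would do; the packed version needs no multiplier at all. Whether that turns into a real speedup depends on the hardware in question.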
In this paper, each iteration has to be slower, because they need to calculate both their new method (which may be faster) and the traditional method (because they need a float gradient). And old plus new will always be slower than old alone.
amelius|5 months ago
nlitened|5 months ago
nickpsecurity|5 months ago
fxtentacle|5 months ago
unknown|5 months ago
[deleted]