taktoa | 1 year ago
Clock only needs to be distributed to sequential components like flip-flops or SRAMs. The number of clock-distribution wire-millimeters in a typical chip is dwarfed by the number of data wire-millimeters, and if a neural network is well trained and quantized, its activations should look random, so the number of transitions per clock should be ~0.5 (as opposed to 1 for clock wires), meaning that power can't be dominated by the clock. The flip-flops that prevent clock skew are a small % of area, so I don't think those can tip the scales either. On the other hand, in asynchronous digital logic you need a valid-bit calculation on every single piece of logic, which seems like a pretty huge overhead to me.
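To make the activity-factor argument concrete, here's a back-of-the-envelope sketch using the standard dynamic-power formula P = alpha * C * V^2 * f. The capacitance, voltage, and frequency numbers are purely hypothetical, chosen only to reflect the comment's premise that data wiring carries far more total capacitance than the clock tree:

```python
# Dynamic switching power: P = alpha * C * V^2 * f
# alpha = average transitions per clock cycle on the net.
# All numeric values below are illustrative assumptions, not measurements.

V = 0.8   # supply voltage in volts (assumed)
f = 1e9   # clock frequency, 1 GHz (assumed)

def dynamic_power(alpha, cap_farads):
    """Return switching power in watts for a net with given
    activity factor and total switched capacitance."""
    return alpha * cap_farads * V ** 2 * f

# Premise from the comment: data wire-millimeters dwarf clock
# wire-millimeters, so give data wiring ~10x the capacitance.
clock_cap = 100e-12    # 100 pF clock tree (hypothetical)
data_cap = 1000e-12    # 1 nF data wiring (hypothetical)

p_clock = dynamic_power(1.0, clock_cap)  # clock toggles every cycle
p_data = dynamic_power(0.5, data_cap)    # random data: ~0.5 transitions/cycle

print(f"clock power: {p_clock * 1e3:.0f} mW")
print(f"data power:  {p_data * 1e3:.0f} mW")
```

Even with the clock at double the activity factor, the data wiring dominates once its capacitance is large enough, which is the comment's point.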
HarHarVeryFunny | 1 year ago
There's more promise in analog chip designs, such as here:
https://spectrum.ieee.org/low-power-ai-spiking-neural-net
Or otherwise smarter architectures (software only or S/W+H/W) that design out the unnecessary calculations.
It's interesting to note how extraordinarily wasteful transformer-based LLMs are too. The transformer was designed partly inspired by linguistics and partly around the parallel hardware (GPUs etc.) available to run it on. Language mostly has only local sentence-structure dependencies, yet the transformer's self-attention mechanism has every word in a sentence paying attention to every other word (to some learned degree)! Turns out it's better to be dumb and fast than smart, although I expect future architectures will be much more efficient.
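The "every word attends to every other word" cost is easy to see in a minimal sketch of scaled dot-product attention. Shapes and values here are made up for illustration; the point is that the score matrix is n x n, so work grows quadratically with sequence length regardless of how local the actual linguistic dependencies are:

```python
# Minimal scaled dot-product attention sketch (illustrative shapes only).
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 16  # 8 tokens, 16-dim embeddings (hypothetical sizes)
Q = rng.normal(size=(n, d))  # queries
K = rng.normal(size=(n, d))  # keys
V = rng.normal(size=(n, d))  # values

# Every token scores against every other token: an n x n matrix,
# so compute and memory scale as O(n^2) in sequence length.
scores = Q @ K.T / np.sqrt(d)

# Softmax over each row turns scores into attention weights.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

out = weights @ V  # each output token is a mix of ALL value vectors

print(scores.shape)  # (8, 8): n^2 pairwise attention scores
```

Doubling the sequence length quadruples the score matrix, which is why restricting attention to a local window (as various sparse-attention variants do) is one of the obvious efficiency levers.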