augment_me|5 months ago
To me it sounds like sparse matrix multiplication repackaged as "event-driven spiking computation", where the spikes are simply the non-zero elements that sparse GPU kernels have always been designed to process.
The supposedly dynamic/temporal nature of the model does not seem to survive GPU execution: it collapses into a single static computation, equivalent to applying a pre-calculated sparsity mask.
Perhaps a bit cynical of me, but it feels like wrapping standard sparse computing and operator fusion in complex, biological jargon...
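To make the "collapses into a static computation" point concrete, here is a toy sketch (my own illustration, not the paper's code): unroll integer spike counts into binary spike trains over time, sum the per-step matmuls, and you get exactly the same result as one static integer matmul.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(4)                       # continuous activations
W = rng.random((4, 3))                  # a weight matrix

counts = np.round(x * 7).astype(int)    # "spike coding": integer spike counts

# Event-driven view: at step t, every neuron with count > t emits a 0/1 spike.
T = int(counts.max())
spikes = [(counts > t).astype(int) for t in range(T)]   # T binary vectors

snn_out = sum(s @ W for s in spikes)    # sum of per-step spike matmuls
static_out = counts @ W                 # one static integer matmul

assert np.allclose(snn_out, static_out)  # the time dimension collapses away
```

Nothing about the unrolled loop is load-bearing: because the weights are identical at every step, the sum distributes over the matmul, which is why a GPU can skip the loop entirely.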
GregarianChild|5 months ago
The 'brain-inspired' community has been doing this ever since Carver Mead introduced the term 'neuromorphic' in the late 1980s: reselling banalities as great new insights. My favourite is "Neuromorphic computing breakthrough could enable blockchain on Mars" [1]. What else can they do? After all, that community now has multiple decades of failure under its belt. Not a single success: failure to make progress in AI, and failure to say anything of interest about the brain. To paraphrase Benjamin Franklin: in this world nothing can be said to be certain, except death, taxes and neuromorphicists exaggerating. (Aside: I was told by someone who applied to YC with a 'neuromorphic' startup that YC said they don't fund 'neuromorphic'. I am not sure about the details ...) The whole 'brain talk' malarkey goes back much longer.
In particular, psychology and related fields have, since their origins as a specialty in the 19th century, leaned heavily on brain-inspired metaphors intended to mislead. This was criticised even in the 19th century. See [3] for an interesting discussion.
There is something interesting in this post, namely that it's based on non-Nvidia GPUs, in this case MetaX [2]. I don't know how competitive MetaX are today, but I would not bet against China in the longer term.
I believe the argument is that you can also encode information in the time domain.
cpldcpu|5 months ago
If we just look at spikes as a different numerical representation, then they are clearly inferior. For example, encoding the number 7 will require seven consecutive pulses on a single spiking line. Encoding the same number in binary will require one pulse on each of three parallel lines.
Binary encoding wins 7x in speed and 7/3=2.333x in power efficiency...
On the other hand, if we assume that we are able to encode information in the gaps between pulses, then things quickly change.
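The arithmetic above can be written out as a toy cost model (my own sketch; "lines", "time_steps" and "pulses" are made-up bookkeeping, not terms from any paper):

```python
def rate_code_cost(n):
    # Rate coding: the value n is n pulses on a single line.
    return {"lines": 1, "time_steps": n, "pulses": n}

def binary_code_cost(n):
    # Binary coding: one time step, one line per bit, a pulse on each 1-bit.
    return {"lines": n.bit_length(), "time_steps": 1, "pulses": bin(n).count("1")}

def interval_code_cost(n):
    # Inter-spike-interval coding: two pulses bracketing a gap of length n.
    return {"lines": 1, "time_steps": n + 2, "pulses": 2}

rate, binary = rate_code_cost(7), binary_code_cost(7)
print(rate["time_steps"] / binary["time_steps"])  # 7.0  -> the 7x speed win
print(rate["pulses"] / binary["pulses"])          # 2.33... -> the 7/3 power win
print(interval_code_cost(7)["pulses"])            # 2 -> gap coding needs only 2 pulses
```

The last line is the "encode information in the gaps" case: the pulse count stays constant at 2 regardless of the value, at the cost of latency growing with the value.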
cgadski|5 months ago
> Our architectural choices are closely aligned with principles observed in biological brains.
How? They point out three design choices: linear attention, MoE layers, and spike coding.
Apparently linear attention is brain-inspired because it can be viewed as a "simplified abstraction of dendritic dynamics with multi-branch morphology." Who knows what that means exactly [1]. They don't discuss it further. MoE layers apparently reflect "a principle of modular specialization." Fine, whatever.
Now, using a dozen attention variants + MoE is bog standard. The real novelty would be spike coding. Page 11 is dedicated to the different ways they could turn signals into spike trains, including such biologically-inspired mechanisms as using two's complement. However, they don't actually do spike coding in a time domain. In their implementation, "spike coding" apparently means to turn activations into integers. Section 3.3.3 claims that this lets us simulate an underlying spiking neural network, so we can validate the spiking approach without using special hardware. But if your SNN can be simulated faithfully on a GPU by turning things into integers, isn't that a bit of a depressing SNN?
Either I'm missing something, or this is just dressing standard techniques in loads of meaningless jargon. Of course, that’s a very popular way to operate in deep learning nowadays.
[1] Like, attention can draw from multiple tokens, sort of like how different spines of a dendrite can draw from multiple axons? Can’t make this stuff up.
cpldcpu|5 months ago
> The current implementation adopts pseudo-spiking, where activations are approximated as spike-like signals at the tensor level, rather than true asynchronous event-driven spiking on neuromorphic hardware.
Isn't that in essence very similar to Quantization-Aware Training (QAT)?
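For reference, a minimal sketch of the QAT-style forward pass being alluded to (illustrative only; real QAT additionally uses a straight-through estimator in the backward pass):

```python
import numpy as np

def fake_quant(x, bits=8):
    # "Fake quantization": round activations onto a (2**bits - 1)-step grid
    # in the forward pass, so training already sees the quantization error.
    lo, hi = float(x.min()), float(x.max())
    scale = (2**bits - 1) / max(hi - lo, 1e-8)
    return np.round((x - lo) * scale) / scale + lo

x = np.linspace(0.0, 1.0, 7)
q = fake_quant(x, bits=2)                        # only 4 representable levels
assert np.all(np.abs(q - x) <= 0.5 / 3 + 1e-9)   # error <= half a grid step
```

If "spike coding" amounts to rounding activations to integers at the tensor level, it is hard to see the operational difference from this.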
Can you explain more? Why would that be the case? What is being passed from one layer to the next is not a linear value but the delay until the next spike, which is very different.
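A tiny sketch of what "the delay until the next spike" carries (time-to-first-spike coding; my own toy mapping, assuming inputs in [0, 1], not a scheme from the paper):

```python
def latency_encode(x, t_max=16):
    # Time-to-first-spike: a stronger activation fires earlier, so the
    # transmitted quantity is a delay, not a magnitude.
    return t_max - round(x * t_max)

def latency_decode(t, t_max=16):
    # Recover the magnitude from the observed delay.
    return (t_max - t) / t_max

assert latency_encode(1.0) == 0                   # strongest input fires immediately
assert latency_decode(latency_encode(0.75)) == 0.75
```

The point of contention is that a layer consuming delays must itself be time-resolved, which is exactly what the pseudo-spiking GPU implementation is not.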
SpikingBrain treats 'spikes' as 1-bit quantization stickers. True neural-level sparsity should be input-dependent, time-resolved, and self-organized during learning. If a new circuit diagram cannot 'grow' with every forward pass, then don't blame everyone for treating this as just more sparse marketing... oh wait, neuromorphic marketing.
[1] https://cointelegraph.com/news/neuromorphic-computing-breakt...
[2] https://en.wikipedia.org/wiki/MetaX
[3] K. S. Kendler, A history of metaphorical brain talk in psychiatry. https://www.nature.com/articles/s41380-025-03053-6
ziofill|5 months ago
Shouldn’t one bold the better numbers?
imtringued|5 months ago
https://en.wikipedia.org/wiki/MetaX
They have GPU manufacturers that nobody in the West has ever heard of.