augment_me|5 months ago
To me it sounds like sparse matrix multiplication repackaged as "event-driven spiking computation", where the spikes are simply the non-zero elements that sparse GPU kernels have always been designed to process.
The supposedly dynamic/temporal nature of the model does not seem to survive GPU execution: it collapses into a single static computation, equivalent to applying a pre-calculated sparsity mask.
Perhaps a bit cynical of me, but it feels like wrapping standard sparse computing and operator fusion in complex, biological jargon...
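To make the "collapses into a static computation" point concrete, here is a toy sketch (my own illustration, not the paper's code): unroll integer spike counts into binary spike trains over time, sum the per-step matmuls, and you get exactly the same result as one static integer matmul.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(4)                       # continuous activations
W = rng.random((4, 3))                  # a weight matrix

counts = np.round(x * 7).astype(int)    # "spike coding": integer spike counts

# Event-driven view: at step t, every neuron with count > t emits a 0/1 spike.
T = int(counts.max())
spikes = [(counts > t).astype(int) for t in range(T)]   # T binary vectors

snn_out = sum(s @ W for s in spikes)    # sum of per-step spike matmuls
static_out = counts @ W                 # one static integer matmul

assert np.allclose(snn_out, static_out)  # the time dimension collapses away
```

Nothing about the unrolled loop is load-bearing: because the weights are identical at every step, the sum distributes over the matmul, which is why a GPU can skip the loop entirely.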
GregarianChild|5 months ago
The 'brain-inspired' community has been doing this ever since Carver Mead introduced the term 'neuromorphic' in the late 1980s: reselling banalities as great new insights. My favourite is "Neuromorphic computing breakthrough could enable blockchain on Mars" [1]. What else can they do? After all, that community now has multiple decades of failure under its belt. Not a single success: failure to make progress in AI, and failure to say anything of interest about the brain. To paraphrase Benjamin Franklin: in this world nothing can be said to be certain, except death, taxes and neuromorphicists exaggerating. (Aside: I was told by someone who applied to YC with a 'neuromorphic' startup that YC said they don't fund 'neuromorphic'. I am not sure about the details ...) The whole 'brain talk' malarkey goes back much longer.
In particular, psychology and related fields have, since their origins as a specialty in the 19th century, leaned heavily on brain-inspired metaphors intended to mislead. This was criticised even in the 19th century. See [3] for an interesting discussion.
There is something interesting in this post, namely that it's based on non-Nvidia GPUs, in this case MetaX [2]. I don't know how competitive MetaX are today, but I would not bet against China in the longer term.
I believe the argument is that you can also encode information in the time domain.
cpldcpu|5 months ago
If we just look at spikes as a different numerical representation, then they are clearly inferior. For example, encoding the number 7 will require seven consecutive pulses on a single spiking line. Encoding the same number in binary will require one pulse on each of three parallel lines.
Binary encoding wins 7x in speed and 7/3=2.333x in power efficiency...
On the other hand, if we assume that we are able to encode information in the gaps between pulses, then things quickly change.
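The arithmetic above can be written out as a toy cost model (my own sketch; "lines", "time_steps" and "pulses" are made-up bookkeeping, not terms from any paper):

```python
def rate_code_cost(n):
    # Rate coding: the value n is n pulses on a single line.
    return {"lines": 1, "time_steps": n, "pulses": n}

def binary_code_cost(n):
    # Binary coding: one time step, one line per bit, a pulse on each 1-bit.
    return {"lines": n.bit_length(), "time_steps": 1, "pulses": bin(n).count("1")}

def interval_code_cost(n):
    # Inter-spike-interval coding: two pulses bracketing a gap of length n.
    return {"lines": 1, "time_steps": n + 2, "pulses": 2}

rate, binary = rate_code_cost(7), binary_code_cost(7)
print(rate["time_steps"] / binary["time_steps"])  # 7.0  -> the 7x speed win
print(rate["pulses"] / binary["pulses"])          # 2.33... -> the 7/3 power win
print(interval_code_cost(7)["pulses"])            # 2 -> gap coding needs only 2 pulses
```

The last line is the "encode information in the gaps" case: the pulse count stays constant at 2 regardless of the value, at the cost of latency growing with the value.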
cgadski|5 months ago
> Our architectural choices are closely aligned with principles observed in biological brains.
How? They point out three design choices: linear attention, MoE layers, and spike coding.
Apparently linear attention is brain-inspired because it can be viewed as a "simplified abstraction of dendritic dynamics with multi-branch morphology." Who knows what that means exactly [1]. They don't discuss it further. MoE layers apparently reflect "a principle of modular specialization." Fine, whatever.
Now, using a dozen attention variants + MoE is bog standard. The real novelty would be spike coding. Page 11 is dedicated to the different ways they could turn signals into spike trains, including such biologically-inspired mechanisms as using two's complement. However, they don't actually do spike coding in a time domain. In their implementation, "spike coding" apparently means to turn activations into integers. Section 3.3.3 claims that this lets us simulate an underlying spiking neural network, so we can validate the spiking approach without using special hardware. But if your SNN can be simulated faithfully on a GPU by turning things into integers, isn't that a bit of a depressing SNN?
Either I'm missing something, or this is just dressing standard techniques in loads of meaningless jargon. Of course, that’s a very popular way to operate in deep learning nowadays.
[1] Like, attention can draw from multiple tokens, sort of like how different spines of a dendrite can draw from multiple axons? Can’t make this stuff up.
cpldcpu|5 months ago
> The current implementation adopts pseudo-spiking, where activations are approximated as spike-like signals at the tensor level, rather than true asynchronous event-driven spiking on neuromorphic hardware.
Isn't that in essence very similar to Quantization-Aware Training (QAT)?
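For reference, a minimal sketch of the QAT-style forward pass being alluded to (illustrative only; real QAT additionally uses a straight-through estimator in the backward pass):

```python
import numpy as np

def fake_quant(x, bits=8):
    # "Fake quantization": round activations onto a (2**bits - 1)-step grid
    # in the forward pass, so training already sees the quantization error.
    lo, hi = float(x.min()), float(x.max())
    scale = (2**bits - 1) / max(hi - lo, 1e-8)
    return np.round((x - lo) * scale) / scale + lo

x = np.linspace(0.0, 1.0, 7)
q = fake_quant(x, bits=2)                        # only 4 representable levels
assert np.all(np.abs(q - x) <= 0.5 / 3 + 1e-9)   # error <= half a grid step
```

If "spike coding" amounts to rounding activations to integers at the tensor level, it is hard to see the operational difference from this.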
Can you explain more? Why would that be the case? What is being passed from one layer to the next is not a linear value but the delay until the next spike, which is very different.
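A tiny sketch of what "the delay until the next spike" carries (time-to-first-spike coding; my own toy mapping, assuming inputs in [0, 1], not a scheme from the paper):

```python
def latency_encode(x, t_max=16):
    # Time-to-first-spike: a stronger activation fires earlier, so the
    # transmitted quantity is a delay, not a magnitude.
    return t_max - round(x * t_max)

def latency_decode(t, t_max=16):
    # Recover the magnitude from the observed delay.
    return (t_max - t) / t_max

assert latency_encode(1.0) == 0                   # strongest input fires immediately
assert latency_decode(latency_encode(0.75)) == 0.75
```

The point of contention is that a layer consuming delays must itself be time-resolved, which is exactly what the pseudo-spiking GPU implementation is not.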
SpikingBrain treats 'spikes' as 1-bit quantization stickers. True neural-level sparsity should be input-dependent, time-resolved, and self-organized during learning. If a new circuit diagram cannot 'grow' with every forward pass, then don't blame everyone for treating this as just more sparse marketing... oh wait, neuromorphic marketing.
[1] https://cointelegraph.com/news/neuromorphic-computing-breakt...
[2] https://en.wikipedia.org/wiki/MetaX
[3] K. S. Kendler, A history of metaphorical brain talk in psychiatry. https://www.nature.com/articles/s41380-025-03053-6
ziofill|5 months ago
Shouldn’t one bold the better numbers?
imtringued|5 months ago
https://en.wikipedia.org/wiki/MetaX
They have GPU manufacturers that nobody in the West has ever heard of.