
TopoNets: High performing vision and language models with brain-like topography

225 points | mayukhdeb | 1 year ago | arxiv.org

68 comments

gzer0|1 year ago

I spent time working with Andrej and the rest of the FSD team back in 2020/2021, and we had plenty of conversations on how human visual processing maps onto our neural network architectures. Our approach—transformer-based attention blocks, multi-scale feature extraction, and temporal fusion—mirrors elements of the biological visual cortex (retina → LGN → V1 → V2 → V4 → IT) which break down raw inputs and integrate them over time. It’s amazing how closely this synthetic perceptual pipeline parallels the way our own brains interpret the world.

The key insight we discovered was that explicitly enforcing brain-like topographic organization (as some academic work attempts - such as this one here) isn't necessary - what matters is having the right functional components that parallel biological visual processing. Our experience showed that the key elements of biological visual processing - like hierarchical feature extraction and temporal integration - emerge naturally when you build architectures that have to solve real visual tasks.

The brain's organization serves its function, not the other way around. This was validated by the real-world performance of our synthetic visual cortex in the Tesla FSD stack.

Link to the 2021 Tesla AI day talk: https://www.youtube.com/live/j0z4FweCy4M?t=3010s

lukan|1 year ago

"It’s amazing how closely this synthetic perceptual pipeline parallels the way our own brains interpret the world."

It is amazing that the synthetic pipeline, which was built to mimic the brain, seems to mimic the brain?

That sounds a bit tautological, and otherwise I doubt we have really understood exactly how our brain interprets the world.

In general this is definitely interesting research, but worded like this, it smells a bit hyped to me.

iandanforth|1 year ago

Unlike artificial neural networks, the brain contains massive numbers of lateral connections. This, combined with topographic organization, allows it to do within-layer temporal predictions as activations travel across the visual field, to create active competition between similarly tuned neurons in a layer (forming natural sub-networks), and quite a bit more. So, yeah, the brain's organisation serves its function, and it does so very, very well.

dmarchand90|1 year ago

I've found the way CNNs map onto the visual cortex to be very clear. But I've always been a bit confused about how LLMs map onto the brain. Is that even the case?

energy123|1 year ago

The main reason topography emerges in physical brains is because spatially distant connections are physically difficult and expensive in biological systems. Artificial neural nets have no such trade-off. So what's the motivation here? I can understand this might be a very good regularizer, so it could help with generalization error on small-data tasks. But hard to see why this should be on the critical path to AGI. As compute and data grows, you want less inductive bias. For example, CNN will beat ViT on small data tasks, but that flips with enough scale because ViT imposes less inductive bias. Or at least any inductive bias should be chosen because it models the structure of the data well, such as with causal transformers and language.
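For concreteness, here is a minimal numpy sketch of the kind of "topographic regularizer" being discussed (my own illustration, not the paper's actual loss; the function name and grid layout are assumptions):

```python
import numpy as np

def topo_smoothness_penalty(W, grid_shape):
    """Hypothetical spatial-locality regularizer: lay a layer's units
    out on a 2D grid and penalize the mean squared difference between
    the weight vectors of grid-adjacent units. Minimizing it pushes
    nearby units toward similar tuning, i.e. smooth topographic maps.

    W          : (n_units, n_inputs) weight matrix
    grid_shape : (rows, cols) with rows * cols == n_units
    """
    rows, cols = grid_shape
    G = W.reshape(rows, cols, -1)      # place units on the grid
    dv = G[1:, :, :] - G[:-1, :, :]    # vertical neighbours
    dh = G[:, 1:, :] - G[:, :-1, :]    # horizontal neighbours
    return float((dv ** 2).mean() + (dh ** 2).mean())

# A perfectly uniform map incurs zero penalty; random weights do not.
W_smooth = np.ones((16, 8))
W_rand = np.random.default_rng(0).standard_normal((16, 8))
assert topo_smoothness_penalty(W_smooth, (4, 4)) == 0.0
assert topo_smoothness_penalty(W_rand, (4, 4)) > 0.0
```

In training, a term like this would be added to the task loss with a small coefficient, trading task performance against smoothness, which is exactly the regularizer framing above.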

AYBABTME|1 year ago

Locality of data and computation is very important in neural nets. It's the number one reason why training and inference are as slow as they are. It's why GPUs need super-expensive HBM memory, why NVLink is a thing, and why InfiniBand is a thing.

If the problem of training and inference on neural networks can be optimized so that a topology can be used to keep closely related data together, we will see huge advancements in training and inference speed, and probably in model size as a result.

And speed isn't just speed. Speed makes impossible (not enough time in our lifetime) things possible.

A huge factor in DeepSeek being able to train on H800s (half the HBM bandwidth of the H100) is that they used GPU cores to compress/decompress the data moved between GPU memory and the compute units. This reduces the latency of accessing data and made up for the slower memory bandwidth (which translates into higher latency when fetching data). Anything that reduces the latency of memory accesses is a huge accelerator for neural nets. The number one way to achieve this is to keep related data next to each other, so that it fits in the closest caches possible.
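The same locality effect is easy to see even on a CPU: summing a row-major numpy array row-by-row walks memory contiguously, while summing it column-by-column strides across it. Same arithmetic, very different cache behaviour (a generic illustration, nothing specific to DeepSeek or GPUs):

```python
import time
import numpy as np

a = np.random.default_rng(0).standard_normal((2000, 2000))  # row-major (C order)

t0 = time.perf_counter()
row_sum = sum(a[i, :].sum() for i in range(a.shape[0]))  # contiguous reads
t_row = time.perf_counter() - t0

t0 = time.perf_counter()
col_sum = sum(a[:, j].sum() for j in range(a.shape[1]))  # strided reads
t_col = time.perf_counter() - t0

# Both orders compute the same total; the contiguous one is
# typically noticeably faster thanks to cache locality.
assert np.isclose(row_sum, col_sum)
```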

vlovich123|1 year ago

Unless GPUs work markedly differently somehow or there’s been some fundamental shift in computer architecture I’m not aware of, spatial locality is still a factor in computers.

Aside from HW acceleration today, designs like Cerebras would benefit heavily by reducing the amount of random access from accessing the weights (and thus freeing up cross-chip memory bandwidth for other things).

cma|1 year ago

> The main reason topography emerges in physical brains is because spatially distant connections are physically difficult and expensive in biological systems.

The brain itself seems to have bottlenecks that aren't distance-related, like the hemispheres and the corpus callosum, which are preserved across all placental mammals; other mammalian groups have something similar and still have hemispheres. Maybe it's just an artifact of bilateral symmetry that is stuck in there from path dependence, or a forced redundancy to make damage more recoverable, but maybe it has a big regularizing or alternatively specializing effect (regularization like dropout tends to force more distributed representations, which seems kind of opposite to this work and other work like "Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability," https://arxiv.org/abs/2305.08746).

exe34|1 year ago

> CNN will beat ViT on small data tasks, but that flips with enough scale because ViT imposes less inductive bias

Any idea why this is the case? CNNs have the bias that neighbouring pixels are somehow relevant - they are neighbours. ViTs have to re-learn this from scratch. So why do they end up doing better than CNNs?

TZubiri|1 year ago

Maybe this would be relevant for datacenters with significant distance between machines, or multidatacenter systems.

xpl|1 year ago

> So what's the motivation here?

Better interpretability, I suppose. Could give insights into how cognition works.

mercer|1 year ago

I imagine it could be easier to make sense of the 'biological' patterns that way? like, having bottlenecks or spatially-related challenges might have to be simulated too, to make sense of the ingested 'biological' information.

ziofill|1 year ago

Perhaps they are more easily compressible? Once a bunch of nearby weights have similar roles one may not need all of them.

FrereKhan|1 year ago

This paper imports an arbitrarily-chosen aspect of cortical architecture — topological maps of function — and ignores every other aspect of biological neural tissue. The resulting models show lower performance for the same number of parameters — not surprising, since they are more constrained compared with baseline. They may be slightly more robust against pruning — not surprising, since they are more regularised.

The figures show individual seeds, presumably, with no statistical analysis in the performance or pruning comparisons, so the null hypothesis is that there is no difference between TopoNets and the baseline. I would never let this paper be submitted by my team.

We haven't learned anything about the brain, or about ANNs.

brrrrrm|1 year ago

This paper plays into some popular fantasy about the aesthetic of ANNs. It's not scientifically useful.

slama|1 year ago

The title here doesn't seem to match. The paper is called "TopoNets: High Performing Vision and Language Models with Brain-Like Topography"

Even with their new method, models with topography seem to perform worse than models without.

dang|1 year ago

Submitted title was "Inducing brain-like structure in GPT's weights makes them parameter efficient". We've reverted it now in keeping with the site guidelines (https://news.ycombinator.com/newsguidelines.html).

Since the submitter appears to be one of the authors, maybe they can explain the connection between the two titles? (Or maybe they already have! I haven't read the entire thread)

vessenes|1 year ago

I hate to dog on research papers. They’re work to write. That said, I think this paper is not likely to be of interest to AI researchers — instead it may be of interest to Neuroscience folks or other brain research types.

The lede — adding topography worsens networks at similar weights — is not only buried, it’s obscured with statements claiming that topo networks show less upheaval when scaled down, e.g. they are more efficient than similar weight networks.

It’s hard for me to see how both these things can be true — the graphs show the more topography is added, the worse the networks perform at the trained model sizes.

To have the second statement “They compress better and are therefore more efficient” also be true, I think you’d need to show a pretty remarkable claim, which is that while a model trained at the same scale as a llama architecture is worse, when you scale them both down, this model becomes not only better than the scaled down llama, but also better than a natively trained model at the new smaller scale.

There is no proof of this in the paper, and good reason to be skeptical of this idea based on the data presented.

That said, like a lot of ideas in AI, this... works! You can train a model successfully imposing these outside structures on it, and that model doesn't even suck very much. Which is a cool statement about complexity theory and the resilience of these architectures, in my opinion. But I don't think it says much else about either the brain or underlying AI 'truths'.

igleria|1 year ago

This is excellent. Since reading https://books.google.de/books/about/Models_of_the_Mind.html?... I've been expecting someone to start looking back into biology to try to move forward. I guess the poster is one of the authors. Kudos!

mayukhdeb|1 year ago

Thank you for your kind words!

Indeed. The problem with most AI research today is that they simply do trial and error with large amounts of compute. No room for taking inspiration from nature, which requires more thought and fewer FLOPS.

LZ_Khan|1 year ago

Shouldn't there be a comparison in performance on common benchmarks to other models?

Like a 7B toponet model vs a 7B Llama model?

As a layperson I don't understand why topology is a thing to optimize for.

TOMDM|1 year ago

The only potential benefit shown in the paper is that the topographically organized models seem to be more resilient after pruning.

So you may be able to prune a 7B model down to 6B while maintaining most of the capability.
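The simplest version of that kind of pruning is plain magnitude pruning, sketched below (my own illustration; the paper's actual pruning procedure may differ):

```python
import numpy as np

def magnitude_prune(W, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest
    absolute values. Topographic layers are claimed to tolerate high
    sparsity levels better than unconstrained ones."""
    k = int(round(W.size * sparsity))
    if k == 0:
        return W.copy()
    # k-th smallest absolute value becomes the pruning threshold
    thresh = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    pruned = W.copy()
    pruned[np.abs(pruned) <= thresh] = 0.0
    return pruned

W = np.random.default_rng(0).standard_normal((64, 64))
W80 = magnitude_prune(W, 0.80)
# Roughly 80% of the entries are now zero.
assert abs((W80 == 0).mean() - 0.80) < 0.01
```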

light_hue_1|1 year ago

They bury the part where inducing brain-like structure hurts performance!

This is a method to just hurt your network in exchange for nothing useful at all aside from some sketchy story that this is "brain like".

mayukhdeb|1 year ago

Our goal was never to optimize for performance. There's a long-standing hypothesis that topographic structure in the human brain leads to metabolic efficiency. Thanks to topography in ANNs, we were able to test this hypothesis in a computational setting.

> sketchy story this is "brain like".

We reproduce the hallmarks of functional organization seen in the visual and language cortex of the brain. I encourage you to read the paper before making such comments.

devmor|1 year ago

Is this "brain-like" in any functional way, or "brain-like" in the same way that a tall rectangle is "door-like" even if it doesn't share any functions with a door?

I know quite a bit about machine learning, but very little to nothing about neuroscience and human cognition, so I am curious how an expert (that didn't work on the paper) would describe it.

(Forgive me for the pre-emptive negativity but I am so utterly exhausted by dishonest comparisons to sapient thought in the field of artificial intelligence that it has nearly drained me of the incredible amount of enthusiasm I used to carry for it.)

mayukhdeb|1 year ago

It is indeed brain-like in a functional way. Topographic structure is what enables the brain to have low dimensionality and metabolic efficiency. We find that inducing such structure in neural nets made them have significantly lower dimensionality and also made them more parameter-efficient (after training, we could take advantage of the structure to remove ~80% of the weights in topographic layers without sacrificing performance).
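A toy version of the dimensionality claim (my own construction, not the paper's metric): spatially smoothed weights need fewer principal components to explain the same share of variance than unstructured ones.

```python
import numpy as np

def effective_rank(W, var_threshold=0.95):
    """Number of principal components needed to explain `var_threshold`
    of the variance in a weight matrix (one rough notion of its
    'dimensionality')."""
    s = np.linalg.svd(W - W.mean(axis=0), compute_uv=False)
    var = s ** 2 / (s ** 2).sum()
    return int(np.searchsorted(np.cumsum(var), var_threshold) + 1)

rng = np.random.default_rng(0)
W_rand = rng.standard_normal((64, 32))
# Crude stand-in for a topographic layer: smooth each input's weights
# across the 64 units with a moving average.
kernel = np.ones(8) / 8
W_smooth = np.apply_along_axis(
    lambda col: np.convolve(col, kernel, mode="same"), 0, W_rand
)
# The smoothed weights concentrate their variance in fewer components.
assert effective_rank(W_smooth) < effective_rank(W_rand)
```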