top | item 42885723


gzer0 | 1 year ago

I spent time working with Andrej and the rest of the FSD team back in 2020/2021, and we had plenty of conversations about how human visual processing maps onto our neural network architectures. Our approach (transformer-based attention blocks, multi-scale feature extraction, and temporal fusion) mirrors elements of the biological visual pathway (retina → LGN → V1 → V2 → V4 → IT), which breaks down raw inputs and integrates them over time. It’s amazing how closely this synthetic perceptual pipeline parallels the way our own brains interpret the world.

The key insight we discovered was that explicitly enforcing brain-like topographic organization (as some academic work, such as this one here, attempts) isn't necessary. What matters is having the right functional components that parallel biological visual processing. Our experience showed that the key elements of biological visual processing, like hierarchical feature extraction and temporal integration, emerge naturally when you build architectures that have to solve real visual tasks.

The brain's organization serves its function, not the other way around. This was validated by the real-world performance of our synthetic visual cortex in the Tesla FSD stack.
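For intuition, here is a minimal, hypothetical sketch (plain NumPy, nothing like the actual FSD stack) of the two ingredients described above: a hierarchy of convolution-plus-nonlinearity stages, and a simple temporal-fusion step over per-frame features. All function names and parameters are illustrative.

```python
import numpy as np

def conv2d(img, kernel):
    """Naive 'valid' 2-D convolution, enough for a toy pipeline."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def hierarchy(img, kernels):
    """Stacked conv + ReLU stages: a cartoon of retina -> V1 -> V2 -> ..."""
    feat = img
    for k in kernels:
        feat = np.maximum(conv2d(feat, k), 0.0)  # ReLU nonlinearity
    return feat

def temporal_fusion(frames, kernels, decay=0.8):
    """Exponential moving average over per-frame features:
    a crude stand-in for temporal integration across video frames."""
    fused = None
    for frame in frames:
        feat = hierarchy(frame, kernels)
        fused = feat if fused is None else decay * fused + (1 - decay) * feat
    return fused
```

The point of the sketch is only the shape of the computation: each stage compresses and abstracts the input, and the fusion step carries state across time.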

Link to the 2021 Tesla AI day talk: https://www.youtube.com/live/j0z4FweCy4M?t=3010s


lukan | 1 year ago

"It’s amazing how closely this synthetic perceptual pipeline parallels the way our own brains interpret the world."

It is amazing that the synthetic pipeline, which was built to mimic the brain, seems to mimic the brain?

That sounds a bit tautological, and beyond that I doubt we really understand exactly how our brain interprets the world.

In general this is definitely interesting research, but worded like this, it smells a bit hyped to me.

Shorel | 1 year ago

I interpreted it the other way around.

We can think of a solution space with potentially many good solutions to the vision problem, and we can, in science-fiction-like speculation, imagine that the other solutions will be very different and will surprise us.

Then this experiment shows that its solution is the same one we already knew, and that's it.

So there aren't many good potential solutions; there is only one, and the ocean of possibilities becomes the pond of this solution.

trhway | 1 year ago

The convolutional kernels in the first layers do converge to Gabor-like filters, like the ones in V1 (and there was mathematical work in the '90s, in neuroscience research, on the optimality of such kernels). So it wouldn't be surprising if higher layers converged to something similar to the higher levels of the visual cortex, like hierarchical feature aggregation, which is nicely illustrated by deep dreaming and also feels like it could be optimal under reasonable conditions, and thus would be expected to emerge.
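To make the first point concrete: a Gabor kernel is an oriented sinusoidal grating windowed by a Gaussian, which is the shape that both V1 simple-cell receptive fields and trained first-layer CNN filters tend to resemble. This is the standard Gabor formula; the parameter values are illustrative.

```python
import numpy as np

def gabor_kernel(size=11, theta=0.0, wavelength=4.0, sigma=2.0, phase=0.0):
    """Build a 2-D Gabor kernel: a sinusoidal grating under a Gaussian window."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the grating is oriented at angle theta.
    x_t = xs * np.cos(theta) + ys * np.sin(theta)
    y_t = -xs * np.sin(theta) + ys * np.cos(theta)
    envelope = np.exp(-(x_t**2 + y_t**2) / (2.0 * sigma**2))  # Gaussian window
    carrier = np.cos(2.0 * np.pi * x_t / wavelength + phase)  # oriented grating
    return envelope * carrier

# A small bank of oriented Gabors, loosely analogous to an orientation column.
bank = [gabor_kernel(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
```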

perching_aix | 1 year ago

Did you read the part where he explicitly mentioned that they discovered that enforcing that architecture was not necessary, as it would emerge on its own?

iandanforth | 1 year ago

Unlike artificial neural networks, the brain contains massive numbers of lateral connections. This, combined with topographic organization, allows it to do within-layer temporal prediction as activations travel across the visual field, to create active competition between similarly tuned neurons in a layer (forming natural sub-networks), and quite a bit more. So, yeah, the brain's organization serves its function, and it does so very, very well.
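The competition point can be sketched with a toy lateral-inhibition model (purely illustrative, not a model of real cortex): units in a 1-D layer suppress their immediate neighbors, so a broad, peaked input is sharpened until a single winner remains.

```python
import numpy as np

def lateral_inhibition(activations, strength=0.5, steps=10):
    """Iteratively let each unit inhibit its immediate neighbors.

    Similarly tuned (here: adjacent) units compete, so the population
    response is sharpened toward the most strongly driven unit.
    """
    a = activations.astype(float).copy()
    n = len(a)
    for _ in range(steps):
        inhib = np.zeros(n)
        inhib[1:] += a[:-1]   # inhibition from the left neighbor
        inhib[:-1] += a[1:]   # inhibition from the right neighbor
        a = np.maximum(a - strength * inhib / 2, 0.0)  # rectify at zero
    return a
```

Running it on a peaked input like `[0.1, 0.5, 1.0, 0.5, 0.1]` drives the flanking units to zero while the central unit survives.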

dmarchand90 | 1 year ago

I've found the way CNNs map to the visual cortex to be very clear. But I've always been a bit confused about how LLMs map onto the brain. Is that even the case?

nickpsecurity | 1 year ago

They probably don’t. They’re very different. LLMs seem to be based on pragmatic, mathematical techniques developed over time to produce patterns from data.

There are at least three fields in this:

1. Machine learning using non-neurological techniques (most work). These use a combination of statistical algorithms stitched together with hyperparameter tweaking, usually with global optimization by heavyweight methods like backpropagation.

2. “Brain-inspired” or “biologically accurate” algorithms that try to imitate the brain. They sometimes include evidence that their behavior matches experimental observations of brain behavior. Many of these use complex neuron models, spiking nets, and/or local learning (Hebbian).

(Note: There is some work on hybrids such as integrating hippocampus-like memory or doing limited backpropagation on Hebbian-like architectures.)

3. Computational neuroscience which aims to make biologically-accurate models at various levels of granularity. Their goal is to understand brain function. A common reason is diagnosing and treating neurological disorders.
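As a concrete example of the local (Hebbian) learning mentioned in category 2, here is Oja's rule: a Hebbian update with a built-in decay term that keeps the weight vector bounded. A single linear neuron trained this way converges to the leading principal component of its inputs, using only locally available quantities (its input and its own output). The data and hyperparameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 2-D inputs whose dominant direction of variation is along (1, 1).
base = rng.normal(size=(500, 1)) @ np.array([[1.0, 1.0]])
x = base + 0.1 * rng.normal(size=(500, 2))

w = rng.normal(size=2)  # weights of a single linear neuron
eta = 0.01              # learning rate
for _ in range(20):     # epochs over the toy dataset
    for xi in x:
        y = w @ xi      # neuron output for this sample
        # Oja's rule: Hebbian term y*xi, minus y^2*w decay to bound ||w||.
        w += eta * y * (xi - y * w)

# The learned direction approaches (up to sign) the leading principal component.
w_unit = w / np.linalg.norm(w)
```

Note there is no global error signal and no backward pass; each update uses only the neuron's own input and output, which is the property that makes such rules cheap and hardware-friendly.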

Making an LLM like the brain would require use of brain-inspired components, multiple systems specialized for certain tasks, memory integrated into all of them, and a brain-like model for reinforcement. Imitating God’s complex design is simply much more difficult than combining proven algorithms that work well enough. ;)

That said, I keep collecting work on both efficient ML and brain-inspired ML. I think some combination of the techniques might have high impact later. The lower training costs of some brain-inspired methods, especially Hebbian learning, justify more experimentation by small teams with small GPU budgets. They might find something cost-effective in that research. We need more of it on common platforms, too, like Hugging Face libraries and cheap VMs.