top | item 39691172


apienx | 1 year ago

A 4 year old child has 16k wake hours x 3600 s/hour x 1^6 optical nerve fibers x 2 eyes x 10 bytes/s = 1^15 bytes (approximation by Yann LeCun).

Processing visual input is the current bottleneck for robots that want to make sense of the physical world. Glad somebody's looking into it (no pun intended). I just hope their plan is more sophisticated than throwing more computational power at the problem.


Jensson|1 year ago

You need much less if you're fine with a horse-level understanding of 3D environments. Those get to a working level much faster (hours) and are still good enough to navigate complex environments safely and not step on children.

Then you realize the limitation isn't the training data but the base model that was trained from hundreds of millions of years of evolution, and you start to see the real potential hurdle we have to clear.

thfuran|1 year ago

Something like half a billion years of pretraining if you start counting from the first brain.

hn_acker|1 year ago

> A 4 year old child has 16k wake hours x 3600 s/hour x 1^6 optical nerve fibers x 2 eyes x 10 bytes/s = 1^15 bytes (approximation by Yann LeCun).

Unfortunate typo. You meant 10^15 bytes at the end.
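For what it's worth, the arithmetic checks out once the exponents are restored. A quick sanity check, using the figures exactly as quoted upthread:

```python
# Back-of-envelope check of LeCun's estimate, with the exponents restored.
wake_hours = 16_000            # waking hours of a 4-year-old
seconds_per_hour = 3_600
fibers_per_eye = 10**6         # optic nerve fibers per eye (the "1^6" typo, fixed)
eyes = 2
bytes_per_fiber_per_s = 10

total_bytes = (wake_hours * seconds_per_hour * fibers_per_eye
               * eyes * bytes_per_fiber_per_s)
print(f"{total_bytes:.2e}")    # ~1.15e15, i.e. on the order of 10^15 bytes
```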

Thanks to your citation I was able to find a podcast transcript [1] with Yann LeCun's explanation:

> If you talk to developmental psychologists and they tell you a four-year-old has been awake for 16,000 hours in his or her life, and the amount of information that has reached the visual cortex of that child in four years is about 10 to 15 bytes.

The transcript is missing "the" (10 to the 15 bytes). The corresponding timestamp in the podcast on YouTube [2] is 4:48.

[1] https://lexfridman.com/yann-lecun-3-transcript

[2] https://www.youtube.com/watch?v=5t1vTLU7s40

tomjakubowski|1 year ago

How is it determined that the optic nerve transmits 20 MB/sec of data?
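It's not a measurement; the 20 MB/s figure seems to fall straight out of the rough numbers upthread:

```python
# Where 20 MB/s plausibly comes from: LeCun's rough per-fiber estimate, not a measurement.
fibers_per_eye = 10**6         # optic nerve fibers per eye
eyes = 2
bytes_per_fiber_per_s = 10     # assumed effective rate per fiber

bandwidth = fibers_per_eye * eyes * bytes_per_fiber_per_s  # bytes/s
print(bandwidth / 1e6, "MB/s")  # 20.0 MB/s
```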

HeyLaughingBoy|1 year ago

Is it? What if that 4-year-old child were blind? Obviously their concept of the physical world would be different, but is it any less accurate? If we remove the need for visual perception, thereby removing that bottleneck, how much faster would we be able to make progress?

apinstein|1 year ago

I think it would be significantly less accurate. Their error rates for performing physical tasks would be higher because they lack the sensors needed to train decent world models. For instance, I don't think they could catch a ball at the same skill level as a sighted child no matter how hard they tried.

So the lack of that sensor will cause the brain to develop poor representations of motion in 3D space.

How the lack of those representations would affect other representations is less clear, because the fusion between the LLM (which similarly doesn't have an embodied world model) and the robot AI (which presumably does) obviously works really well.

Now, it's possible that the two models are just inter-communicating between their own features (apple the concept and apple the image/object) and connecting them together. The point being that there could be benefits from separate training followed by a post-training connection to bridge any gaps in learned representations.

However, I'd think that ultimately a model that can train simultaneously on more sensory input, rather than less, will have a better and more efficient world model, with more useful and interesting cross-connections between that space and applied uses in non-physical domains.

phlipski|1 year ago

So maybe we should start with building a "pinball wizard": a "deaf, dumb, blind" system that plays by sense of touch, or in this case some accelerometers and pressure transducers. Radically reduced-bandwidth inputs...

iambateman|1 year ago

To keep the analogy going, we should be concerned about unregulated companies creating robots with superhuman capabilities and a four-year-old’s sense of the world.

Regulators need to get ahead of this and establish a federal framework for safe robotic entrepreneurship.

For example…does the second amendment give me the right to have a drone which is capable of autonomously shooting a deer? There will be tens of millions of people who disagree on that point alone.

And then we need international agreements - much like nuclear - governing what is “fair game” for the public to have access to.

We must pursue a robot-enhanced future, carefully.

anon291|1 year ago

> For example…does the second amendment give me the right to have a drone which is capable of autonomously shooting a deer? There will be tens of millions of people who disagree on that point alone.

IANAL, but it seems this would fall under running a human-controlled robot with a gun, which I believe is illegal.

gibsonf1|1 year ago

I would say conceptual awareness is a far bigger bottleneck than visual perception data.

ben_w|1 year ago

I think the state of the art is still bottlenecked on visual perception performance, even if there is sufficient data, and regardless of any further questions about conceptual awareness.

If we could model visual streams accurately, fast, and at low compute cost, I think self-driving cars and autonomous mobile robots would be much more widely available.

TeMPOraL|1 year ago

LLMs arguably cracked conceptual awareness already, not to mention demonstrated that you can bootstrap it unsupervised by throwing enough data at it.

smokel|1 year ago

There are only a few thousand concepts worthy of thought, but there are way more potential pixels even in the current room that I am in.