tgog|6 years ago
This completely neglects the fact that humans can build near-perfect 3D representations of the world from 2D images stitched together by the parallax neural nets in our brains. This blogpost briefly mentions it in one line as a throwaway and says you'd need extremely high resolution cameras?? Doesn't make sense at all. Two cameras of any resolution spaced a fixed distance apart should be able to build a better parallax 3D model than any one camera alone.
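For reference, the standard pinhole stereo model recovers depth from the disparity between the two views as Z = f*B/d. A minimal sketch of that relationship (the focal length, baseline, and disparity values below are made-up numbers for illustration, not from the article):

```python
# Depth from stereo disparity under the pinhole camera model:
# Z = f * B / d, where f is the focal length (pixels), B the
# baseline between the cameras (metres), and d the disparity
# (pixels) of the same point between the left and right images.

def depth_from_disparity(focal_px: float, baseline_m: float,
                         disparity_px: float) -> float:
    """Distance to a point given its disparity between a stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 1000 px focal length, 0.5 m baseline.
# A point with 10 px of disparity comes out 50 m away.
print(depth_from_disparity(1000.0, 0.5, 10.0))
```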
sairahul82|6 years ago
Humans do a lot more than just identifying an image or doing 3D reconstruction. We have context about the roads, we constantly predict the movement of other cars, we know how to react based on the situation, and most importantly we are not fooled by simple image occlusions. Essentially we have a gigantic correlation engine that takes decisions based on comprehending different things happening on the road.
The AI algorithms we train do not work the same way we do. They depend too heavily on identifying the image. Lidar provides another signal to the system. It provides redundancy and allows the system to take the right decision. Take the above-linked image as an example.
We may not need a lidar once the technology matures, but at this stage it is a pretty important redundant system.
ebg13|6 years ago
That's not relevant when discussing which technology to use to build the 3D models. Everything you said is accurate until the last few sentences. Lidar provides the same information (line-of-sight depth) as stereo cameras, just in a different way. The person you're responding to is talking about depth from stereo, not cognition.
rkangel|6 years ago
I had always assumed that the first few years of infancy were effectively a period of training a neural net (the brain) against a continuous series of images (everything seen).
ricardobeat|6 years ago
qgadrian|6 years ago
It also provides a reliable source of data. If humans had a LiDAR in their system, we would use it to improve our decisions.
I don’t see why we should limit the AV.
Complexicate|6 years ago
Easy examples of this are optical illusions, ghosts, and UFOs. There are also "selective attention tests" where a majority of people miss glaringly obvious events right in front of them when they're focusing on something else. Regular people also tend to bump into things, spill things, and trip, even when going 3 miles an hour (walking speed).
taneq|6 years ago
rdtsc|6 years ago
So it seems that a truly accurate 3D representation of the world is not necessary, at least for driving. Perhaps it's the resolution? Looking at the samples in the article, they are just terribly fuzzy, with a narrow field of view. If I had to drive and only see the world through that kind of view, I don't think I would be doing very well.
m3at|6 years ago
We learn object representations by interacting with them over years in a multi-modal fashion. Take for example a simple drinking glass: we know its material properties (it is transparent, solid, can hold liquids), its typical position (it sits on a tabletop, upright with the open side on top), its usage (grab it with a hand and bring it to the mouth)...
We also make heavy use of the time dimension, as over a few seconds we see the same objects from different viewpoints and possibly in different states.
Only after learning what a glass is can we easily recover its properties on a still 2D image.
So at least for learning (might be skippable at inference), it makes a lot of sense to me to have more than 2D still images.
ebg13|6 years ago
joshvm|6 years ago
> Two cameras of any resolution spaced a regular distance apart should be able to build a better parallax 3D model than any one camera alone.
This is true if the platform isn't moving.
If you have the time dimension and you have good knowledge of the motion between frames (difficult), you can use the two views as a virtual stereo pair. This is called monocular visual-inertial SLAM. You can supplement with GPS, 2D lidar, odometry and IMU to probabilistically fuse everything together. There have been some nice results published over the years.
But in general yes, you'll always be better off if you have a proper stereo pair with a camera either side of the car.
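A minimal sketch of the "probabilistically fuse everything together" step: inverse-variance weighting of independent Gaussian estimates, which is the core idea behind Kalman-style sensor fusion (real SLAM pipelines use full Kalman or graph-based filters). The sensor readings and variances below are invented for illustration:

```python
# Fuse several independent Gaussian estimates of the same quantity
# (e.g. depth to an obstacle from stereo, lidar, odometry) by
# weighting each by its inverse variance. The fused estimate has
# lower variance than any single input.

def fuse(estimates):
    """estimates: list of (value, variance) -> fused (value, variance)."""
    total_precision = sum(1.0 / var for _, var in estimates)
    fused_value = sum(val / var for val, var in estimates) / total_precision
    return fused_value, 1.0 / total_precision

# Hypothetical readings: stereo says 20.0 m (variance 4.0),
# lidar says 19.5 m (variance 0.25). The fused estimate sits
# close to the more precise lidar reading.
value, variance = fuse([(20.0, 4.0), (19.5, 0.25)])
```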
microcolonel|6 years ago
The idea that the human brain has a "near perfect" 3D representation of one's surroundings seems inaccurate to me. There's a difference between near perfection and good enough that people don't often get hurt, when all of their surroundings are deliberately constructed to limit exposure to danger.
LeifCarrotson|6 years ago
And it is indeed an impressive and heroic piece of work when you can fix sensor problems with clever filtering, or fix mechanical problems with clever control algorithms. But when designing new equipment or deciding a path to fix a bad design, you never want to hamstring yourself from the start with poor quality input data and output actuators. That approach only leads to pain.
Once you have lots of experience with a particular design - dozens of similar machines running successfully in production for years - then you can start looking for ways to be clever and improve performance over the default or save a little money.
I understand Elon's desire to get lots of data. But there will be a much greater chance of success if it starts with Lidar + cameras, and a decade down the road you can work on camera-only controls and compare what they calculated and would have done against what the Lidar measured and how the car actually responded. Only when these are sufficiently close should you phase out the Lidar.
Remember, you're comparing bad input data going to the best neural net known in the universe (the human brain), with millennia of evolution and decades of training data, against sensor inputs going to brand-new programming. Help out the computer with better input data.
Symmetry|6 years ago
The other thing is that we, ideally, want a computer to drive a car better than a human can. There's a lot to be gained from having precise rather than approximate notions of other objects' distances and speeds, in terms of driving both safely and efficiently. Now, Tesla also has radar, which when fused with visual data will help somewhat, but I'm not sure how far that can get them.
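One way to quantify "precise rather than approximate": under the pinhole stereo model Z = f*B/d, a fixed disparity error delta_d produces a depth error that grows roughly with the square of the distance, delta_Z ≈ Z² · delta_d / (f·B). A back-of-the-envelope sketch with an assumed (hypothetical) focal length and baseline:

```python
# Stereo depth uncertainty grows quadratically with distance.
# From Z = f * B / d, a disparity error of delta_d pixels gives
# approximately delta_Z = Z**2 * delta_d / (f * B).

def stereo_depth_error(z_m: float, focal_px: float, baseline_m: float,
                       disparity_err_px: float = 1.0) -> float:
    """Approximate depth error (metres) at range z_m for a given
    pixel-level disparity error."""
    return z_m ** 2 * disparity_err_px / (focal_px * baseline_m)

# Hypothetical rig: 1000 px focal length, 0.5 m baseline,
# 1 px disparity error.
near = stereo_depth_error(10.0, 1000.0, 0.5)  # about 0.2 m at 10 m
far = stereo_depth_error(50.0, 1000.0, 0.5)   # about 5.0 m at 50 m
```

Lidar, by contrast, has range error that is roughly constant with distance, which is one reason it complements stereo well at highway ranges.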
KaiserPro|6 years ago
But it takes at least 10 years to train.
And most of the time we are not building a 3D map from points; we are building it from object inference.
There are many advantages that we have over machines:
o The eye sees much better in the dark
o It has a massive dynamic range, allowing us to see both light and dark things
o It moves to where the threat is
o If it's occluded, it can move to get a better image
o It has a massive database of objects in context
o Each object has a mass, dimension, speed and location it should be seen in
None of those are 3D maps; they are all inference, where one can derive the threat/advantage based on history.
We can't make machines do that yet.
You are correct that two cameras allow for better 3D point-cloud making in some situations. But a moving single camera is better than a static multiview camera.
However, even then the 3D map isn't all that great, and has a massive latency compared to lidar.
jsharf|6 years ago
mbrumlow|6 years ago
I have thought about this many times and often wondered why, when closing one eye, I am still able to function.
Since then I have strongly suspected that depth perception is used for training some other part of our brain, and is then only used to increase the accuracy of our perception of reality.
Further evidence of this is TV. Even on varying sizes of screen, humans tend to do well at figuring out the actual size of things displayed.
xiphias2|6 years ago
Driving back home with 1 eye was scary even though I was going much slower. It is possible to drive with 1 eye, but much much harder than with 2 eyes.
pazimzadeh|6 years ago
https://en.wikipedia.org/wiki/Depth_perception#Theories_of_e...
unknown|6 years ago
[deleted]
mcqueenjordan|6 years ago
adrianmonk|6 years ago
https://en.wikipedia.org/wiki/Depth_perception
This seems like a bit of a double-edged sword. On the one hand, it means there's more than one way to achieve a 3D model of the world with cameras. On the other hand, it means that if what machines can do with cameras is going to match what we humans can do with our eyes, they will need to either advance along 18 different fronts or take some of those cues further than we can.
Fricken|6 years ago
Otherwise we'll just have to figure out how to build autonomous vehicles with the technology we have, which is pretty crappy in comparison to biology in a lot of ways still.
nguoi|6 years ago
asdf21|6 years ago
unknown|6 years ago
[deleted]
mantap|6 years ago
With cameras and computer vision there's no way to prove it. There is always a chance that it will glitch out for a second and kill someone.
pfundstein|6 years ago
threeseed|6 years ago
This is ridiculous.
I am sitting in front of a monitor right now. Please explain how I can perfectly determine the depth of it even though I can't see behind it? I can move my head all around it to capture hundreds of different viewpoints, but a car can't do that.
ebg13|6 years ago
aeternus|6 years ago
davidgould|6 years ago
sdenton4|6 years ago