How many sensors for autonomous driving?

[+] ikhatri|2 years ago|reply

As someone who works in the industry (disclaimer these are my own views and don't reflect those of my employer), something about the framing of this article rubs me the wrong way despite the fact that it's mostly on point. Yes it is true that different companies are choosing different sensing solutions based on cost and the ODD in which they must operate. But I think this last sentence just left a sour taste in my mouth "But the verdict is still out as to which is safer".

It is not an open question and I hate it when writers frame it this way. Camera only (specifically, monocular camera only) systems literally cannot be safer than ones with sensor fusion right now. This may change in the future at some point, but it's not a question right now it is a fact.

Setting aside comparisons to humans for a second (will get back to this), monocular cameras can only provide relative depth. You can guess the absolute depth with your neural net but the estimates are pretty garbage. Unfortunately, robots can't work/plan with this input. The way any typical robotics stack works is that it relies on an absolute/measured understanding of the world in order to make its plans.

That isn't to say that one day with sufficiently powerful ML and better representations we would be totally unable to use mono (relative) depth. People argue that humans don't really use our stereoscopic depth past ~10m or so and that's a fair point. But we also don't plan the way robots do. We don't require accurate measurements of distance and size. When you're squeezing your car into a parking spot you don't measure your car and then measure the spot to know if it'll fit. You just know. You just do it. And it's a guesstimate (so sometimes humans make mistakes and we hit stuff). Robots don't work this way (for now), so their sensors cannot work this way either (for now).

[+] ActorNightly|2 years ago|reply

Self driving isn't a sensor problem, its a software problem.

From how humans drive, its pretty clear that there exists some latent space representation of immediate surroundings inside our brains that doesn't require a lot of data. If you had a driving sim wheel and 4 monitors for each direction + 3 smaller ones for rear view mirror, connected to a real world car with sufficiently high definition cameras, you could probably drive the car remotely as well as you could in real life, all because the images would map to the same latent space.

But the advantage that humans have is that we have an innate understanding of basic physics from experience in interacting with the world, which we can deduce from something simple as a 2d representation, and that is very much a big part of that latent space. You wouldn't be able to drive a car if you didn't have some "understanding" of things like velocity, acceleration, object collision, e.t.c

So my bet is that just like with LLMs, there will be research published at some point that given certain frames in a video, it will be able to extrapolate the physical interactions that will occur, including things like collision, relative distances, and so on. Once that is in place, self driving systems will get MASSIVELY better.

[+] hedora|2 years ago|reply

Monocular cameras are a strange strawman. Is anyone seriously considering them?

Binocular cameras provide absolute depth information, and are an order of magnitude cheaper sensors than the other options.

Since this technology is clearly computationally limited, you should subtract the budget for the sensors from the budget for the computation.

According to the article, the non-camera sensors are in the $1000’s per car range, so the question becomes whether a camera system with an extra $2000 of custom asic / gpu / tpu compute is safer than a computationally-lighter system with a higher bandwidth sensor feed.

I’m guessing camera systems will be the safest economically-viable option, at least until the compute price drops to under a few hundred dollars.

So, assuming multi-camera setups really are first to market, the question then is whether the exotic sensors will ever be able to justify their cost (vs the safety win from adding more cameras and making the computer smarter).

[+] andix|2 years ago|reply

As far as I know humans can also safely drive only with one eye. It’s perfectly legal in most countries.

But I agree that current software (Tesla?) is not able to do that in the same way. So it may need more sensors until the software gets better.

In theory cameras should also be able to see more than humans. They can have a wider angle, higher contrast, higher resolution and better low-light vision than the human eye.

[+] rbanffy|2 years ago|reply

Minor nitpick

> monocular cameras can only provide relative depth

While the environment awareness is nowhere near as good as two or more cameras would be, if you consider the output over time, you get valuable information about the change rate of the environment, i.e. how fast that big thing is getting bigger, which may indicate one should actuate the brakes.

Of course, I'm with the crowd that answers the question with a "how many can we have?" question. The more, the merrier. And the more types, the better - give me polarized light and lidar, sonar, radar, thermal, and whatever else that can be plugged in the car's brain to make it better aware of what happens (and correctly guess what's going to happen) outside it.

[+] wilg|2 years ago|reply

Can you elaborate on your reasoning? I’m shaky on some of logic here.

“Monocular” cameras > no “absolute” depth > less safe

The last leap is not well justified.

Also, cars with vision based driving have multiple cameras. Whats the difference between a “binocular” camera and two “monocular” cameras?

How does a “binocular” camera get better depth information?

Is using multiple cameras to drive sensor fusion?

Why is absolute depth a strict safety win? How do you know how the sensor details translate to the final safety of the full system?

If this is just a handwavey upper bound on safety, how do you know that such a system can’t be safe enough for its design goals?

If humans with only one eye are able to drive, why wouldn’t mono surround vision be at least as good as that?

[+] shadowgovt|2 years ago|reply

"But humans can do it with one eye closed and"

... and I want to grab the guy who says that by the collar and scream in their face "The whole point is to build something that can do better than a human."

[+] pja|2 years ago|reply

> People argue that humans don't really use our stereoscopic depth past ~10m or so and that's a fair point.

The second paper I reference in this comment https://news.ycombinator.com/edit?id=36232198 claims that humans can maintain stereopsis out to 250m.

That’s a huge difference from 10m & if true suggests that human drivers might well use 3D vision when driving.

[+] green_man_lives|2 years ago|reply

I am implementing Monocular vSLAM as a side project right now. I am working with some optimization libraries like GTSAM but having some issues. Do you know any good resources for troubleshooting this kind of stuff?

It's pretty easy to see, even as someone with very little experience, the benefits of stereo vision over monocular. In addition to the depth stuff it's a lot easier/faster to create your point clouds from disparity maps.

[+] GloriousKoji|2 years ago|reply

Adding in the car speed and direction information to the monocular camera images gets you an absolute/measured understanding of the world.

[+] cypress66|2 years ago|reply

> You can guess the absolute depth with your neural net but the estimates are pretty garbage.

I'm not sure what kind of systems you're referring to with "monocular cameras", but if you look at the visualization in a Tesla with FSD Beta, it's actually really good at detecting the position of everything. And that's with pretty bad cameras and not a lot of compute.

Only rarely you'll see Tesla's FSD mess up because of perception, the vast majority of times they mess up is just the software being dumb with planning.

[+] williamcotton|2 years ago|reply

Let’s say you are driving down the street in a suburban neighborhood. You see a kid throw a ball into the street. You see from how his body moved that it is a lightweight ball and that it doesn’t require drastic (or any) measures to avoid. Or you see that it is a very heavy object and requires evasive maneuvers.

How exactly does a certain type of sensor help with this? Isn’t the problem entirely based on a software model of the world?

[+] BulgarianIdiot|2 years ago|reply

> Setting aside comparisons to humans for a second (will get back to this), monocular cameras can only provide relative depth. You can guess the absolute depth with your neural net but the estimates are pretty garbage.

Stereoscopic vision in humans only works for nearby objects. The divergence for far away objects is not sufficient for this. You may think you can tell something is 50 or 55 meters away through stereoscopic vision, but you can't. That's your brain estimating based on effectively a single image.

That said reality is not a single image, it's a moving image, a video. Monocular video can still be used to estimate object distance in motion.

Eventually AI will be good enough to work better than humans with just a camera. The problem is we're not there yet, and what Tesla is doing is irresponsible. They should've added LIDAR and used that to train their camera-only models, until they're ready to take over.

[+] samwillis|2 years ago|reply

The new Apple Vision Pro arguably has a better set of cameras, sensors and signal processing than a Tesla... not sure what to take from that. Makes me think what an Apple Car would be like.

My argument on autonomous driving is that it can't just match the safety of the average human driver. It needs to be 100-1000x better, unquestionably better for everyone, always. Until that happens it's a dead end. I think that's probably only achievable with LIDAR, which will come down in price as volumes ramp up.

The thing that surprises me is that Tesla has invested so much into various hardware technology (even their own silicon), but completely ignored LIDAR. With their resources, volume and investment, they could have reduced the cost to a tenth of what it is now. They go for big challenges, but decided not to try on that one.

[+] rootusrootus|2 years ago|reply

> The new Apple Vision Pro arguably has a better set of cameras, sensors and signal processing than a Tesla...

IIRC a number of cars, even Subaru, have better cameras than a Tesla. Really, the cameras Tesla included on the Model 3/Y are fairly mediocre off-the-shelf hardware from years ago. But they do have more of them pointed in different directions, so there's that.

[+] fatnoah|2 years ago|reply

> The new Apple Vision Pro arguably has a better set of cameras, sensors and signal processing than a Tesla

I'm sure it does, but that's also because you need much greater degrees of precision and accuracy to anchor virtual content in the real world. Inches and centimeters matter here, whereas an autonomous vehicle can get away with coarser measurement.

[+] ajross|2 years ago|reply

> The new Apple Vision Pro arguably has a better set of cameras, sensors and signal processing than a Tesla

And it costs about 9% of what a new Model 3 would run you. It's also coming to market almost 5 years after the Tesla HW3 stack that runs almost all cars on the roads to which you're making the comparison.

> My argument on autonomous driving is that it can't just match the safety of the average human driver. It needs to be 100-1000x better, unquestionably better for everyone, always.

That sounds like maybe a marketing argument? You're saying that if it was only 2x better at some visceral metric (total traffic deaths, say) that you think people wouldn't accept it and it would be regulated out of existence, either by liability concerns or actual lawmaking?

The counter argument is that those metrics are, after all, visceral. The US has 40k traffic fatalities per year. You really think the government is (or courts are) going to kill a technology that could save 20k people every year?

No. "Merely better" is the standard that will stick.

> I think that's probably only achievable with LIDAR

Everyone on your side of the argument does, but frankly this has become an argument of faith and not substance. LIDAR capability has become a proxy for "not Tesla", so people line up along their brand loyalties. For myself, LIDAR is actually the technology looking like it isn't panning out. It's not getting cheaper or more reliable. LIDAR autonomy isn't getting better nearly as fast as vision solutions are iterating.

Most importantly, LIDAR at its best still only gives you the 3D shape of the world, and that's not enough. LIDAR doesn't help with lane selection, it doesn't help with traffic behavior modelling, it doesn't help with sign reading. It doesn't help with merge strategies. All that stuff has to be done with vision anyway.

The "don't hit stuff" bar has, quite frankly, already been crossed. Tesla's don't hit stuff either. If you give them LIDAR, they.... still won't hit stuff. But all the work, even for LIDAR vehicles, is on the camera side.

[+] hutzlibu|2 years ago|reply

"Makes me think what an Apple Car would be like."

Probably very expensive.

But wasn't there some rumor already some time ago, that apple wanted to make a car?

But cars are a different ballpark, than consumer electronics. So they are likely cautious to not burn their money there.

[+] tester756|2 years ago|reply

I'm not a fan of Apple, but I tend to say "if Apple did X then it'd be more userfriendly"

and I'd want to see if cars+roads+whole infrastructure designed from scratch by Apple

[+] fabbari|2 years ago|reply

When I saw the R1 chip in there and the functions it has I immediately thought: 'This is a chip they are using for sensor fusion in their self-driving moonshot, it just happens to also work for the AR headset'.

[+] sunflowerfly|2 years ago|reply

They have also built two real time operating sytems, one for their car project and one for the Vision Pro. I was curious how much code they shared?

[+] lawn|2 years ago|reply

It says more about Tesla than it does about Apple.

[+] Animats|2 years ago|reply

From my DARPA Grand Challenge days, a few comments:

You want a high-resolution LIDAR. The trouble is, they cost too much. That's due to the low volume, not the technology. A few years back, there were several LIDAR startups using various approaches. There were the rotating machinery guys, the flash LIDAR guys, the MEMS mirror guys, and some exotics. I was expecting the flash LIDAR guys to win out, because that approach has no moving parts. Continental, the big auto parts company, acquired Advanced Scientific Concepts, a LIDAR startup, and demoed a nice little unit. But nobody wanted enough of them to justify setting up full scale manufacturing. ASC sells some space-qualified units, and the Dragon spacecraft uses them for docking. Works fine, costs too much.

Another of the startups, Luminar, made its founder a billionaire without the company having shipped much product. "Luminar has not generated positive cash flows from operating activities and has an accumulated deficit of $1.3 billion as of December 31, 2022." - Wikipedia. They've announced many partnerships and have been at this since 2012, but you still can't order a LIDAR from their web site.

Velodyne makes those spinning things Waymo uses. They had the first automotive LIDAR, a big car-top spinning wheel, at the DARPA Grand Challenge. It fell off the vehicle. The technology has improved since, but it's still expensive. Velodyne just merged with Ouster, which sold similar spinners. Ouster's web site has a "Order Now" button, but it leads to an email onboarding form, not a price list.

Many of these things need custom indium gallium arsenide detector chips, which would be far cheaper if the market was 100,000 a month instead of 100 a month. This technology hasn't scaled up in volume yet. That's one thing holding this back.

If you want coverage over a full circle, you either need one of those silly looking domes on top of the car roof, or multiple units, each with a narrower field of vision. Waymo has both. Works fine, costs too much.

On the radar side, there's been improvement in both processing and resolution. Not as much as expected over 20 years. Automotive radar has moved up from 24 GhZ to 77 GhZ, so resolution can now potentially be sub-centimeter. True millimeter radar (300GhZ and up) has been demoed [1] but not deployed beyond lab systems. Once you get up there, it's almost as good as LIDAR. Humans show up in detail. That's going to be a useful technology once it gets into commercial products.

So the sensor situation is pretty good, but expensive because it can't mooch off some other high volume technology such as cell phones.

[1] https://www.fhr.fraunhofer.de/en/the-institute/core-competen...

[+] mjh2539|2 years ago|reply

You'd think that governments would have some stake in providing infrastructure specifically for this. Underground magnets and radios along important roadways...that kind of thing.

What I'm getting at is, there's no reason to assume that the current amount of information presented on a roadway (stripes, lights, stopsigns, etc.) is sufficient for autonomous driving. Or maybe it is, but adding a few more inexpensive things would make it that much easier to achieve.

[+] jjoonathan|2 years ago|reply

When the road quality, paint, and signage is decent, modern self driving stacks really don't have any problem seeing it and using it. Effort should be focused on improving those so that everyone benefits.

If Tesla (or any self driving player, but the Tesla FSD stack drives a million miles/day across the US right now) published a road quality score, home buyers would quickly start paying attention and then home owners would quickly start paying attention. This would push forward the political will to fix the roads, at least in places where low road quality is driven by budgeting problems rather than outright poverty.

[+] ethanbond|2 years ago|reply

Even "inexpensive" things spread over millions of miles of roadway gets pretty expensive. Let's start with the basics of reliable painting and safe road surface.

[+] whowe444|2 years ago|reply

One reason to assume that the current amount of information presented on a roadway is sufficient is that humans can learn to drive in a matter of hours with only two vision sensors mounted on their heads.

[+] hn_throwaway_99|2 years ago|reply

This would never be feasible. To start, forget underground magnets and radios. If you just started with properly painted lines, always-functioning traffic signals, and correct street signs, autonomous driving would be much, much easier than it is now.

But the vast majority of the difficulty in autonomous driving is simply dealing with the messiness of the real world. For example, many of the most serious and notable autonomous driving crashes have been the result of faulty road conditions (e.g. https://www.theverge.com/2020/2/25/21153320/tesla-autopilot-... - in that case, the driver wasn't paying attention, but the fact that lane markers were worn down is what caused Autopilot to lose lane tracking, and the fact that crash attenuator in front of the barrier was damaged and not replaced contributed to the driver's death).

Government can't even keep decent lane lines on the roads, and thus I think an autonomous driving system that had to depend on magnets or radios (that failed if those were out) would be even more of a non-starter.

[+] spookie|2 years ago|reply

I'm not sure if this would help. It might be somewhat inexpensive, but at scale? I don't know.

Most roads aren't even properly painted ._.

[+] foobiekr|2 years ago|reply

The problem space in self driving is the set of unexpected and uncommon events. Pretty much any solution can do lanekeeping fine.

[+] MetaWhirledPeas|2 years ago|reply

It will need to be visual, so that people can immediately recognize compliance, in addition to machines.

[+] f001|2 years ago|reply

Note: Tesla has been rumoured to re-introduce a higher-res radar module soon.

(https://www.teslarati.com/tesla-hardware-4-hd-radar-first-lo...)

[+] modeless|2 years ago|reply

It's not a rumor exactly. The hardware is physically present in new Model S/X cars. But it's not used by the software AFAIK (yet?).

[+] dr-detroit|2 years ago|reply

[deleted]

[+] FreshStart|2 years ago|reply

Reducing range by x kw used per hour?

Example uses 64 kWh if constantly used.

https://www.proxyparts.com/car-parts-stock/result/part-numbe...

[+] this_steve_j|2 years ago|reply

Not to put too blunt a point on this metaphor, but a human’s eyes and visual cortex are not stereo cameras and the human brain is not a GPU/CPU despite some apparent similarities.

Nor is “professional commercial driver” a job that could be performed by a Neanderthal or a 6 year old student driver of today despite nearly identical hardware and software. I would never ride across Manhattan every day in a car driven by the latter two humanoids and I doubt most anyone would.

Ideally these systems should only be certified after performing demonstrably better than a teenage student driver given a series of the most complex, adversarial driving workloads in deteriorating conditions with disabled and malfunctioning sensors.

On the other hand, today’s billionaires may actually be close to achieving fully autonomous and acceptably failsafe non-human driving on today’s roads, and they might also be exploring Mars within my lifetime. Personally I wouldn’t bet my life on it.

[+] belter|2 years ago|reply

These type of questions seem to show a misunderstanding of the nature of the problem being solved. Would anybody ask "How many eyes do you need to drive a car?".

Self driving is a difficult problem because you need to:

- Decipher the nature of the objects in the 3d space you are moving on.

- Predict how their physical relationship to your current position will change according to current timeline and intentions of these objects.

- Explore the plane of possible deviations for each of these objects and be able to react accordingly.

[+] chickenpotpie|2 years ago|reply

> Would anybody ask "How many eyes do you need to drive a car?".

I think this is an oversimplification. People don't ask that question because humans have many many more senses they drive with than just our eyes. Humans even have accelerometers. If someone said "I'm deaf, blind in one eye, and can't sense acceleration" people would probably ask if they are fit to drive. Just being deaf is enough to start that conversation. The AARP even mentions that if someone is deaf, they should seriously consider not driving. The conversation of sensor fusion is important because humans require sensor fusion to be effective drivers too.

https://www.aarp.org/auto/driver-safety/driving-with-hearing...

[+] motohagiography|2 years ago|reply

Autonomous flying seems like a much easier and more cost effective problem to solve, given there are orders of magnitude fewer contexts and obstacles.

The smarter problem would be changing the controls/UX for cars to be more like a hoverboard or balance/attention and gesture based than twisting a wheel around, abstracting that to remote telemetry, and then work on using ML to replacing that remote driver.

Autonomous vehicle companies have picked the unsolvable moonshot problem instead of improving the controls UX of driving to where it provides greater leverage to human ability. Make the driving experience more like a motorcycle, which is an extension of your body, instead of a mediated service where driving is reduced to a transaction. Self driving cars is the perfect example of sprawling corporate groupthink, where someone has a poorly formed idea and everyone aligns around managing a response to it instead of asking whether it's the right problem to solve. For example, I think the Apple version of a car wouldn't be self driving, it would be one that feels like it's controlled by thought.

Maybe that's Musk's Neuralink play, and self driving was a distraction to set all auto industry competitors on a goose chase. If he's not that strategic, it could at least provide an out, where a gesture controlled Tesla would outperform.

[+] pizzalife|2 years ago|reply

> The smarter problem would be changing the controls/UX for cars to be more like a hoverboard or balance/attention and gesture based than twisting a wheel around, abstracting that to remote telemetry, and then work on using ML to replacing that remote driver.

I'm sorry, what? The difficulty of self-driving cars is not about controlling the actual vehicle in space, it's about the other dynamic agents involved and getting an accurate view of the physical world.

The last line of your comment is just pure nonsense. Tesla's failed self-driving was a 5d chess move by Musk? Come on.

[+] photochemsyn|2 years ago|reply

There's a very-limited-scale open-source dataset available that would allow some degree of independent assessment:

https://techcrunch.com/2020/05/22/scale-ai-releases-free-lid...

I imagine 99% of the collected data is proprietary and hidden from public view, so it becomes a question of taking corporate claims at face value, which has never been that great of an idea.

Case in point: it's hard to imagine that electric semis hauling goods hundreds of miles can do so safely without human operators involved at some stages on the route. Hence, even with decent autonomous systems having a (paid!) driver in the truck at all times is the rationale thing to do - and yes, that means labor costs in that sector won't be changed much by self-driving AI technology. It will likely increase safety, as the trucks can drive themselves autonomously on the open freeway more safely than a (tired) human driver can.

[+] intrasight|2 years ago|reply

The human operators most certainly won't be inside the semi. They will be sitting in a control room wearing an Apple Vision.

[+] Animats|2 years ago|reply

Clickbait article copied the table without copying the table heading that identifies which column is camera, LIDAR, and RADAR. Fail.

[+] giantrobot|2 years ago|reply

The images at the top are labeled, poorly. It took me some zooming in to be able to read them. But agreed the lack of table headers was just plain stupid.

[+] Havoc|2 years ago|reply

Always found this a very strange question.

Sure LIDAR etc is expensive, but so maybe not that one, but in general I would have thought that many cheap & cheerful sensors is the way to go especially if you're going to be feeding it into a neural net anyway that is well suited to unscrambling that signal spaghetti

[+] inasio|2 years ago|reply

An article that blew my mind a few months ago describing the autonomous sensor race in China [1]

[1] https://www.theautopian.com/the-advanced-driving-assist-syst...

[+] whatever1|2 years ago|reply

And not even one accelerometer. Will definitely work safely /s.

[+] krisoft|2 years ago|reply

I'm almost certain that nearly everyone uses accelerometers, gyroscopes and wheel odometry data from the base vehicle. These sensors are dirt cheap and kinda hidden inside the vehicles so people don't write articles about them. That doesn't mean that engineers forgot about them, they are just not what makes a journalist's heart pump faster.

[+] AlotOfReading|2 years ago|reply

The article doesn't use a consistent definition of sensors. Notice how Mercedes' number is reported to include their emergency microphone as a sensor, but Tesla's number only includes cameras + ultrasonics. Ultrasonics are counted, but not the IMUs that may also be involved in detecting collisions.

If you were to truly count every sensor connected to every chip, there'd be hundreds or thousands of sensors. Only some of them are used. Different vehicles in a testing fleet will have slightly different numbers as well, due to the various component changes that are in-flight at any given time.

[+] magnat|2 years ago|reply

What kind of useful data would accelerometer provide? 99.999% of the time wheel rotation rate matches exactly car movement speed and direction, so as long as you have odometer (resolver/encoder), you have accelerometer data for free. For the cases when wheels are spinning freely, you have a serious situation which needs to be dealt with using specialized safety equipment.

212 comments