top | item 36184269

(no title)

ikhatri | 2 years ago

This is a compelling argument at the surface level (that roads are designed for humans with vision) that quickly breaks down when you examine how Tesla constructs their self-driving system.

Quick disclaimer that this doesn't reflect the views of my employer, nor does any of what I'm saying about self-driving software apply specifically to our system. Rather I am making broad generalizations about robotics systems in general, and about Tesla's system in particular based on their own Autonomy Day presentations.

When you drive on the road as a human, you rely a lot more on intuition and feel than exact measurements. This is exactly the opposite of how a self-driving car works. Modern robotics systems work by detecting every relevant actor in the scene (vehicles, cyclists, pedestrians etc.), measuring their exact size and velocity, predicting their future trajectories, and then making a centimeter level plan of where to move. And they do all of this 10s of times per second. It's this precision that we rely on when we make claims about how AVs are safer drivers than humans. To improve performance in a system like this, you need better more accurate measurements, better predictions and better plans. Every centimeter of accuracy is important.

By contrast, when you drive as a human it really is as simple as "images in, steering angle out". You just eyeball (pun intended) the rest. At no point in time can you look at the car in the lane next to you and tell its exact dimensions or velocity.

Now perhaps with millions of Nvidia A100s we could try to get to a system that's just "images in, steering angle out" but so far that has proven to be a pipe dream. The best research in the area doesn't even begin to approach the performance that we're able to get with our more classical robotics stack described above, and even Tesla isn't trying to end-to-end learn it all.

That isn't to say it's impossible (obviously, humans do it) but I think one could make a strong argument that "images in, steering angle out" is like epsilon close to just solving the problem of AGI, and perhaps even a million A100s wouldn't cut it ;)

discuss

sudosysgen|2 years ago

That's not really true. Humans, at critical moments, do make implicit and even explicit plans of movement and follow them. We don't use literal velocity measurements for other objects, true, but in making those plans we do sometimes anticipate their locations at various points in the future, which is really what matters.

The best human drivers do this not at centimeter, but at the millimeter level. Look as downhill (motor)bike racing, Formula 1, WRC, etc..., These drivers can execute millimeter level accuracy maneuveurs that are planned well in advance at over 100km/h.

ikhatri|2 years ago

Yeah that's kind of what I was trying to say. You're right in that we predict the actions of others, but we don't do it in the same way. Even when we execute millimeter level maneuvers, we aren't explicitly measuring anything... Like if you were to ask a driver for instructions on how to repeat that maneuver they wouldn't be able to tell you, they just have a "feel" for it.

Basically humans are really really good at guesstimating with great accuracy (but poor reproducibility) and since we don't use basic measurements in the first place, having better measurement accuracy wouldn't really help us be better drivers on average (it does help for certain scenarios like parking though, where knowing the # of inches remaining to an obstacle can be very useful).

But for everyday driving at speed, we wouldn't even be able to process measurements in real time even if someone was providing them to us. AVs are different and that's basically the gist of what I was trying to say. Because they actually do use, rely on, and process measurements in real time, improving their measurement accuracy (ie. switching from camera based approximate depth, to cm level accurate depth from a LiDAR) can have a meaningful impact on the final performance of the system.