top | item 27580744

Andrej Karpathy (Tesla): CVPR 2021 Workshop on Autonomous Vehicles [video]

189 points| vpj | 4 years ago |youtube.com

199 comments

[+] nightski|4 years ago|reply

It's interesting that an academic conference now feels like a marketing op for industrial research labs more than anything. His claims about how accurate their vision system is and how it is exceeding other sensors is not verifiable in any way to the public. Given how well qualified he is I am sure he is not wrong! Andrej is brilliant. But this is an academic conference right? This isn't open science, it's a discussion about an engineered system. I'm afraid this is the future of ML research (which CV is so heavily dependent on now). Long gone are the days of reading a paper and understanding the approach. Now you need the data and model which may not even be computationally feasible without millions of dollars in hardware. This isn't Tesla's fault or anything, it just makes me sad.

[+] aeternum|4 years ago|reply

In the talk, he gave clear examples with detailed position + velocity graphs where the vision system detected obstacles sooner and with less jitter than the radar system. Specifically the overpass where radar triggers erroneous braking, and the pulled over truck where radar detects the obstacle significantly slower.

[+] ma2rten|4 years ago|reply

The talk wasn't part of the main conference but it was part of a workshop, which was just one out of tons of workshops.

http://cvpr2021.thecvf.com/workshops-schedule

[+] usmannk|4 years ago|reply

From what I have been able to tell recently, CVPR in particular is a venue where "engineered systems" get a lot of focus. I don't think this is necessarily a bad thing, nor do I think it is representative of the big ML conferences in general.

[+] Robotbeat|4 years ago|reply

He made a very good argument for vision-only, but it seems like training actually uses radar data to help calibrate vision measurements, so it seems to me there’s value in making some vehicles still contain radar (say, one out of 10) even if it’s not used for controlling the vehicle directly at drive time.

Also, the sensor resolution issue he mentioned could be addressed by using a higher resolution radar sensor.

I find the list of 221 triggers to be interesting. In principle, the NHTSA or NTSB could help contribute lists of triggers to companies to validate their training sets on.

Every time there is a fatal airliner accident, the NTSB does a safety investigation and airliners get a little bit safer each time. In the same way, each fatal accident in a vehicle with this kind of autonomy could end up being captured by these triggers, improving safety over time in a sort of mixture between expert human analysis and ML.

(Nobody does this for all regular car crashes because fatal car crashes happen every day! And you’re not going to retrain human drivers about some new edge case every day, although you can for vehicles like this.)

[+] avs733|4 years ago|reply

Most fatal car crashes are investigated, but by the police not engineering experts. The invetigational motivation is legal and liability focused not improvement focused.

You sparked a happy delusion in my mind...training drivers to the same level we train pilots. Can you imagine drivers having regular check rides?

[+] nelsonfavedra|4 years ago|reply

> but it seems like training actually uses radar data to help calibrate vision

They seem to use radar solely to automatically label data for training.

In the given example though where according to the vision system smog interrupted the persistence of the label for the leading car, I wonder if the use of radar data to persist the label is strictly necessary.

A car disappeared then reappeared in the data, why not just tween the bounding box over time and assume the car had always been there, like, if it looked the same when it reappears or something. An extra sensor just for labelling data seems silly.

[+] osipov|4 years ago|reply

[deleted]

[+] bluepanda928752|4 years ago|reply

Tesla's decision not to use the LIDAR as a safety feature (i.e. having reliable high-resolution data about things the car can collide with) is so incredibly indefensible, since solving the last 1% of this using only vision likely requires a general artificial intelligence

Prediction: Tesla will be the last of all major auto manufacturers to get to L5 autonomy. Time interval between when Tesla L5 FSD is finally available and when humanity is destroyed by the general AI it runs on will be very awesome and also very short

[+] liuliu|4 years ago|reply

You get sparse point cloud from LIDAR sensors, not accurate 3D maps. This is the main reason why some people think LIDAR may not work well (mostly, only comma.ai and Tesla folks).

Vision can also get you 3D maps, either in active manner (IR floodlight or structured lighting) or not. I will reserve my judgement until see more from either side.

[+] SheinhardtWigCo|4 years ago|reply

They’re betting that they can use a massive feedback loop to train a set of neural networks to the point where they are as accurate as LiDAR without actually firing any lasers.

Even if you believe this goal is possible to achieve at some point in the future, I think the argument falls apart when you consider that it will take years, probably decades, for a pure vision approach to catch up to where Waymo is today in terms of safety. (They have cameras too.)

That Tesla can’t afford to fit expensive LiDAR sensors to all of the cars it sells is Tesla’s problem. Regulators won’t give a shit that pure vision is “better” in theory. They will simply compare Tesla’s crash rate in autonomous mode with that of Waymo and other AV operators, and act accordingly.

[+] tokipin|4 years ago|reply

I predict the opposite. Tesla sold half a million cars last year and will sell nearly one million this year. The data they have access to is increasing by orders of magnitude. I bet there is a point, let's say 20 million cars total, where they can pull so much high quality data that they will be able to surpass lidar capabilities for the purposes of self driving.

The lidar/no lidar discussion is a fun one because people have different ideas about how the world works. Personally I think LiDAR is the modern version of expert systems. It appeals to a logical/geometric intuition but the approach is brittle to real world contact, especially when paired with HD maps which are a great way to drive yourself into a local maximum.

[+] ra7|4 years ago|reply

> Prediction: Tesla will be the last of all major auto manufacturers to get to L5 autonomy.

Tesla is also the only company to claim to target L5 autonomy. Everyone else, including Waymo, is strictly targeting L4 and say L5 autonomy is not possible or realistic. L5 is a pipe dream.

[+] dheera|4 years ago|reply

My prediction is that Tesla will eventually use LIDAR despite whatever they are saying now.

Right now their profit model is selling cars, not autonomy, so everything is optimized for that, including the decision to not use LIDAR.

[+] cs702|4 years ago|reply

I strongly disagree. By all measures I've seen (including a couple of slides in the OP's video), Tesla's self-driving is far safer than human driving: the number of accidents and deaths per mile driven are something like an order of magnitude lower (i.e., around 10x safer). I mean, the machine never gets distracted, tired, sleepy, emotional, drunk, etc., so it is a LOT LESS likely to crash on boring, monotonous road segments than most people -- who do get distracted, tired, sleepy, etc. Not only that, but people make really scary mistakes in routine circumstances. The video shows several examples of human drivers hitting the accelerator when they actually meant to hit the brake!

The criticism of autopilot is really about it getting tripped-up in response to statistically rare, unusual circumstances, i.e., edge cases. Karpathy et al are working on getting better at those, bringing the rate of situations that surprise autopilot closer and closer to 0%, even if it can never be achieved -- there will be always be surprises. Personally, I would rather take a tiny risk of crash on rare, once-in-a-million-miles events with autopilot driving than a ~1% risk of crash per 1000 to 2000 miles with everyday human driving.

Prediction: Tesla will be the first of all major automakers to get to level 4 and 5 autonomy.

[+] likearocket|4 years ago|reply

No one is going to get to a true L5 for a long time. That is totally irrelevant. It's a war of attrition. Whoever can monetize L3/L4 and can scale without any vehicle upgrade cost is going to win. It's pretty obvious lidar is very very silly since it doesn't scale.

It is also pretty easy to see that Tesla doesn't have to hit L5 to have won autonomy. It just has to successfully monetize L3/L4.

[+] etrautmann|4 years ago|reply

The vision-only strategy only made a shred of sense before high performance / low cost LIDARs like the Ouster OS1 became available.

Now it’s an indefensible position on safety or economic grounds.

[+] kemiller|4 years ago|reply

Waymo etc do not use LIDAR for object sensing, only for positioning. LIDAR sucks for object sensing because it gives you no information about whether it's a plastic bag or a person — you still need vision for that. Even if you just err on the safe side and brake, that itself can cause an accident unecessarily.

[+] paxys|4 years ago|reply

> Prediction: Tesla will be the last of all major auto manufacturers to get to L5 autonomy

That doesn't mean much considering no company is getting to L5 autonomy likely in our lifetime and possibly beyond.

[+] soheil|4 years ago|reply

> is so incredibly indefensible

How can you make such a strong statement when you simply don't know how to achieve full autonomy? This reminds of the teapot orbiting the Sun argument [0]. You and people defending Lidar by their teeth don't sound too different from religious zealots who've "seen the light".

[0] https://en.wikipedia.org/wiki/Russell%27s_teapot

[+] vgchh|4 years ago|reply

Ultimately the proof is in the pudding. With Tesla FSD you can drive the highways from New York to Boston without any issues. I am sure there are many more routes across the country that can also be driven like that. Definitely not L5, but it works. Yet to see that from any other automaker, LIDAR or not. As far as I am concerned, great job Tesla! Keep it up, I am sure you will work through more tougher problems.

[+] codeulike|4 years ago|reply

Innovation is a gamble. You're not wrong to point out they might fail. The likeliness of failure is what makes it worth the gamble of trying.

[+] _ea1k|4 years ago|reply

> solving the last 1% of this using only vision likely requires a general artificial intelligence

That is likely close to true with lidar as well. See also some of Waymo's recent struggles in unexpected construction zones.

Maybe lidar helps in getting there, but I'm afraid they all hit a pretty tough ceiling without this.

[+] ec109685|4 years ago|reply

Why do you think the last 1% is dependent on LIDAR versus any of the other multitude of gaps between today’s autonomy and L5?

If the only way it becomes practical to achieve L5 is to use LIDAR, Tesla can obviously add it. But if they waited until LIDAR was cheap and practical, they still wouldn’t be shipping any hardware doing autonomy today, and not collecting the data needed to train their models and delivering value today.

Also, with vision based systems, it operates in somewhat an intuitive fashion given we have eyes too.

[+] accurrent|4 years ago|reply

I see a lot of people here are stuck on the perception side of things. There's a lot more to self driving than just the sensor suite and perception. There's a lot of work that needs to be done in the planning and controls department prior to the time we get full vehicle autonomy. Andrej's work is impressive, but I wish we'd see more research into the latter. Then again this is CVPR so...

[+] babesh|4 years ago|reply

I think the more interesting question is how much human context is necessary in decreasing accident rates. The signal question and tunnel answer hinted at that. Some context is very local and some context is general at the level of humans.

Examples: human eyes will have trouble adjusting to the sudden darkness of tunnels so some people will tend to brake suddenly; that person looks old and will probably have slower reaction times so watch out for the upcoming sharp turn; that person looks like they are on their phone and may cross the lane suddenly; watch out for this intersection because young humans cross it after school without looking so slow down below the speed limit.

This human understanding doesn’t seem to be directly represented by the system without explicit architecting on their part. A more general intelligence would begin to automatically learn these. A human intelligence would automatically model these or learn from experience or read about it.

As mentioned, the current system has some advantages over humans: more eyes, doesn’t get tired or distracted, faster reaction time. I guess we shall see when these advantages cover up the disadvantages.

[+] fpgaminer|4 years ago|reply

I agree, though I think there's an obvious path towards that higher order understanding of the road that humans have.

Suppose they eventually have this current system dialed in and they get really good, accurate bounding boxes around all interesting objects on the road.

So now, in addition to their 10 second samples of video data they're collecting, they start collecting 10 second samples of scene representations.

These samples of scene representations are time series of how various objects in the scene are moving and behaving over time. Many examples of just what you describe: cars with older drivers having slower reaction times; cars with distracted drivers driving recklessly; etc.

Now you train a model on all that data, asking it to make predictions through time. It's going to quickly pick up on the same or similar cues that humans do. It sees an older person in a car and says that slower, cautious paths through the scene are more likely for that vehicle. They see a large, lifted truck and assume a 90% probability of a "cut off every car possible" path through the scene. Etc.

So I see what Karpathy is building now as a foundation upon which they can build the higher order stuff.

[+] sam_goody|4 years ago|reply

tldr: Tesla uses vision alone, and has dropped radar and the other sensor. He makes a very decent argument why.

(Surprisingly, he basically ignores night driving.)

[+] dogma1138|4 years ago|reply

Ironically this is what Tesla criticized Mobileye for.

I still think that this is far the best demonstration of autonomous driving to date https://youtu.be/A1qNdHPyHu4

[+] nickik|4 years ago|reply

The list of triggers contains things like 'motorcycles at night', so it seems its all in that dataset.

[+] 01100011|4 years ago|reply

Not a fan of TSLA, but isn't night driving just a special case of daytime driving if you use IR and/or hyperspectral cameras?

[+] childintime|4 years ago|reply

The video: https://youtu.be/NSDTZQdo6H8

[+] dang|4 years ago|reply

Thanks! Maybe it's best if we change the URL to that from https://twitter.com/vpj/status/1407000737423368197.

[+] nickik|4 years ago|reply

I resonantly had an argument on here where somebody insistent that breaking because of over-passes were issues with vision system. Seem pretty clear that it is the resolution of the radar, not the shadow of the bridge that causes the issue. Good to get some more insight into this.

This is the right thing to focus on, as it is by far the largest issue with Autopilot on the highway. Multiple people who do testing of these system that false positives on some highway overpasses are the biggest usability issue.

[+] unknown|4 years ago|reply

[deleted]

[+] aiddun|4 years ago|reply

Per "it's unscalable to get HD 3D maps of all the roads on earth", it's interesting to consider that Google/Waymo has been growing this for years with street view and the sensors on each car. Curious to see how that plays out

[+] villgax|4 years ago|reply

Even if FSD takes longer, I sure am glad about the active safety features to prevent dumb incidents. Hope that trickles to every other manufacturer petrol/electric

[+] rpmisms|4 years ago|reply

There's clip after clip of AP yanking people out of situations where they had no idea they were in danger. Wife and I are planning on kids soon, and we won't consider anything except a Tesla due to those safety features.

[+] aimkey|4 years ago|reply

What an annoying charlatan. Karpathy is a brilliant computer vision engineer, but he has let his expertise in that subfield cloud his judgement on achieving the overall goal of autonomous driving.

Musk and Karpathy have been dead wrong about LIDAR for years. Remember Musk making the absurd claim of a million Tesla robotaxis by 2020? I think most hilarious is that both Karpathy and Musk claim the LIDAR systems are too expensive. Yet, in the same 2019 Autonomy Day they simultaneously claimed that Teslas would be able to drive themselves and operate as robotaxis, earning their owners passive income and therefore justifying significantly increased MSRPs. So, the $7k LIDAR system (that accelerates safe autonomous driving) is not worth the cost, yet stumbling towards autonomy on vision only is? If the car becomes an money-earner, you should use all of the systems available. The 2019 Autonomy Day was an utter embarrassment. I'm sure 2021 will be more of the same.

So now it seems that they've realized their folly in logic. So what's the solution? Well, you can't just complain about COST of non-vision perception systems. Because, as noted above, that doesn't make sense if you're going to simultaneously claim that your car will be able to earn you money (augmenting any extra hardware cost that gets you to that point faster). No, now you have to smear all non-vision perception systems. You have to say that their data is worthless and detrimental to the overall effort.

The entire claim from the 2019 Autonomy Day that "vision is what humans use to drive" is also completely bogus. Humans use many senses to drive. They feel the pedals and steering wheel. They use their equilibrio sense to sense motion. And they use their hearing to hear other vehicles, sirens, and issues with their own car (driving with headphones in is illegal for a reason). Any modern car, even a Tesla, is also using far more than just vision when attempting autonomy. Forget about radar and LIDAR for a moment. There are endless sensors in the drivetrain. Steering angle sensors and multiple IMUs for the electronic stability control. Brake and wheel sensors for the ABS. Temperature sensors everywhere. And countless other ECUs. The notion that vision is getting you there exclusively is nonsense. There's no good argument against LIDAR today other than perpetuating a lie to sell cars that are cheaper to produce. And, Karpathy has a massive professional conflict of interest in making CV the main player -- he's a CV expert. He was never a fusion expert before his hire. If CV is the pathway forward, he's gets to remain "the guy". It certainly behooves HIM to make that claim.

Autonomous driving will not be achieved in this decade. Perhaps ever. Ask yourself honestly: if you were tasked with building an autonomous commercial aircraft OR an autonomous car, which would you choose? Most would say aircraft -- nothing to really hit in the air, fully mapped airport and runway systems, and far fewer variables. Yet autonomous aircraft still do not exist. Perhaps the edge cases always rule the roost. Ask yourself why driving would be any different...

[+] kortex|4 years ago|reply

Re: senses - have you ever played a driving sim? You can drive just fine with vision without tactile.

Sirens are primarily a means to get you to look in a direction. 360° cameras can notice the emergency vehicle as soon as it's visually relevant. And if they decide they need an audio siren detector, that's like, practically intern level signals detection at this point. Hardly a dealbreaker.

100% hands-down would pick autonomous car vs airplane. Flying a plane isn't just moving the aluminum bird through 3 space and periodically taking off or landing.

Autonomous aircraft don't exist because a huge amount of the ritual of flight is before and after the captain is even on the plane, let alone flying. There is a tremendous amount that the pilot and copilot go through, on the ground, before taxi, after liftoff. It's way, way more involved and way more generally intelligent. We can design AI to take off, path to a destination, and land. Those 3 things are the easiest parts of flying, yet do not comprise the act of flying a 2-seater, let alone an airliner.

[+] nickik|4 years ago|reply

> Musk and Karpathy have been dead wrong about LIDAR for years.

Right, because all those companies that use LIDAR are making billions driving people around. Oh, wait, actually they are burning 100 of millions every year.

> I think most hilarious is that both Karpathy and Musk claim the LIDAR systems are too expensive.

That's literally the opposite of what Musk says about Lidar. He LITERALLY said he wouldn't use them if they were free.

> yet stumbling towards autonomy on vision only is

Look up the Marginal Revolution from 1870.

> The entire ...

... The entire paragraph is an exercise in missing the point.

It seems really what you are saying is not that Tesla are charlatan but the whole industry is.

[+] locuscoeruleus|4 years ago|reply

> Yet autonomous aircraft still do not exist.

Well, that's a truth with modifications [1]

[1] https://www.youtube.com/watch?v=B2uc98EEPqE

[+] andyxor|4 years ago|reply

I like Andrej from his PhD research days and awesome blog posts but this is a series of disasters in the making, that is until FTC steps in after more people die from “self-driving” accidents under interesting and unexpected circumstances.

The whole vision vs. LIDAR stuff is a distraction as long as Tesla “AI” doesn’t have common sense.

It literally doesn’t know what it’s doing, and the tail of edge cases to "fit" the models is infinitely long. ANNs are fundamentally backwards looking and cannot adapt to unforeseen or even slightly unusual combination of circumstances. It will go fine for n miles and will dramatically fail at mile n+1 where a new situation requires understanding of ones surroundings, and n is arbitrary number.

It would be more honest to show the cases where it missed, thankfully there is no lack of them in “FSD beta“ videos on YouTube.

[+] ketamine__|4 years ago|reply

I've read claims that they are desperately trying to hire.

https://mobile.twitter.com/TaylorOgan/status/140705191831739...

[+] karpathy|4 years ago|reply

For the record these are some blatant & false FUD attempts.

[+] jowday|4 years ago|reply

Anecdata but both of the people I know that worked on Autopilot quit within 18 months of starting, citing extreme overwork and Musk micromanaging things. This lines up with that.

[+] DSingularity|4 years ago|reply

Maybe it’s an unfortunate side effect of their success. Almost all early members of the team are now multi millionaires if they held their stock.

[+] judge2020|4 years ago|reply

https://twitter.com/elonmusk/status/1407050578149183488?s=20

> Purpose is recruiting.

[+] notJim|4 years ago|reply

This appears to be the source given[1] for the claims https://www.snowbullcapital.com/tracking-tesla-job-postings. No idea what to make of this methodology, which appears to be based on tracking re-use of req ids. Someone must be scraping LinkedIn data, which would surely be more reliable.

[1]: https://mobile.twitter.com/TaylorOgan/status/140709356765125...