top | item 21495685

Andrej Karpathy talks about how Tesla's NNs are structured and trained [video]

415 points | ojn | 6 years ago | youtube.com

241 comments


protomikron|6 years ago

Fun fact for all of you:

Some time ago (around ten years) this guy (the presenter) was internet famous as a Rubik's Cube speedsolver, making tutorials and videos about it: https://www.youtube.com/watch?v=609nhVzg-5Q

nsilvestri|6 years ago

I'll always know him as badmephisto. In a recentish reddit AMA, he says he still keeps a cube on his desk so he can practice a bit and not forget his algorithms.

eyeundersand|6 years ago

Wow! Thank you.

What a blast from the past! I learned how to solve the Rubik's Cube blindfolded by watching him, back in the day. His tutorials are perfect and I've probably recommended his channel to ~50 people myself. Crazy he only has 36.6k subs.

Glad to see he's doing well.

jacquesm|6 years ago

The competition in this space is great but I can't help but wonder what would happen if instead all these companies pooled their resources and went after the goal collectively. There is so much duplication going on and the paths do not seem to me - as an outsider - to be all that divergent, which is usually a pre-condition for having a lot of independent efforts one of which will succeed.

It's as if everybody wants to be the one to exclusively own the tech. Imagine every car manufacturer having a completely different take on what a car should be like from a safety perspective. We have standards bodies for a reason and given the fact that there are plenty of lives at stake here maybe for once the monetary angle should get a back-seat (pun intended) to safety and a joint effort is called for. That would also stop people dying because operators of unsafe software are trying to make up for their late entry by 'moving fast and breaking things' where in this case the things are pedestrians, cyclists and other traffic participants who have no share in the monetary gain.

jfoster|6 years ago

> The competition in this space is great but I can't help but wonder what would happen if instead all these companies pooled their resources and went after the goal collectively.

It would probably slow down. 9 women can't have a baby in 1 month. Besides that, the disagreements about approach, politics, or eventual competitive interests would probably bring things to a halt for a long time.

I don't think the solutions to this problem are resource-constrained. Many companies would happily find more resources in order to be first to market with this technology.

account73466|6 years ago

>> if instead all these companies pooled their resources and went after the goal collectively.

That would be a bad idea because like in evolutionary processes you need this diversity of ideas to locate better local optima even if it will take longer.

paraschopra|6 years ago

Standards shouldn't emerge too soon. I think for self driving tech, at the current stage, competition is good because there are lots of unsolved questions. Competition will ensure the best tech is ultimately available to consumers.

Of course, it's not a binary choice. Things like data should probably be pooled but the use of data in tech should compete.

konschubert|6 years ago

I think that there is still a need for some brilliant insights and breakthroughs, it isn't just a matter of getting the work done.

So actually, I think it's one of these situations where having a lot of independent efforts might be worth it.

londons_explore|6 years ago

> There is so much duplication going on

In the self-driving world, the duplication is necessary - different companies are taking different directions, and nobody really knows which will work out.

In the ML hardware world, the duplication is mostly unnecessary. People are developing their own inference hardware ASIC's because they're relatively simple (compared to designing a CPU from scratch, designing a TPU is pretty simple because there are so few operations, and no complex out-of-order execution), and you can't buy one off the shelf yet.

As soon as ML hardware becomes available to buy off the shelf without a massive price premium, everyone will switch to that.

ArtWomb|6 years ago

We could see some of this play out in the China EV market in the coming years. State sponsored subsidies around infrastructure standardization. Combined with foreign investment and competition spurring innovation.

What I've seen personally is what can be loosely termed "emergent consensus". Historical competitors (and often it gets whittled down to two giants, such as Boeing and Airbus) will work in secret on research. But after years of experimentation arrive at very similar outcomes. An optimal answer that could only be arrived at through constant trial and error, product evolution and iteration.

Regarding Karpathy's PyTorch presentation, I don't think anything that wasn't already public was revealed. The FSD board with custom NPUs is a Work of Art. I like that there are dual redundant streams. And the scale of the dataset is already well known: 4096 HD-images per step!

If I had to speculate, the "Dojo Cluster" may be envisioned as an effort to share data and compute with industry partners as a cloud SaaS product and ancillary revenue stream. But that is pure speculation ;)

Inside Tesla’s Neural Processor In The FSD Chip

https://fuse.wikichip.org/news/2707/inside-teslas-neural-pro...

mkolodny|6 years ago

Fortunately, some companies do share a significant amount of what their cars have learned so far. Uber publishes a ton of papers about their self-driving research [0][1]. Waymo released an open autonomous driving dataset, and publishes papers as well [2][3].

Of course, papers and data aren't code. But I think a lot more is being shared than people realize.

[0] https://eng.uber.com/author/raquel-urtasun/ [1] https://eng.uber.com/research/?_sft_category=research-self-d... [2] https://waymo.com/open [3] https://arxiv.org/pdf/1812.03079.pdf

d_burfoot|6 years ago

I don't know about sharing tech, but there should definitely be a shared evaluation benchmark, and some kind of oversight agency should be involved. The idea would be: if you want to be permitted to operate an AV on public roads, you need to demonstrate that your vehicle's vision system can detect pedestrians and obstacles with near-perfect accuracy on a large shared image database, most of which is NOT distributed to researchers.
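Mechanically, the shared benchmark described above is just a scoring harness run against a held-out ground-truth set. A purely illustrative toy sketch (all names, frames, and the recall metric are my own assumptions, not any agency's actual protocol):

```python
def detection_recall(predictions, ground_truth):
    """Fraction of true pedestrians/obstacles the system detected.
    predictions/ground_truth map frame ids to sets of object ids;
    the ground truth would stay with the oversight agency."""
    found = sum(len(predictions.get(frame, set()) & truth)
                for frame, truth in ground_truth.items())
    total = sum(len(truth) for truth in ground_truth.values())
    return found / total

ground_truth = {"frame1": {"ped_a", "ped_b"}, "frame2": {"cyclist_c"}}
predictions = {"frame1": {"ped_a", "ped_b"}, "frame2": set()}
print(detection_recall(predictions, ground_truth))  # 2 of 3 found
```

A real benchmark would of course score localization quality and false positives as well, not just recall.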

credit_guy|6 years ago

Competition is good. It keeps you honest. The Manhattan Project experienced competition, of the do-or-die type. Only it wasn't internal: it came from the Nazis, Japan, and later the Soviets.

Right now, the competition in the self-driving area is metaphorically as close to do-or-die as you can have in peacetime. GM as a regular car manufacturer is toast; Cruise is pretty much their only hope. Uber is bleeding; if they pull self-driving off, they are kings. The German manufacturers are watching in disbelief as Tesla is starting to eat their pie. Conversely, Tesla knows that in what they are doing (electric cars) they don't enjoy any fundamental moat. If the Germans get their act together, they'll be able to make equally performant electric cars, but with true luxury. The only one that's not really about survival is Waymo.

maelito|6 years ago

I wonder if the French civil nuclear electricity program that led to this level of low carbon emissions, or the TGV (high speed rail system), could be good examples of what you're asking for.

Maybe the companies actually building the nuclear plants and trains and rails were actually in competition ?

atoav|6 years ago

There was a time in the medieval ages when alchemists were kidnapped by kings and held in chambers so they would generate knowledge only for them. This obviously led to a duplication similar to the one you describe, right up to calculus, where Newton kept the thing hidden in a drawer and then Leibniz had the same idea.

Once that kind of secrecy was gone our whole technical progress was accelerated, because people could build on the discoveries of other people.

Right now we are going back to the alchemist model in some ways (the highest profile people work for the big companies and don’t share their discoveries). This makes progress slower.

kiwicopple|6 years ago

Ideally everyone would collaborate on inputs and compete on outputs. All the data gathering, tagging, mapping etc could be put into a shared domain, and then after that the companies decide what to do with it and how to commercialise it.

Easier said than done, but I think it would strike the right balance between reducing duplicate work, and incentivising progress.

arketyp|6 years ago

You can apply this line of reasoning to many markets, like the pharma or food industries, which also have safety concerns. It strikes me as the kind of initiative the EU attempts nowadays upon realizing we are running behind on some tech and wanting to leverage the one possible advantage we have as a great centralizing power. Not too different from communist states, actually. I agree with the sentiment that redundancy seems wasteful, but it seems to me a necessary evil as a driving force in development, as with the right to private property in general.

eanzenberg|6 years ago

Competition is good

acollins1331|6 years ago

I think the diversity you see in cameras and lidar placement and existence is worth it enough to have different paths forward. Tesla seems insistent that it can be done sans lidar. It's definitely worth it to see which approach works best.

logicallee|6 years ago

>Imagine every car manufacturer having a completely different take on what a car should be like from a safety perspective. We have standards bodies for a reason

Roads are also governed by public bodies. Road signs are standardized and public.

I think the government should take a much larger role in defining self driving cars. For example, rather than using computer vision to recognize signs, signs could be active standardized beacons; instead of having to recognize lanes, they could be repainted with rfid chips that are trivial for cars to recognize and follow.

Avoiding driving into people is also something that was somewhat regulated by crosswalks with pedestrian lights. Would it be absurd for the crosswalk to know roughly how many people are at it, then broadcast this to the car, rather than having the car have to recognize them?

There are many things the government could do with transportation infrastructure that would benefit everyone, many of which are literally impossible for companies to do separately. Can you imagine if we had to wait until IBM (or Siemens or Google or Apple) got into the business of launching satellites before we got GPS? There is a good chance that to this day cell phones wouldn't know their location or give anyone any mapping applications.

To me, self driving cars are similar. Many parts of transportation are a public good.

timzaman|6 years ago

throwaway010718|6 years ago

Any guess what the compensation is like for these positions ?

soulslicer0|6 years ago

lol i just did the interview and failed. had to find shortest path between tesla chargers. all in C++. completed it but failed

sdan|6 years ago

Thanks Tim! Would love to apply, but still a student. Hoping to join in the future given how nicely orchestrated your team has been training nets.

modeless|6 years ago

Awesome presentation. Crazy that they're developing their own training hardware too. It's going to be a very crowded space very soon. Can they really stay ahead of everyone else in the industry? Can it really be cheaper to staff up whole teams to design chips for cutting edge nodes, fabricate them, build supporting hardware and datacenters and compilers, than to just rent some TPUs on Google Cloud?

I can see the case for doing their own edge hardware for the cars (barely), but I really don't think doing training hardware will pay off for them. If they're serious about it, they should spin it out as a separate business to spread the development cost over a larger customer base.

Also, I'm really curious whether the custom hardware in the cars is benefiting them at all yet. Every feature they've released so far works fine on the previous generation hardware with 1/10 the compute power. At some point won't they need to start training radically larger networks to take advantage of all that untapped compute power?

antpls|6 years ago

Watch the presentation from 6 months ago, where they explain the decision to build their own hardware for inference: https://youtu.be/Ucp0TTmvqOE?t=4309

It's not surprising that they also build the hardware for training. Correct me if I'm wrong, but Google uses the same TPUs for training and inference, because the underlying operations are the same: multiply, then add numbers. Once Tesla built the hardware for inference, the design of the hardware for training is probably similar.

Unlike Google's TPUs, Tesla's hardware has a specific use case (computer vision for automotive), and maybe that means they can further optimize the computation pipeline with their own specialized hardware.
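The "multiply then add" at the core of both inference and training is the multiply-accumulate (MAC) inside a matrix multiply. A naive pure-Python sketch of that kernel, for illustration only (TPU-style chips implement this loop in silicon, typically as a systolic array):

```python
def matmul(a, b):
    """Naive matrix multiply: the multiply-accumulate loop that
    ML accelerators dedicate most of their silicon to."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for k in range(inner):
                acc += a[i][k] * b[k][j]  # one MAC operation
            out[i][j] = acc
    return out

# The same kernel serves the forward pass (inference) and the matrix
# products of backpropagation (training), which is the commenter's point.
print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```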

breatheoften|6 years ago

I think the size of the networks they are training might already be good motivation for developing custom hardware for training.

I would expect their training hardware to be something specifically aimed at optimizing memory bandwidth to support distributed training of their "shared" hydra feature extractor. It's interesting that the shared hydra feature extractor is able to converge as they keep adding more and more output predictions, under a training regime of interleaved asynchronous updates to the model from different predictor networks ...

Seems to me the formula they are pursuing with custom hardware might be to support a strategy of 1. keep adding more predictions based on same feature 2. Increase the span of time represented by batches used to train the recurrent networks

Both pursuits seem very data efficient in terms of the amount of training data they could conceivably collect per unit time of observation ...

Custom hardware with a problem specific memory architecture aimed at efficiently supporting training with very large rnn time slices could be developed that’s more about “make it possible to train this proposed model at all” rather than “make it faster/cheaper to train existing common model architectures”. When custom hardware is required to make it possible to train the model they want, the validity of the hardware development cost bet might end up being more about the effectiveness of the model they think they want than it is about maintaining general purpose performance parity vs any off the shelf hardware options ...

jeffshek|6 years ago

At Tesla's scale and priorities, they'd probably be less keen on using external cloud providers. Using TPUs at their scale would certainly require Google's AI consultants to supervise which isn't ideal for Tesla.

Not agreeing or disagreeing with their decisions, but if you have the resources, you can certainly design a custom chip that performs a specific type of task very well and beats other competitors. Nvidia's GPUs have to be reasonably good at training across different NNs. You could have a chip that's exceptionally good at training one or two specific types of tasks.

For most companies, this would be a bad idea. However, Tesla knows how to produce hardware.

m0zg|6 years ago

Nothing crazy about it. TPU-like stuff is ~10x the energy efficiency of GPUs and several times the speed. When you're spending megawatt-hours and days to train a single model, it adds up in both real and opportunity costs.

Also, Google TPU TOS prohibits the use of TPUs for stuff that competes with Google (and I'm assuming with other companies under Alphabet umbrella), at Google's sole determination. Not that it would be a good idea to upload Tesla's proprietary data into Google Cloud even if it did not. Cloud, after all, is just somebody else's computer.

roystonvassey|6 years ago

Also, the software part of it (NNs and their algorithms) has been so widely researched and published that competitive advantages here are harder to come by than in hardware R&D.

Also, vendor lock-in is a huge challenge in the cloud space. I don’t think Tesla would be comfortable with the fact that all their training data sits on a potential competitor’s datacenter.

dna_polymerase|6 years ago

So would you trust the company that owns one of your biggest competitors in this field (Waymo) with the stuff that decides over success: data?

sdan|6 years ago

Did they say they were building their own training hardware? I thought it was just their inference hardware (the boards on the teslas)?

joenathanone|6 years ago

>Also, I'm really curious whether the custom hardware in the cars is benefiting them at all yet. Every feature they've released so far works fine on the previous generation hardware with 1/10 the compute power.

The latest OTA finally brings a hardware v3 only feature, traffic cone visualization, and traffic cone automatic lane change.

thebruce87m|6 years ago

Stay ahead? Are they actually ahead?

sdan|6 years ago

Really liked this talk.

Looks like they are really nicely orchestrating workloads and training on numerous nets asynchronously.

As a person in the AV industry I think Tesla's ability to control the entire stack is great for Tesla... maybe not for everyone who can't afford/doesn't have a Tesla.

natch|6 years ago

>maybe not for everyone who can't afford/doesn't have a Tesla.

Affordability is not as much of an issue as some make it out to be. Cost-wise it's like owning a Camry or an Accord, if you go for the lower end models. If you mean not everyone can afford a new car, then sure I agree with you.

Edit: if you think I'm wrong about this, please explain or ask me to clarify anything?

londons_explore|6 years ago

I'm still amazed that Tesla's team isn't using a map... I know maps get outdated and are sometimes wrong, but having inaccurate knowledge of what's around the corner is far, far more helpful than having no clue what's around the corner.

The smart solution would be to consider a map a probabilistic thing, which neural networks are really good at handling.
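Treating the map as a probabilistic prior rather than ground truth reduces to a one-line Bayes update. A toy sketch (the detector rates and the stop-sign example are illustrative assumptions, not anything Tesla has published):

```python
def posterior_given_detection(prior, tpr=0.9, fpr=0.1):
    """P(feature exists | detector fired), by Bayes' rule.
    prior: map's belief the feature is there (it may be stale).
    tpr/fpr: assumed true/false positive rates of the camera detector."""
    num = prior * tpr
    return num / (num + (1.0 - prior) * fpr)

# A stale map says a stop sign is probably there (prior 0.8);
# the camera also reports one, so confidence rises sharply.
p = posterior_given_detection(0.8)
print(round(p, 3))  # 0.973
```

The symmetric update for a non-detection would let the car gracefully discount map entries the cameras keep failing to confirm.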

anonu|6 years ago

I'm still amazed Tesla has decided not to use lidar and instead just stick with cheap cameras. Better sensors are there, they're available, they're cheap and they can probably "see" better than plain old cameras... it doesn't make too much sense not to use them IMHO. But then again, I am not coding NNs for Tesla...

mattrp|6 years ago

I could be wrong but I recall Lyft is using hyper accurate maps.

Gravityloss|6 years ago

Interesting that they don't have a full 3D world model. I'm certainly not a machine learning expert. I'm still amazed the route from image recognition to a 2D map of "what's drivable" to autonomous driving is so direct. One would expect to hit a ceiling really soon with that approach.

To me it seems we're still in really early days.

spyder|6 years ago

They're doing 3D for the road path, and even predicting it beyond corners:

https://youtu.be/Ucp0TTmvqOE?t=8137

And later in the video they show 3D reconstruction from the cameras and say they use it in the car.

Watching the full talk is recommended if you have the time (talk starts around 1:10:00 in the video)

eanzenberg|6 years ago

One thing I didn't quite understand is how training sub-graphs in parallel works. If you are editing a sub-graph of a monolith-type model, aren't you affecting other graphs that have dependencies on the one you're editing? If these are independent graphs, then what does a "sub-graph" even mean?

punnerud|6 years ago

In PyTorch you have full control over the graph and weights; everything feels like Python. So sharing some of the learning between sub-graphs is easy. Not sure if this is possible in TensorFlow/Keras?

He describes the sub-graph training in the context that they have all the predictors in one big model, and with control of the network they can feed forward and train sub-graphs (read: sub-parts) of the model.
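A minimal PyTorch sketch of what such sub-graph training might look like (the module names, shapes, and two-head layout are my own illustrative assumptions, not Tesla's actual architecture):

```python
import torch
import torch.nn as nn

# One shared backbone feeding several task heads, HydraNet-style.
backbone = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
heads = nn.ModuleDict({
    "lanes": nn.Conv2d(8, 1, 1),
    "objects": nn.Conv2d(8, 4, 1),
})

# Train only the "lanes" sub-graph: the optimizer sees just that head,
# and detaching the features keeps gradients out of the shared backbone.
opt = torch.optim.SGD(heads["lanes"].parameters(), lr=0.01)

x = torch.randn(2, 3, 16, 16)          # dummy batch of images
features = backbone(x).detach()         # shared features, frozen here
loss = heads["lanes"](features).pow(2).mean()  # placeholder loss
opt.zero_grad()
loss.backward()                         # grads exist only in this sub-graph
opt.step()
```

Under this scheme the "objects" head and the backbone are untouched, which is one way a monolith model can be updated piecewise without disturbing the other predictors.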

paraschopra|6 years ago

I think their architecture might be their secret sauce. But I'm curious about this too.

fyp|6 years ago

For those who want to learn more, I would start with Mask-RCNN where you have a very similar architecture: one shared backbone with multiple heads that can be retrained for various tasks (bounding boxes, masks, keypoints, etc): https://youtu.be/g7z4mkfRjI4?t=628

kegan|6 years ago

Anyone know why Andrej's team chose PyTorch (as opposed to, say, TensorFlow)?

jeffshek|6 years ago

Some potential reasons:

- TensorFlow is great at deployment, but not the easiest to code. PyTorch wasn't frequently used in production until recently.

- If you have the resources for great AI engineers and researchers, your team will be good enough to build and deploy with either framework.

- A preference for whichever framework your tech leads find easier.

- Lots of new academic research is coming in PyTorch

- TensorFlow is undergoing a massive change from 1.x to 2.0. If you choose TensorFlow, do you write on 1.x just to then refactor to TF 2.0? Or write on TF 2.0 now and deal with all-new edge cases? Or write in PyTorch (easier) but handle the more difficult deployment process?

- ML code quickly rots. Bad PyTorch code is just bad Python code. Bad TensorFlow code can be a nightmare to debug.

- PyTorch's eager execution makes coding NNs much easier to prototype and build.
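To illustrate the eager-execution point above: operations run immediately as ordinary Python, so intermediate tensors can be printed or stepped through in a debugger (a generic sketch, not Tesla code):

```python
import torch

x = torch.randn(2, 3)
w = torch.randn(3, 1, requires_grad=True)

y = x @ w          # runs immediately; y is a concrete tensor
print(y.shape)     # inspect mid-computation, set pdb breakpoints, etc.

loss = y.sum()
loss.backward()    # autograd replays the ops that actually ran
print(w.grad.shape)
```

This is the sense in which "bad PyTorch code is just bad Python code": a failure raises a normal Python traceback at the offending line, rather than an error from a separately-compiled graph.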

tigershark|6 years ago

Not an expert, but as far as I understood, PyTorch is much better for building new models, while with TensorFlow it's easier to assemble predefined blocks. Source: somewhere in the motivations for why the Fast.Ai courses switched to PyTorch for the second edition.

m0zg|6 years ago

Because PyTorch literally triples researcher productivity. Imagine a deep learning framework which you can actually debug when something goes wrong and which you don't have to fight every step of the way to do even simple things. That's PyTorch.

laichzeit0|6 years ago

The good news for me is that the upper bound for fully autonomous self-driving cars is no more than 50 years away. What a time to be alive. If it happens before then, that will be an absolute bonus.

diveanon|6 years ago

Andrej Karpathy is such a treasure.

He is an excellent presenter who really has a passion for teaching.

I'm not really involved with the industry, so I can't really speak to how he holds up against other experts. However, he is by far the most digestible resource I have found for learning about NNs and the science behind them.

If you are just discovering him now, google his name and just start reading. His work is truly binge-worthy in the most meaningful way.

SloopJon|6 years ago

The description of SmartSummon about halfway through the talk is interesting. One of the views looks like SLAM using a particle filter, but Andrej seems to say that it's done entirely within a neural net.
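For reference, the particle-filter style that the view resembles keeps a cloud of pose hypotheses, reweights them against each observation, and resamples. A toy 1-D localization sketch (purely illustrative; per the talk, Tesla apparently does this inside the neural net instead):

```python
import random

random.seed(0)
true_pos = 5.0
particles = [random.uniform(0, 10) for _ in range(1000)]  # pose hypotheses

for _ in range(20):
    obs = true_pos + random.gauss(0, 0.5)  # noisy position-style measurement
    # Weight each particle by how well it explains the observation.
    weights = [1.0 / (1e-6 + abs(p - obs)) for p in particles]
    # Resample in proportion to weight, then jitter (motion noise).
    particles = random.choices(particles, weights=weights, k=len(particles))
    particles = [p + random.gauss(0, 0.1) for p in particles]

estimate = sum(particles) / len(particles)
print(round(estimate, 1))  # converges near 5.0
```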

alexnewman|6 years ago

Jeeze, and I can't get my PyTorch to stop leaking memory. I couldn't imagine trying to drive a car with it.

Joky|6 years ago

PyTorch is used to train models on servers/cloud, not to drive the car later. The trained model is converted to something native to the embedded environment of the car.
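That handoff is typically done by tracing or exporting the trained model into a static, Python-free graph that an embedded runtime (or a further compiler step) can consume. A minimal sketch using PyTorch's own TorchScript tracing; the model here is a dummy, and Tesla's actual car-side runtime is proprietary and not shown:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 2), nn.ReLU()).eval()
example = torch.randn(1, 4)

# Record the ops executed on the example input into a static graph.
traced = torch.jit.trace(model, example)
traced.save("model.pt")  # loadable via torch.jit.load with no Python model code
```

ONNX export (`torch.onnx.export`) is the other common route when the target runtime isn't PyTorch-based.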

jfoster|6 years ago

I wonder if the environment the car discovers includes elevation. Would be necessary for handling many carparks.

adamnemecek|6 years ago

The trick for level 5 is learning the mapping between the lidar point cloud and the video stream. It’s the best of both worlds.

pgodzin|6 years ago

Tesla doesn't have a lidar point cloud at all

ojn|6 years ago

That falls apart as soon as the map and the real world deviate and you need to drive based on what's in front of you.

Lidar helps you spot obstructions, but won’t tell you what they are and won’t help you figure out what to do to avoid them.

Want an example? Cruise’s first real world demo got stuck behind a simple taco truck in downtown SF.

Geee|6 years ago

2D to 3D transform is simple trigonometry (using stereo / motion) and should be possible to learn without lidar. I think this is already a solved problem. One option though is to add lidars in random Teslas (e.g. 1/1000) to help with the labeling / learning.
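The stereo case mentioned above reduces to similar triangles: depth = focal length × baseline / disparity. A toy sketch with made-up illustrative numbers (not Tesla's camera parameters):

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth from a rectified stereo pair via similar triangles.
    Larger disparity (pixel shift between the two views) = closer object."""
    return focal_px * baseline_m / disparity_px

# e.g. 1000 px focal length, 0.5 m baseline, 25 px disparity -> 20 m away
print(stereo_depth(1000, 0.5, 25))  # 20.0
```

The hard part in practice is not this formula but reliably matching pixels between views (or across time, for structure-from-motion), which is where learned depth estimation comes in.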

sheeshkebab|6 years ago

One could also train a car-driving model by driving in Grand Theft Auto... but are all these tricks really what level 5 is about? I doubt it.

mkagenius|6 years ago

Oh he's no longer with OpenAI? Sam Altman must be worried about this..

spectramax|6 years ago

Without meaning offense to Sam, I thought he was an investor / YC head. What credentials does he have to be at OpenAI?

new_realist|6 years ago

Elon Musk poached him, and for that was kicked off the OpenAI board.

cyrux004|6 years ago

for over 2 years now

new_realist|6 years ago

Meanwhile Waymo is way ahead.

grecy|6 years ago

Do you belittle everyone that gets second place in the Olympics because the winner is "way ahead"?

Your comment just reeks of anger and hostility.

It seems like you'd rather Tesla didn't try at all, and instead we all just give up and go back to the status quo.

tim333|6 years ago

Waymo has a significantly different approach, using lidar rather than just vision. The approaches seem to have different strengths and weaknesses. Waymo is actually able to do full autonomy, but in very restricted environments: basically semi-deserted suburbs. Tesla's Autopilot works in real city rush-hour traffic, but not reliably enough to be let loose on its own. It remains to be seen which will win, or if it will be some other solution.

m0zg|6 years ago

Except, well, Waymo doesn't actually build cars, and has no plans to do so.

mindfulplay|6 years ago

Just listening to this talk scares me. The number of errors, even on a seemingly normal, sunny day, makes it mind-boggling that people trust this crap.

How can we rely on the output of eight cameras? This is not a kid's science project.

It's all fancy neural networks until someone dies. A pretty callous, Silicon Valley mindset for such an important and critical function of the car.

Will never buy a Tesla after having seen this.

panarky|6 years ago

> mind boggling to think people trust this crap

It's also mind boggling to think we currently trust organic tissue to do this crap, some of which is bathed in psychoactive chemicals.

And yet we do, and as a result, horrendous catastrophes occur every minute of every day.

> It's all fancy neural networks until someone dies

No, that can't be the standard, not when people are dying right now in the current regime.

Unless the new regime kills and maims at a higher rate than the current regime, there is no reason for fear.

freediver|6 years ago

You are right, this approach is scary, and it is astonishingly inaccurate (I'm a Tesla owner).

However, the reason is not the eight cameras. You should be able to drive fine with just one camera (thought experiment: could you drive a car 1000 miles away, just by seeing what the driver of that car would see, with no extra cameras, sensors or lidars?).

optimiz3|6 years ago

Your comments read like /r/RealTesla | TSLAQ FUD.

Please explain, in detail, what your specific objections are, and how you are more qualified on this subject matter than the presenter.

natch|6 years ago

It does require human supervision while driving.

Tesla makes this very clear.

It's also all over the Internet.

But there will always be people who do not get this.

Those people should not drive Teslas, or pretty much any modern car for that matter.

If you are under the impression that you would be relying on this system to drive the car, then I agree you should not get a Tesla.

Of course full self driving is coming at some point, but that's a conversation for another day. Meantime Tesla is making steps toward it very incrementally with things like the "stop mode" rolling out right now.

yo-scot-99|6 years ago

This tech is interesting, but so poorly understood that it's just using the (public) roads as one large alpha test. Given a NN, there is no way to verify what the safety margins are. For instance, if each camera slightly changed exposure or occlusion, would the results change smoothly? All they can do is try it and hope the inputs are in a safe part of their optimization space.

0-_-0|6 years ago

Just replace "eight" with "two" and this could have been written about the human brain

matz1|6 years ago

How many actually die? Do you have the statistics?

k2xl|6 years ago

Hmm, how do you explain the data showing that there are likely fewer accidents and fewer road fatalities when Autopilot is engaged?

newnewpdro|6 years ago

Same.

Watching the AI visualization of Smart Summon in action was horrifying, and made clear why many have reported Summon mode as resembling a drunk person navigating a parking lot.