TechnicolorByte | 3 months ago

Can anyone recommend a technical overview describing the design decisions PyTorch made that led it to win out?

GistNoesis | 3 months ago

PyTorch's choice of a dynamic computation graph [1] made it easier to debug and to implement models, leading to higher adoption, even though running speed was initially slower (and training cost therefore higher).
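
To make that concrete, here is a minimal sketch (the tensors and the loop are arbitrary): because PyTorch records the graph op by op as the Python code runs, data-dependent control flow and plain print statements work inside the computation.

    import torch

    x = torch.randn(3, requires_grad=True)

    # The graph is built as this code executes, so ordinary
    # Python control flow decides its shape at runtime.
    y = x * 2
    while y.norm() < 10:
        y = y * 2
    print(y)            # inspect any intermediate value directly

    y.sum().backward()  # gradients flow through the path actually taken
    print(x.grad)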

Other decisions follow from this one.

TensorFlow started with static graphs and had to move to dynamic ones in version 2.0, which broke everything and fragmented the ecosystem between TensorFlow 1, TensorFlow 2, Keras, and JAX.

PyTorch's later compilation of this computation graph (torch.compile, introduced in PyTorch 2.0) erased TensorFlow's remaining edge.
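
For reference, a tiny sketch of that compilation step (the function here is just a placeholder): torch.compile wraps an eager function and JIT-compiles it on first call, without changing how you write or call the code.

    import torch

    def f(x):
        # arbitrary placeholder computation
        return torch.sin(x) ** 2 + torch.cos(x) ** 2

    compiled_f = torch.compile(f)  # same signature, compiled on first use

    x = torch.randn(1000)
    print(torch.allclose(f(x), compiled_f(x)))  # same result, fused kernels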

Is the battle over? From a purely computational standpoint, PyTorch's solution is very far from optimal, and billions of dollars of electricity and GPUs are burned every year, but the major players are happy with circular deals that entrench their positions. So at the pace of current AI code development, it's probably one or two years before PyTorch is old history.

[1] https://www.geeksforgeeks.org/deep-learning/dynamic-vs-stati...

saagarjha | 3 months ago

Someone’s got to prototype the next generation of architectures.

Uehreka | 3 months ago

> at the pace of current AI code development, it's probably one or two years before PyTorch is old history.

Ehhh, I don’t know about that.

Sure, new AI techniques and new models are coming out fast, but when I go to work with a new AI project, it's often pinned to a version of PyTorch or CUDA from when the project began a year or two ago. It's been super annoying having to update projects to PyTorch 2.7.0 and CUDA 12.8 just so I can run them on RTX 5000 series GPUs.

All this to say: if PyTorch were going to be replaced in a year or two, we'd know the name of its killer by now, and they'd be the talk of HN. Not to mention that at this point all of the PhDs flooding into AI startups wrote their grad work in PyTorch, so it has a lot of network lock-in that an upstart would have to overcome by being way better at something PyTorch can never be good at. I don't even know what that would be.

Bear in mind that it took a few years for TensorFlow to die out due to lock-in, and we all knew about PyTorch that whole time.

huevosabio | 3 months ago

I don't know the full list, but back when it came out, TF felt like a crude set of bindings to the underlying C++/CUDA workhorse. PyTorch, in contrast, felt pythonic. It was much closer in feel to numpy.
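
A small sketch of what that numpy-like feel means in practice (the arrays are chosen arbitrarily): the indexing, broadcasting, and method names largely mirror numpy, and the two interoperate with shared memory.

    import numpy as np
    import torch

    a = np.ones((2, 3))
    b = torch.ones(2, 3)

    # Same style of reductions, indexing, and broadcasting.
    print(a.sum(axis=0), b.sum(dim=0))

    c = torch.from_numpy(a)  # shares memory with the numpy array
    a[0, 0] = 5.0
    print(c[0, 0])           # reflects the change made via numpy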

puttycat | 3 months ago

I think it was mostly the eager evaluation that made it possible to debug every step of the network's forward/backward passes. TensorFlow didn't have that at the time, which made debugging practically impossible.
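
For example (a made-up toy model, nothing specific to any real network): with eager evaluation you can stop and look at any activation or gradient with plain Python tooling.

    import torch
    import torch.nn as nn

    model = nn.Linear(4, 2)
    x = torch.randn(1, 4)

    out = model(x)  # runs immediately; no session or placeholders
    print(out)      # inspect the forward activation directly
    # import pdb; pdb.set_trace()  # or drop into a debugger here

    out.sum().backward()
    print(model.weight.grad)  # per-parameter gradients after backward()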

mxkopy | 3 months ago

I’m not sure if such an overview exists, but when caffe2 was still a thing and JAX was a big contender dynamic vs static computational graphs seemed to be a major focus point for people ranking the frameworks.