qmatch|3 months ago
As a loyal JAX user, I hope they can play catch-up. PyTorch has dominated the AI scene since TF1 fumbled the ball at the 10-yard line. What Matt Johnson has done turning Autograd into JAX will hopefully prove worthy of as much praise as Soumith has received.
n_u|3 months ago
can you explain why you think TensorFlow fumbled?
probably_wrong|3 months ago
At my university we had to choose between the two libraries, so as a test we decided to write a language model from scratch. The first minor problem with TF was that (if memory serves me right) you were supposed to declare your network "backwards": instead of saying "A -> B -> C" you had to declare "C(B(A))". The major problem, however, was that there was no way to add debug messages: either your network worked or it didn't. To make matters worse, the "official" TF tutorial on how to write a Seq2Seq model didn't compile because the library had changed, and for years the bug reports about it were met with "we are changing the API so we'll fix the example once we're done".
PyTorch, by comparison, had the advantage of a Python-based interface: you simply defined classes like you always did (including debug statements!), connected them as variables, and that was that. So when my beginner colleagues and I had to decide which library to pick, "the one that's not a nightmare to debug" sounded much better than "the one that's more efficient if you have several billion training data points and a cluster". My colleagues and I then went on to become professionals, and we all brought PyTorch with us.
stared|3 months ago
TensorFlow (while a huge step on top of Theano) had issues with a strange API, mixing needlessly complex parts (even for the simplest layers) with magic-box-like optimization.
There was Keras, which I liked and used before it was cool (when it still supported the Theano backend), and it was the right decision for TF to incorporate it as the default API. But it was 1–2 years too late.
At the same time, I initially looked at PyTorch as some intern’s summer project porting from Lua to Python. I expected an imitation of the original Torch. Yet the more it developed, the better it was, with (at least to my mind) the perfect level of abstraction. On the one hand, you can easily add two tensors, as if they were NumPy arrays (and print their values in Python, which was impossible with TF at that time). On the other hand, you can wrap anything (from just a simple operation to a huge network) in an nn.Module. So it offered this natural hierarchical approach to deep learning. It offered building blocks that can be easily created, composed, debugged, and reused. It offered a natural way of picking the abstraction level you want to work with, so it worked well both for industry and for experimentation with novel architectures.
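To make the two levels of abstraction concrete, here is a minimal sketch of what that looks like in PyTorch. The class names Block and TinyNet and all the sizes are made up for illustration; only torch, nn.Module, nn.Linear and nn.Sequential are actual PyTorch names.

```python
import torch
from torch import nn

# Low level: tensors add like NumPy arrays and can be printed right away.
a = torch.tensor([1.0, 2.0])
b = torch.tensor([3.0, 4.0])
total = a + b
print(total)


class Block(nn.Module):
    """Hypothetical building block: one linear layer followed by ReLU."""

    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x):
        h = torch.relu(self.linear(x))
        # An ordinary Python debug statement in the middle of the network.
        print("Block output shape:", tuple(h.shape))
        return h


class TinyNet(nn.Module):
    """Hypothetical network composed of reusable Block modules."""

    def __init__(self, dim, n_blocks):
        super().__init__()
        self.blocks = nn.Sequential(*[Block(dim) for _ in range(n_blocks)])
        self.head = nn.Linear(dim, 1)

    def forward(self, x):
        return self.head(self.blocks(x))


net = TinyNet(dim=4, n_blocks=2)
out = net(torch.randn(3, 4))  # batch of 3 inputs, 4 features each
```

The same nn.Module wrapper is used for the single layer and for the whole network, which is roughly the hierarchy described above.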
So, while in 2016–2017 I was using Keras as the go-to for deep learning (https://p.migdal.pl/blog/2017/04/teaching-deep-learning/), in 2018 I saw the light of PyTorch and didn’t feel a need to look back. In 2019, even for the intro, I used PyTorch (https://github.com/stared/thinking-in-tensors-writing-in-pyt...).
HarHarVeryFunny|3 months ago
This new PyTorch approach was eventually supported by TensorFlow as well ("eager execution"), but the PyTorch approach was such a huge improvement that there was an immediate shift by many developers from TF to PyTorch, and TF never seemed able to regain the momentum.
TF also suffered from having a confusing array of alternate user libraries built on top of the core framework, none of which had great documentation, while PyTorch had a more focused approach and fantastic online support from the developer team.
Gazoche|3 months ago
Maybe TF has gotten better since, but at the time it really felt like an internal tool that Google decided to just throw into the wild. By contrast, PyTorch offered a more reasonable level of abstraction along with excellent API documentation and tutorials, so it's no wonder that machine learning engineers (who are generally more interested in the science of the model than the technical implementation) ended up favoring it.
[1] The worst part was that Google only hosted the docs for the latest version of TF, so if you were stuck on an older version (because, oh I don't know, you wanted a stable environment to serve models in production), well, tough luck. That certainly didn't win TF any favors.
zapnuk|3 months ago
The few people I knew back then used Keras instead. I switched to PyTorch for my next project, which was more "batteries included".
michaelt|3 months ago
If their folder of 10,000 labelled images contains one image that's a different size to the others, the training job will fail with an error about unexpected dimensions while concatenating.
But it won't be able to say the file's name, or that the problem is an input image of the wrong size. It'll just say it can't concatenate tensors of different sizes.
An experienced user will recognise the error immediately, and will have run a data cleansing script beforehand anyway. But it's not experienced users who bounce from frameworks, it's newbies.
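For what it's worth, the kind of pre-flight check meant here can be very small. The sketch below is pure Python and assumes the image sizes have already been read into a dict; the function name find_odd_sized_images and the file names are hypothetical, and in practice the sizes would come from whatever image library you use.

```python
from collections import Counter


def find_odd_sized_images(sizes):
    """Return the names of images whose (width, height) differs from the most common size."""
    if not sizes:
        return []
    # Treat the most frequent size in the folder as the expected one.
    expected, _ = Counter(sizes.values()).most_common(1)[0]
    return sorted(name for name, size in sizes.items() if size != expected)


# Hypothetical folder contents: file name -> (width, height).
sizes = {
    "cat_0001.png": (224, 224),
    "cat_0002.png": (224, 224),
    "dog_0417.png": (256, 224),
}

bad = find_odd_sized_images(sizes)
for name in bad:
    # Unlike a bare concatenation error, this names the offending file.
    print(f"{name}: size {sizes[name]} differs from the rest")
```

Running something like this before training turns "can't concatenate tensors of different sizes" into a message that points at the file.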
morshu9001|3 months ago
qmatch|3 months ago
I believe some years after the TF1 release, they realized the learning curve was too steep and they were losing users to PyTorch. I think the Cloud team was also attempting to sell customers on their amazing DL tech, which was falling flat. So they tried to keep the TF brand while totally changing the product under the hood by introducing imperative programming and gradient tapes. They killed TF1, upsetting those users, while not having a fully functioning TF2, all the while having plenty of documentation pointing to TF1 references that didn’t work. Any new grad student made the simple choice of using a tool that was user-friendly and worked, which was PyTorch. And most old TF1 users hopped on the bandwagon.
rockinghigh|3 months ago
tdullien|3 months ago
htrp|3 months ago
bjourne|3 months ago
intermerda|3 months ago
cl3misch|3 months ago
I also like that jax.jit forces you to write "functional" functions free of side effects or in-place array updates. It might feel weird at first (and not every algorithm is suited for this style) but ultimately it leads to clearer and faster code.
I am surprised that JIT in PyTorch gets so little attention. Maybe it's less impactful for PyTorch's usual use case of large networks, as opposed to general scientific computing?
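A small sketch of what that functional style looks like in practice. The function name zero_first_and_normalize and the trace_log list are made up for illustration; jax.jit, jax.numpy and the .at[...].set(...) update syntax are actual JAX features.

```python
import jax
import jax.numpy as jnp

trace_log = []  # only here to show what happens to side effects under jit


@jax.jit
def zero_first_and_normalize(x):
    # Side effect: this line runs while JAX traces the function,
    # not on every call to the compiled version.
    trace_log.append("traced")
    # JAX arrays are immutable, so instead of x[0] = 0.0 we use
    # .at[...].set(...), which returns a new array.
    x = x.at[0].set(0.0)
    return x / (jnp.linalg.norm(x) + 1e-8)


x = jnp.array([3.0, 4.0, 0.0])
y1 = zero_first_and_normalize(x)
y2 = zero_first_and_normalize(x)  # same shape and dtype: reuses the compiled function
```

After both calls the input x is unchanged, and trace_log holds a single entry, which is why anything beyond pure computation tends to be kept out of jitted functions.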
havercosine|3 months ago
JAX seems well engineered. One could argue TensorFlow was too. But the ideas behind JAX were built outside Google (Autograd), so it has struck the right balance in staying close to idiomatic Python / NumPy.
PyTorch is where the tailwinds are, though. It is a wildly successful project which has acquired a ton of code over the years. So it is a little harder to figure out how something works (say torch.compile) from first principles.