Maybe “TF 3” or whatever they call it will be ergonomic and a pleasure to use, but that was the promise of TF 2, and unless you wanted to use Keras it was anything but. I’m glad I have been able to work only in PyTorch and Jax. Maybe TF will get better but I’m not holding my breath. On the other hand, XLA is very nice, and I hope they continue to develop it.
I hope torch has a convincing distributed tensor API coming soon. Their development on ShardedTensor seems to have slowed or stopped recently, so TF’s DTensor is definitely ahead, which is a shame. And of course TF’s ecosystem for the whole lifecycle is more mature with TFX, TF.js, etc, but torch is slowly closing those gaps, and hopefully that will continue.
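For readers unfamiliar with the term, the core idea behind a distributed/sharded tensor (whether torch's ShardedTensor or TF's DTensor) can be sketched in plain Python, with lists standing in for device-local shards. The class and method names here are made up for illustration; neither library is used:

```python
# Toy sketch of a sharded tensor: one logical 1-D array whose shards
# live on different "devices", gathered on demand.
class ShardedVector:
    def __init__(self, data, num_shards):
        # Split the logical vector into roughly equal contiguous shards,
        # one per (pretend) device.
        n = len(data)
        step = -(-n // num_shards)  # ceiling division
        self.shards = [data[i:i + step] for i in range(0, n, step)]

    def gather(self):
        # All-gather: reassemble the full logical vector.
        return [x for shard in self.shards for x in shard]

    def map(self, fn):
        # Elementwise ops run independently per shard (no communication).
        out = ShardedVector.__new__(ShardedVector)
        out.shards = [[fn(x) for x in shard] for shard in self.shards]
        return out

v = ShardedVector([1, 2, 3, 4, 5, 6], num_shards=3)
doubled = v.map(lambda x: 2 * x)
print(v.shards)          # [[1, 2], [3, 4], [5, 6]]
print(doubled.gather())  # [2, 4, 6, 8, 10, 12]
```

The hard parts the real APIs have to solve (device placement, resharding, collectives for ops that cross shard boundaries) are exactly what's missing from this toy.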
The ergonomics of TensorFlow are probably always going to be behind PyTorch's (or Keras', for that matter). The API has not been stable for the past 6 years, and it has burned me one too many times; I now flinch at using it. It is basically an internal Google tool that has been made available to the public, and like most internal tools at Google the deprecated/developmental dichotomy applies (https://goomics.net/50/).
That said, the deployment of TensorFlow models onto mobile devices or the browser is really good, so sometimes the pain is necessary.
PyTorch does have issues with both distributed tensor (which is easier to solve) and deployment (which is harder to solve, but solvable).
I also think Keras' Functional API is superior in composability to PyTorch's OOP model, but I am biased as a software engineer. It does feel like the community thinks the OOP model is much more hackable and thus easier to use.
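The contrast can be sketched in plain Python, with simple callables standing in for real layers (neither Keras nor PyTorch is used here; all names are illustrative):

```python
# Functional style (Keras-like): a layer is a value, and models are
# built by composing values, so any sub-graph is itself reusable.
def scale(k):
    return lambda x: k * x

def shift(b):
    return lambda x: x + b

def compose(*layers):
    def model(x):
        for layer in layers:
            x = layer(x)
        return x
    return model

backbone = compose(scale(2), shift(1))   # reusable sub-model
head = shift(10)
model_fn = compose(backbone, head)

# OOP style (PyTorch-like): the model is a class holding sub-modules,
# and the forward pass is imperative code you can freely hack on.
class Model:
    def __init__(self):
        self.k, self.b = 2, 1

    def forward(self, x):
        x = self.k * x          # easy to insert prints/branches here
        x = x + self.b
        return x + 10

print(model_fn(3))          # 17
print(Model().forward(3))   # 17
```

Both compute the same thing; the functional version composes like data, while the imperative forward() is trivially steppable in a debugger, which is the hackability the community values.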
All in all, it is still early days. We didn't have a competent all-in-one OSS SQL database until the late 2000s, which is 20-ish years after the theory was ready and taught extensively in school. And even after that, there was plenty of innovation around databases in the 2010s for new use cases. Frameworks for differentiable programming have a long way to go.
Merging Keras into TF and trying to copy PyTorch by adopting their model was the downfall of TF. The initial TF releases were great. It was simple, easy to reason about, and solved a clear problem. But then Google wanted TF to appeal to everyone and solve all problems for industry and research and beginners and experts at the same time because people need to get promoted internally. And that never goes well. Nobody I know wants to use TF these days (but some are forced to).
I object to this statement. The earlier releases of TF (ca. 2015) were impossible to debug and the documentation was always broken; if you tried to follow the Seq2Seq tutorial you know what I'm talking about. I'd argue that these releases were great for the 50 people who already knew how to use it, but they were aggressively unhelpful for beginners.
PyTorch won my lab (and certainly others) because you could add prints to check your dimensions while TF forced you to build a correct computation graph all at once. Performance? Sure, TF is probably faster. But I'd argue that TF's big mistake was not taking their new users into consideration.
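The debugging difference can be sketched in plain Python, with closures standing in for a deferred TF1-style graph (no actual TF or PyTorch here; all names are illustrative):

```python
# Eager (PyTorch-style): each op runs immediately, so you can print
# intermediate values and shapes mid-computation.
def eager_model(x):
    h = [v * 2 for v in x]
    print("hidden len:", len(h))   # works: h is a real value right now
    return sum(h)

# Graph (TF1-style): you first build a deferred graph of thunks;
# nothing has a value until the whole graph is run with a feed at the
# end, so a mid-graph print can only show a node object, not data.
def graph_model(x_placeholder):
    h = lambda feed: [v * 2 for v in feed[x_placeholder]]
    out = lambda feed: sum(h(feed))
    # print(h) here would show "<function ...>", not your numbers
    return out

session_run = graph_model("x")
print(eager_model([1, 2, 3]))         # 12
print(session_run({"x": [1, 2, 3]}))  # 12
```

Same result either way; the difference is that in the eager version a shape bug surfaces at the line that caused it, while in the deferred version it surfaces only at "session run" time, far from the code that built the bad node.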
I agree. I truly miss the graph-mode API, especially coming from Theano before TF, but it wasn’t as beginner friendly and Google wanted to capture market share for their cloud.
At least with JAX, the core library isn't adopting any of the framework-level stuff, so those can evolve independently.
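That separation can be sketched in plain Python: core transforms operate on ordinary functions, so framework-level layer libraries can be built on top independently. The `grad` below is a crude finite-difference stand-in for jax.grad and `jit` a caching stand-in for jax.jit; both are illustrative only:

```python
# Transforms take a function and return a new function; no knowledge
# of layers, modules, or any NN framework is needed.
def grad(f, eps=1e-6):
    # Central-difference approximation of df/dx (stand-in for jax.grad).
    return lambda x: (f(x + eps) - f(x - eps)) / (2 * eps)

def jit(f):
    # Caching stand-in for jax.jit: "compile" (here: compute) once per input.
    cache = {}
    def wrapped(x):
        if x not in cache:
            cache[x] = f(x)
        return cache[x]
    return wrapped

def loss(x):
    return x * x  # d/dx = 2x

dloss = grad(loss)            # transforms compose with plain functions...
fast_dloss = jit(dloss)       # ...and with each other
print(round(dloss(3.0), 4))       # 6.0
print(round(fast_dloss(3.0), 4))  # 6.0
```

Because the transforms only ever see plain functions, a Flax- or Haiku-style layer library is just more functions layered on top, which is why those can evolve without touching the core.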
The original TF was truly horrendous :) It was extremely unintuitive and slow to program; that's why so many people switched to PyTorch. The merge was a good idea, it just came too late. TF is as popular as it is just because it was first to market and because of Google's megaphone. Even inside Google, researchers don't want to use it, and a large fraction have switched to JAX.
The downfall was fragmentation. They merged Keras after TF started having multiple competing application libraries, each managed by different people (and each fighting for promotion, as you say). Keras just happened to be the most popular. Even after Keras, various teams (e.g., deepmind) have decided to make their own libraries.
On Hugging Face, almost 85% of models are exclusive to PyTorch, and even those that are not exclusive have about a 50% chance of being available in PyTorch as well. In contrast, only about 16% of all models are available for TensorFlow, with only about 8% being TensorFlow-exclusive.
That matches my experience reading a lot of ML research papers. Most papers that include code are PyTorch-only. Some have both PyTorch and TF. Few are TF-only. Then there is JAX and it's been growing.
I feel like the comments on backwards compatibility are due to the absolute shitshow of TF2 compatibility for TF1 code and models.
Also, the threat of PyTorch can be seen when reading between the lines, especially since it's now run by a foundation and is the darling of the diffusion model developments.
PyTorch has been a darling of almost every noteworthy open source model for the past 3-4 years (BLOOM, GPT-J, StyleGAN3, detectron, etc). Personally, I've only seen people use TensorFlow/XLA if they got free TPU credits from Google (gpt-neo), or if it was released by Google (t5).
I just wish they would put some focus on making TensorFlow reliable in a production environment.
It can easily be wildly non-deterministic across different CPUs or GPUs, or even in the same session with the same input.
Performance seems to get worse with new releases, and there are frequent subtle breaking changes when using models built with old versions on newer releases.
TensorFlow Serving is barely controllable and requires insane tuning to perform the same as PyTorch, but provides little to no docs.
The vast majority of models people build just don't work in TensorFlow Serving either, as you can't reach in with hacky Python to mess with internal state.
If you use a custom host instead, then you have to deal with literal gigabytes of Python dependencies, making your Docker images huge.
Memory usage is uncontrollable and causes terrible performance or instant host death. Results vary depending on CPU count, and automatic parallelism can reduce performance.
I just don't understand how Google uses TensorFlow internally for real-world services.
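One root cause of the run-to-run nondeterminism described above can be shown in a few lines of plain Python: floating-point addition is not associative, so a parallel reduction that happens to sum in a different order on different CPUs/GPUs returns different results for identical input:

```python
# Same four numbers, two summation orders, two different answers.
vals = [1e16, 1.0, -1e16, 1.0]

# Sequential left-to-right reduction (one order a kernel might use):
left_to_right = ((vals[0] + vals[1]) + vals[2]) + vals[3]

# Pairwise reduction (another order, as in a parallel tree-sum):
pairwise = (vals[0] + vals[2]) + (vals[1] + vals[3])

print(left_to_right)  # 1.0  (the first 1.0 was absorbed by 1e16)
print(pairwise)       # 2.0
```

This is an extreme example, but the same effect at small scale is why identical inputs can produce slightly different outputs whenever thread counts or reduction strategies vary between runs.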
I had my last-straw moment with TensorFlow some time ago. JAX has been a pleasure and I'm never looking back. It got to the point on my team where it was just easier to rewrite our entire distributed training infrastructure in JAX with pmap than to coerce TF2 into doing what I wanted.
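The data-parallel pattern that pmap makes convenient can be sketched in plain Python (made-up function names, with list slices standing in for devices; no actual JAX here):

```python
# Data parallelism: split the batch across devices, compute per-shard
# gradients with the SAME function, then average them (the all-reduce).
def grad_on_shard(w, shard):
    # Gradient of mean squared error for the toy model y = w * x,
    # averaged over this shard's (x, y) examples.
    g = [2 * x * (w * x - y) for x, y in shard]
    return sum(g) / len(g)

def pmap_style_step(w, batch, num_devices, lr=0.1):
    step = len(batch) // num_devices
    shards = [batch[i * step:(i + 1) * step] for i in range(num_devices)]
    # "pmap": run the same function on every shard (in parallel on real HW).
    local_grads = [grad_on_shard(w, s) for s in shards]
    # "pmean": all-reduce the local gradients to a global mean.
    global_grad = sum(local_grads) / num_devices
    return w - lr * global_grad

batch = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]  # y = 3x
w = 0.0
for _ in range(50):
    w = pmap_style_step(w, batch, num_devices=2)
print(round(w, 3))  # converges to 3.0
```

In real JAX the per-shard function is decorated with pmap and the averaging is a collective inside it; the appeal is that the whole distributed step stays this small and readable.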
Fix the damn abstraction bugs! Models of models only work in a limited fashion. It's a damn graph; I can't understand why they can't get this to work. For me, the TF ship has sailed.
I believe it is, whether they realize it or not. JAX/PyTorch + XLA is just so much better in so many ways. Development on XLA will (thankfully) continue, and JAX and PyTorch will continue to cannibalize the TF userbase.
Yes. If you see a blog post from Google that follows this title pattern and the text says what this one does, it means there's an explicit acknowledgement that something serious wasn't working and leadership decided to course-correct.
TF will continue to have a place at Google for prod work but its application base is going to continue to shrink. I'm just blown away they're rejiggering the distributed model again.
It's just that TensorFlow is big and bloated, has quite a few quirks, and isn't the cool thing to use anymore, so they're doing some work to address these issues. TensorFlow is still very popular, and you can get work done with it.
Hey, it's using Bazel, that makes things great! Well, Bazel won't work on your system, because protobuf needs a patch for a new libc. Why didn't you use Ubuntu? Now it builds! Oh wait, they are vendoring protobuf in TensorFlow...
(all that, while depending on a single-file, 3-release python library...)
p1esk | 3 years ago
What makes you say that?
mudrockbestgirl | 3 years ago
Let's hope JAX won't suffer the same fate.
claudppl | 3 years ago
(source: https://www.assemblyai.com/blog/pytorch-vs-tensorflow-in-202...)
learndeeply | 3 years ago
What are you using for serving PyTorch models?
cavisne | 3 years ago
Not because it's Google but because JAX has so much momentum lately.
Jabbles | 3 years ago
https://www.deepmind.com/blog/using-jax-to-accelerate-our-re...
jxy | 3 years ago
I love to hear that. But does Python even guarantee 100% backwards compatibility?
DSingularity | 3 years ago
For domain-specific languages like TF I guess they were motivated to commit to it to ensure adoption of the new versions.