I work at a small company as an engineer and recently was asked to do a project that would require some neural net magic. I had some experience with keras/tensorflow so that was my first choice.
Despite the absolute nightmare of getting it installed and running on a GPU, I managed it and had a fantastic model. It was doing so well that the company wanted to expand the project and build out a multi-GPU rig as part of it. So I got to building that environment and installed the latest CUDA, cuDNN, and nvidia driver, used TensorFlow 2.0, aaaaaand it wouldn't work. I spent a long time hacking on it until I read on a forum that it was just a bug that hadn't been fixed yet.
At this point I decided to see what Pytorch was like. In literally one day I installed everything and migrated my project completely over to pytorch. Same speed, same accuracy, works perfectly on a multi-gpu rig when I set it to. It was like a breath of fresh air.
The next day I wrote some C++ to import a saved PyTorch model so it could run in a deployment environment. The C++ API is also great. The docs are lacking a little bit, but a Facebook researcher mentioned to me on the forums that they're hoping to have it all done by next month.
It's unlikely that I'll be going back to tensorflow.
When I used TensorFlow (briefly), it seemed there were tons of hidden assumptions my stuff had to follow or it wouldn’t work. PyTorch has a few I’ve run into, but mostly seems to “just work”. That’s why I think it’s much better for building anything novel (at least for the first time).
The complexity of installing TensorFlow, even when that includes custom compilation and hacking Bazel (to make it work under a CUDA version it doesn't officially support), is low compared to releasing a model that works in production.
Because of that, it doesn't make much sense to judge a "differential programming language" like TensorFlow or PyTorch by the ease of installation. It'd be like saying "I prefer C# over C++" because it is easier to install.
We use Pytorch extensively in our startup. We tackle a lot of new research problems as consultants/partners to help develop products or devise new algorithms/models to solve tasks for our customers. We have never regretted our choice to pick Pytorch. I found the article pretty spot on when comparing Tensorflow and Pytorch. The things that have appealed to me about Pytorch are:
1. Extremely easy to debug and work with. Being able to debug effortlessly in PyCharm makes life very easy.
2. The API is quite clean and nice and fits in really well with Python and nothing feels hacky. I've developed my own Keras-like framework for experimentation, training and evaluating models quickly and easily and the entire experience has been really enjoyable.
3. The nicest thing though is that, as the article points out, a huge percentage of researchers have moved to Pytorch, and this allows us to more easily look at other researchers' code, experiment with things easily, and incorporate ideas and cutting-edge research into our own work. Even for things that are released in TensorFlow, if it is an important publication that gains attention and traction in the community, you will likely have implementations in Pytorch pop up soon enough.
I do think that TensorFlow still has an edge on the deployment at scale/mobile side of things as pointed out by the article. But Pytorch is a lot younger and they are making a lot of progress with every release in that space.
Last year I was tasked with looking into a NAS (Neural Architecture Search) paper and analyzing the algorithm. The paper came with a TensorFlow implementation. Trying to read that TF code was quite difficult. I searched around and found a PyTorch implementation - much easier to read and understand, and it ran about 50% faster as well (the latter was a bit surprising). I tend to think that TensorFlow lends itself to the creation of code that's difficult to reason about. That may be different now with the various flavors of TF (like TF Eager).
I'll add that it was much easier to install PyTorch with GPU support than it was to install TensorFlow with GPU support - at least that's how it was around November of last year. The PyTorch install was painless, whereas we ended up having to build TF from source to work with our setup. Could be different now as I haven't looked at TF since then.
Granted that PyTorch and TensorFlow both heavily use the same CUDA/cuDNN components under the hood (with TF also having a billion other non-deep learning-centric components included), I think one of the primary reasons that PyTorch is getting such heavy adoption is that it is a Python library first and foremost.
There are maybe all of two "surprises" I've encountered in all my time using it, if even (1. gradients are accumulated in state, 2. nn.Module does funky things with attributes, so use something like nn.ModuleDict if you're going to be dynamically setting modules). Everything else works like a dream, and works almost exactly how you'd expect.
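Both surprises are easy to demonstrate; here's a minimal sketch (plain PyTorch, nothing project-specific):

```python
import torch
import torch.nn as nn

# Surprise 1: gradients accumulate across backward() calls.
x = torch.tensor([2.0], requires_grad=True)
(x * 3).sum().backward()
first = x.grad.clone()            # d(3x)/dx = 3
(x * 3).sum().backward()          # gradients are ADDED, not replaced
assert torch.equal(x.grad, first * 2)
x.grad.zero_()                    # the fix (optimizer.zero_grad() in a loop)

# Surprise 2: a plain dict of modules is invisible to nn.Module.
class Bad(nn.Module):
    def __init__(self):
        super().__init__()
        self.heads = {"a": nn.Linear(4, 2)}                 # NOT registered

class Good(nn.Module):
    def __init__(self):
        super().__init__()
        self.heads = nn.ModuleDict({"a": nn.Linear(4, 2)})  # registered

assert len(list(Bad().parameters())) == 0    # silently empty
assert len(list(Good().parameters())) == 2   # weight + bias
```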
Model parameters? .parameters() gives you a dict-friendly generator of tensors.
Model state? .state_dict() is a dictionary.
Loading model state? load_state_dict(state_dict)... just loads a dictionary.
Reusing modules across different modules? Just assign them!
Determining what parameters to optimize? Just ... give the list of parameters to the optimizer.
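All of the above fits in one minimal sketch:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

# Model state? A plain dictionary of tensors.
state = model.state_dict()
assert sorted(state) == ["0.bias", "0.weight", "2.bias", "2.weight"]

# Loading model state? Just loading that dictionary.
clone = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
clone.load_state_dict(state)
assert torch.equal(clone[0].weight, model[0].weight)

# Determining what to optimize? Hand the parameters to the optimizer.
opt = torch.optim.SGD(model[2].parameters(), lr=0.01)  # last layer only
```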
You can use all your usual Python development and debugging tools, and it feels 100% natural. I can fit it into other Python workflows without making the whole program centered around TensorFlow.
TensorFlow is undoubtedly powerful, and if you have the time/resources to put into a static-ish TensorFlow-centric workflow, it could pay off many times over. But it definitely feels like learning an entirely new language, with an entirely different debugging pattern. And furthermore, a language that is constantly changing patterns and best practices, other than super-standard Keras examples.
To put this into context, even running the official TensorFlow models repository produces deprecation warnings. Whereas torchvision works seamlessly and reads like a reference for writing PyTorch model code.
There is just a developer-centric focus to PyTorch that makes it a joy to use.
Yup, you make some great points and I couldn't agree more. Very recently, I was looking into training an object-detector for a custom problem with not many training examples. One of the classes (hardest one to train from few examples) was "person".
I was able to create a custom detection network for a 3-class problem, load up the COCO pretrained weights for the network, strip out all the other weights at the "head" for all the other COCO classes except for the "person" class and then fine-tune the model on my custom 3-class dataset. The resulting model generalized exceptionally well on people as it was still able to retain a lot of its performance from the COCO pre-training. It was so easy to do all of this. Literally, maybe 10 lines of code, and so easy to figure out since I could introspect the state_dict and the weights file directly in my PyCharm interpreter while working out how to do this.
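The pattern the parent describes boils down to filtering a state_dict and loading it non-strictly. A generic sketch (the `head.` prefix and the function name are made up for illustration; real detection networks name their heads differently):

```python
import torch

def load_all_but_head(model, pretrained_state, head_prefix="head."):
    """Copy pretrained weights into `model`, skipping the class head."""
    own = model.state_dict()
    kept = {
        k: v for k, v in pretrained_state.items()
        if not k.startswith(head_prefix)          # drop the old class head
        and k in own and v.shape == own[k].shape  # keep only matching shapes
    }
    # strict=False leaves the freshly initialized head weights untouched.
    result = model.load_state_dict(kept, strict=False)
    return result.missing_keys                    # e.g. the new head's weights
```

From there it's an ordinary fine-tuning loop on the custom 3-class dataset.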
We are considering moving to PyTorch; we really dislike how the Tensorflow 1.x -> 2.0 transition is handled. For years a lot of stuff has been added to tf.contrib, some things were only in tf.contrib, and now that it's dropped a lot of projects (including ours) have to do quite large rewrites. Since the last few 1.x iterations, Tensorflow has been complaining that the older RNN layers are deprecated and that we have to move to the Keras RNN layers, which they claim are equivalent. However, when we tried a couple of months back, it made RNN-based training 45% slower. It is all fixable, but it takes time and a lot of testing of all the model variants to see if there are no regressions. It feels quite a bit worse than Python 2 -> 3.
I am a bit saddened by all of this, because I really liked how easy it is to define a graph in Tensorflow in Python, serialize it, and then use its minimalistic C API to use the graph in Go, Rust, or wherever you need it.
How is your experience with PyTorch and backwards API compatibility (I know that they only reached 1.0 fairly recently)?
There has basically only ever been one major API shift, which was the shift away from Variables. And, granted, that happened from 0.3->0.4, about a year after the initial release.
Other than that, I've had next to no issues, and the API has only gotten better over time, with more convenient ways to do things.
PyTorch has a much smaller footprint, and is happy to delegate code to separate libraries (e.g. torchvision), so you run into "all-or-nothing" dilemmas less frequently.
It's been pretty good with PyTorch. The API has been fairly stable and I've adopted code developed from 0.4.0 to 1.0.0+ with barely the need for any tweaks. Granted, it's a younger project so for now things are stable but maybe 3 years from now they may have some giant API refresh. But I find their API quite nice for the most part so I don't see them needing to switch everything up periodically.
Anecdotally, I've dumped TensorFlow in favor of PyTorch for almost all new work I'm doing at my organization (industry focused). Biggest gripes with TensorFlow are overly complex APIs, instability from release to release, constantly broken code in Google's repos, and poor documentation. Maybe TF 2.0 will be better, but for me, the PyTorch ship has already sailed, and I am sailing on it.
I have worked as a data scientist on a lot of finance domain problems: forecasting default, fraud, conversion probability, etc.
Lightgbm library has consistently performed well. I've been interested in how many colleagues instantly jump to neural nets when in my experience this often doesn't beat lightgbm on medium sized datasets not related to text/images.
I think this is pretty common. For tabular data, lightgbm/xgboost/catboost usually give better results and require a lot less work (less pre-processing, for example) than neural nets.
One area where I wonder if neural nets would be a more useful option is using something like an LSTM to predict defaults based on a sequence of data? I've tried this a handful of times and doing a bit of feature engineering to aggregate data in a handful of fixed buckets has usually been better and easier, but I'm far from an expert in that area.
I know Jeremy Howard has shown decent results with fastai/pytorch for tabular data and I've seen some Kaggle teams do well with neural nets for tabular data. I've also had decent results with gbdt/nn ensembles. But I think in most situations where you just have tabular data, you'll get better results with less effort if you use lightgbm or the like.
More anecdata: we consistently outperform lightgbm, xgboost, random forests, linear models, etc. using neural networks, even on smaller datasets. This applies whether we implemented the other algorithms ourselves or simply compared to someone else’s results with them. In my experience it really comes down to how many “tricks” you know for each algorithm and how well you can apply and combine these “tricks”. The difference is that neural networks have many more of these tricks and a broader coverage of research detailing the interactions between them.
I call them “tricks” but really they’re just design decisions based on what current research indicates about certain problems. This is largely where the “art” part of neural networks that many people refer to comes from. The search space is simply too big to try everything and hope for the best. Therefore, how a problem is approached and how solutions are narrowed and applied really matter. Even simple things like which optimizer you use, how you leverage learning rate schedules, how the loss function is formulated, how weights are updated, feature engineering (often neglected in neural networks), and architectural priors make a big difference on both sample efficiency and overall performance. Most people, if they’re not just fine-tuning an existing model, simply load up a neural network framework, stack some layers together and throw data at it expecting better results than other approaches. But there’s a huge spectrum from that naive approach to architecting a custom model.
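Two of those knobs expressed in PyTorch, picked purely for illustration (the specific optimizer and schedule here are arbitrary; choosing them is itself one of the “tricks”):

```python
import torch

model = torch.nn.Linear(10, 1)
# Optimizer choice and its hyperparameters are one design decision...
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# ...the learning-rate schedule is another: here, decay by 10x each epoch.
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=1, gamma=0.1)

assert sched.get_last_lr() == [0.1]
opt.step()        # normally: zero_grad / backward / step, per batch
sched.step()      # once per epoch
assert abs(sched.get_last_lr()[0] - 0.01) < 1e-9
```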
This is why neural networks are so powerful and why we tend to favor them (though not for every problem). It’s much easier to design a model from the ground up with neural networks than it is for e.g. xgboost, because not only are the components more easily composable thanks to the available frameworks, but there’s a ton more research on the specific interactions between those components.
That doesn’t mean that every problem is appropriate for neural networks. I completely agree with you that no matter what the problem is, you should never jump to an approach just because it’s popular. Neural networks are a tool, and for many problems you need to be comfortable with every one of those decision points to get the best results; even if you’re comfortable it can take time, and that isn’t always appropriate for every problem. My other point is that I wouldn’t draw too many conclusions about a particular algorithm being better or worse than another. I’m not saying that was the intention with your comment, but I know many people in the ML industry tend to take a similar position. It really depends on current experience with the applied algorithms, not just experience with ML in general.
LightGBM occupies a sweet spot between speed and accuracy, and is a library I've grown to love. Nowadays, this is my primary choice for quick impactful results. NB: if your data has categorical features, you might easily beat xgboost in training time, since LightGBM explicitly supports them, and for xgboost you would need to use one hot encoding, increasing the size of the data that the library needs to work with.
I think tensorflow dominates industry purely because of its capability of exporting the model to a CoreML or Android model, or the ease of moving it to production in a GCP environment or in whatever form. Pytorch might have to build a good production pipeline around it to catch up in this game.
With the fastai module that's built on Pytorch, learning and developing Deep Learning solutions has become a lot easier. So there's a real game on now.
And simply TensorFlow was there earlier, so people implemented stuff in it. I think there's more inertia in industry, whereas researchers may more easily switch frameworks between two papers.
Researchers are people like Francois Chollet and many more, even Jeff Dean, and it is the greatness of the community that it's pretty open to optimization. "Why do researchers love PyTorch?
Simplicity. It’s similar to numpy, very pythonic, and integrates easily with the rest of the Python ecosystem. For example, you can simply throw in a pdb breakpoint anywhere into your PyTorch model and it’ll work. In TensorFlow, debugging the model requires an active session and ends up being much trickier.
Great API. Most researchers prefer PyTorch’s API to TensorFlow’s API. This is partially because PyTorch is better designed and partially because TensorFlow has handicapped itself by switching APIs so many times (e.g. ‘layers’ -> ‘slim’ -> ‘estimators’ -> ‘tf.keras’).
Performance. Despite the fact that PyTorch’s dynamic graphs give strictly less opportunity for optimization, there have been many anecdotal reports that PyTorch is as fast if not faster than TensorFlow. It's not clear if this is really true, but at the very least, TensorFlow hasn't gained a decisive advantage in this area."
Common people do want to appreciate and adopt the things from researchers that fit the present knowledge sphere. TensorFlow's approaches are better and respected by each and every one in the community, and in exchange they have enlightened us with new ways of understanding ML solutions. It has turned into a family: "If you want to go fast, go alone. If you want to go far, go together." And given the assets Alphabet has, a common man can turn into a researcher! Take https://learn.grasshopper.app for example: "Learn to code anywhere. Grasshopper is available on iOS, Android, and all web browsers. Your progress syncs seamlessly between devices." This is the status quo! It's a gift of a lifetime for generations!
I've been using Keras for the last 3 years. Most of the time where I have to deal with core TF code is when I have to write some custom layers. I totally agree with the part about hacking together TF code seeming like a nightmare (well, initially, but not once you know what you're doing), whereas PyTorch looks more like a blissful experience (I have not tried PyTorch yet; I'm just speaking from reading all these comments). I'm genuinely curious about how one can use trained PyTorch models in production. For example, I've got 6 TF-based translation models + 1 classification model running on a single AWS instance with TensorFlow Serving, with 1 GPU and 8 CPU cores. These 7 models are deployed to take advantage of all the resources of this instance and everything runs smoothly. Now, supposing I had these same models in PyTorch, what are my options to do the same?
What challenges are you worried about with transferring PyTorch to production? It’s been wonderful to work with, but I haven’t put a PyTorch model in high volume production yet, so I’m curious too.
It's only a matter of time until PyTorch will also dominate industry.
It's always like this.
Think how Ubuntu took over the server market because amateurs were preferring it instead of Redhat/CentOS. And when they became professionals or were in a position to decide, they also put Ubuntu on the server because this is what they knew best.
I'm not sure that's a great example, given that AWS mostly runs on RHEL-based OSs and Debian is still preferred for Docker. Ubuntu did not "take over the server market".
I really enjoyed this write up. Thank you for putting it together. Even as a TF user, I feel it's a really fair assessment of TF vs PyTorch.
A quick observation that may not be 100% accurate but still worth mentioning: in some ways TF feels like it was written to solve large scale issues on day one. For example, when I started playing with the new TF 2.0 distribution strategies and dataset pipeline, I quickly got the sense that this thing was meant to move and ingest bucketloads of data across hundreds/thousands of vm instances. In a way, I suppose it's a reflection of Google culture where there's a strong emphasis on not doing things that don't scale to Google Scale.
As a result of this, I sort of feel that you should start with PyTorch and eventually graduate to TF if/when the scale requires it. This is sort of like starting with Rails/Django/Node, and migrating to a Go/JVM/[Insert Your Favorite Static Language Here] stack when the traffic load warrants it.
Whatever happened to Julia? Wasn't it supposed to incorporate all these incredible abstractions at the language level and run quickly on GPUs and everything in between? Is it just lack of adoption, or is it something else?
If you mean Zygote.jl, it's a very ambitious project (like Swift for Tensorflow, which has been under development for even longer, I believe) with not many people working on it compared to Tensorflow and Pytorch. And Pytorch, for example, only supports the methods it decides to overload, while Zygote aims to support everything in the language (including stuff that isn't as obvious, like state, IO, and control flow in general). And then you have optimizations over the computation graph, memory management on GPU, and many corner cases I can't imagine.
Though you can already use very clean Pytorch style libraries like Flux and Knet or the Tensorflow bindings to leverage the benefits of Julia for high performance numerical processing on the adjacent tasks such as data preprocessing.
Coming from using Tensorflow in industry, I recently played with Flux at home. Language level support for AD should be a game changer, but it's a hard transition mentally. You have to understand one language deeply rather than two languages shallowly. I found myself bogged down solving lispy puzzles involving functions composing other functions. In tf (and most AD frameworks) you churn out some ugly procedural code in an ergonomic language that generates some ugly pure functional code in a more limited language (the computational graph). Different cognitive overheads. Julia hasn't been 1.0 for very long; it may still take off.
Essentially, the biggest advantage imo is that Julia offers a single cohesive language, where compilers can do anything at the language level. I don't think this will allow for a single killer application: almost anything Julia can do can be simulated by some combination of Python/C++.
However, what might be true is that using a single language allows for much faster development and iteration than a combination of Python/C++. I think the way that'll manifest is in more and more high quality libraries coming out for Julia that are higher quality than the Python ones.
I've been using Julia's Flux, it's great for when you have some arbitrary model you want to run gradient descent on that isn't just a bunch of matrix ops, as the framework overhead is way less than TF or PyTorch due to Julia being 100x faster than pure Python.
In my experience in computer vision research it doesn’t matter what you use. Yes, immediate mode is slightly more convenient, but research time is influenced much more by your computing power, dataset acquisition/labeling/relabeling power and, last but not least, by code quality and easy, efficient collaboration; that’s why you need tools like DVC/Argoproj. We did get amazing results using Caffe v1 back in the day.
I agree. I haven’t encountered a strong preference in academic computer vision or machine learning. Keras and PyTorch dominate, of course, but I wouldn’t be shocked if everyone started using something new in the future.
Anyone has any opinions on TF2.0? They've released it recently, and it seems like it should be much closer to PyTorch now, but I don't know enough to evaluate it properly.
TF2.0 (and in particular their recommended tf.keras) is simply a clone of the Pytorch API in most respects. There is no reason to use it vs just using Pytorch, especially as Pytorch now supports easy model exporting for running in production.
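The export path alluded to here is TorchScript; a minimal tracing sketch (tracing bakes in control flow, so models with data-dependent branches would need torch.jit.script instead):

```python
import os
import tempfile

import torch

model = torch.nn.Sequential(
    torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2)
)
model.eval()

# Tracing records the ops run on an example input into a Python-free module.
example = torch.randn(1, 4)
traced = torch.jit.trace(model, example)

path = os.path.join(tempfile.mkdtemp(), "model.pt")
traced.save(path)               # loadable from C++ via torch::jit::load

# The serialized module gives identical outputs to the original.
loaded = torch.jit.load(path)
with torch.no_grad():
    assert torch.allclose(model(example), loaded(example))
```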
Like others here, at work we switched over from TensorFlow to PyTorch when 1.0 was released, both for R&D and production. Our productivity and happiness with PyTorch are noticeably, significantly better.
Back when we were using TensorFlow, whenever we wanted to try something new, sooner or later we would find ourselves wrestling with its computational graph abstraction, which is non-intuitive, especially for models with more complex control flow.
That said, we are keeping an eye on Swift + MLIR + TensorFlow. We think it could unseat PyTorch for R&D and eventually, production, due to (a) the promise of automatic creation of high-performance GPU/TPU kernels without hassle, (b) Swift's easy learning curve, and (c) Swift's fast performance and type safety. Jeremy Howard has a good post about this: https://www.fast.ai/2019/03/06/fastai-swift/
I remember using early Torch (in Lua! As someone who knew only Matlab!) in 2015-ish; and then using Keras (which is supposed to be an abstraction layer over NN frameworks) and finding it much more verbose and complicated to use without recurring to code snippets.
Perhaps it’s the nature of the game that changed with many new kinds of architectures and so on. But maybe Keras is already overengineered for someone who just wants to make thumbnail sized GAN stuff at home.
With the release of pytorch mobile, people building products will want to use stuff from the Pytorch universe, while researchers who just want to prototype an idea and want a numpy like accelerated interface will look at jax.
I talk about Jax in the article. It's very cool, especially if you need higher order derivatives. However, it's not meant to be a full neural network library, and unless Google invests significantly into it, it won't take off significantly imo.
> Great API. Most researchers prefer PyTorch’s API to TensorFlow’s API. This is partially because PyTorch is better designed and partially because TensorFlow has handicapped itself by switching APIs so many times (e.g. ‘layers’ -> ‘slim’ -> ‘estimators’ -> ‘tf.keras’).
Arguably, one of the biggest issues Google had with Angular was the switch from 1.x to 2.x. You'd have thought they'd learned how not to make major changes on OSS projects.
Facebook, with React for instance, does an amazing job here: they use prefixes like "UNSTABLE_" on anything they don't want to support, and show warnings forever when they actually plan to make something small obsolete.
I tried to learn from both, so in some of my bigger personal OSS projects (in terms of the amount of work involved), like npm's "server", I purposefully made some APIs a bit more limited than I could have, to keep more flexibility later on if I didn't like the direction. Of course at a different level; I am a single dev doing OSS in my free time, after all.
But I understand in a project of the size of e.g. Tensorflow it's not an individual dev learning, it's more about the company learning how to do things better.
MXNet is actually pretty good. It got to the "mixing eager and graph mode" semantics before either PyTorch or TensorFlow did. On top of that, it's also blazing fast (usually the fastest of the frameworks).
Admittedly, I've never used MXNet so it might have more issues that I'm not aware of. Judging from the benchmarks I've seen, however, MXNet got a lot of things right.
Unfortunately, I just don't think it added enough on top of PyTorch or TensorFlow for people to consider switching. People switched from TensorFlow to PyTorch because eager mode was just so much easier to use.
My former employer pulled in a few AWS data scientists to consult with us on a few projects and based on my interactions it seemed like they were under some directive to strongly discourage anything that wasn't a built-in AWS plug-and-play sagemaker algorithm. It was not a positive experience because of course most of them are half baked.
MxNet is fantastic though. It's usually faster than tensorflow, has a pytorch like "eager" API that doesn't suck, and can still use symbolic graphs. Amazing documentation too (for the Gluon API).
We have been using mostly Dlib[0]. There was the need to develop solutions that can be statically compiled and produce dependency-free DLLs, and Dlib delivered remarkably on that aspect.
I haven't had success doing so using frameworks such as Torch and TF, even if their toolkit is better to develop new solutions.
Also we get to write code in C++, which can be a big positive when developing machine learning SDKs. I personally still do most of the prototyping in Python though.
I'll be checking out the link in the post that mentions PyTorch allows models to be converted to C++; it looks promising, actually.
That first graph is super confusing. The Y axis says "Percentage of unique mentions" but it only goes up to 0.7%? Was it meant to be "Fraction of unique mentions"?
And then the title is "PyTorch vs Tensorflow", but it never says whether the Y axis is unique mentions of PyTorch or Tensorflow? From the context I guess PyTorch, but come on!
The Y axis should be "Fraction mentioning PyTorch", and the title should be "Papers that only mention PyTorch or Tensorflow" (assuming I have understood this correctly).
Shame it was labelled so badly because it's an amazing graph otherwise!
My very biased opinion: you start with PyTorch because it's easy to develop and debug, and there's no point in having the fastest tools for a model that you can't train properly.
Once your model is running, and if/when you start hitting performance bottlenecks, then you consider migrating your model to TensorFlow.
One thing the AI/data scene gets the best of is data on their own industry. Reminds me of how Ruby used to have the best designed websites for their various tools.
I don't even believe the thesis about one dominating the other in a specific domain. I don't think top mentions in conferences is a good measure of usage.
Ok, admittedly, there are a couple reasons. The fact that most papers don't mention the framework they use is a big one. So if users of one framework disproportionately mentioned that framework in their paper, it would be overrepresented.
Basically, some conferences have encouraged researchers to submit code. Instead of checking the papers, I checked their code instead. The results are pretty much the same. So I think that mentions in top conferences probably correlates well with uses in code.
While PyTorch is awesome, one thing it suffers from in my opinion is that there's no "one way" to do things - I've found it difficult to take someone's model and training code and tweak it so it fits in your code, compared to Chainer, which has nice abstractions for a trainer, updaters, models, etc.
PyTorch is easy to use and modify, but Chainer, and by extension cupy (a separate awesome project!) are really, really easy to work with.
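For contrast, the PyTorch "no one way" approach is just a hand-rolled loop, which is exactly why every codebase structures it differently. A minimal, self-contained example on toy data:

```python
import torch

torch.manual_seed(0)
x = torch.randn(64, 1)
y = 2 * x                          # toy target: learn y = 2x

model = torch.nn.Linear(1, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

start = loss_fn(model(x), y).item()
for _ in range(100):               # the whole "trainer": one explicit loop
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

assert loss.item() < start         # the weight converges toward 2
```

Chainer's Trainer/Updater abstractions standardize exactly this loop; PyTorch leaves its structure up to you.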
For the majority of production use cases (which tend to get all the AI/ML hype), TensorFlow/Keras is more than powerful enough and accessible enough. If you need to dive down to custom layers/optimizers, PyTorch has value there, but for people looking to get their start in AI/ML, the meme that "TensorFlow sucks" is highly misleading.
I thought the article was a good read and compared the two frameworks with only small hints of personal bias, but one point about industry changing to use pytorch because of researchers already knowing it seems like wishful thinking. Unless PyTorch addresses its mobile and serving issues it is simply not a great choice for many production situations. This article actually influenced me to stick with TF instead of learning PyTorch due to my industry needs.
Additionally, I think TensorFlow enabling eager execution by default is fine, maybe even good.
Many models are relatively simple, and I doubt the gains from rewriting them to utilize the execution graph will be worth it when, with the Keras frontend, you can just dump the h5 model and run it from there, which many companies already do.
Rewriting will only be an issue for sufficiently complex models and at that point I imagine competent ML professionals will have baked the time for that into the estimate of the engineering costs.
I doubt this and feel the conclusion might be just the opposite, that TF 2 will be the top choice for most developers. Just started learning TF2 and feel it's indeed a great upgrade. Still new to this, and I need TF2 for products instead of research, the tensorflow lite and tensorflow.js seems very useful, plus tensorboard looks promising as well.
It's been my observation that most researchers/DS prefer PyTorch because it lets them hack in Python, and most production software engineers prefer models written in TF because of the effortless portability and performance of TF graphs.
I work on a team that does the latter, and lately DS have been handing off PyTorch models that we can't scale or make performant, because TorchScript doesn't really work with any realistic code complexity and authors include all sorts of random Python libraries. So we can't load models in C++ or get them under 50ms.
So the framework divide very much feels like dynamic vs statically typed languages. People that don't have real production demands love dynamic languages for the productivity.
Thank you very much for writing this up. I have been using TensorFlow since it was first released and even though I am now retired I have been looking at my own open source models with an idea of converting to TF 2.
Since I am just keeping up with deep learning in particular and AI in general for my own interests, I will likely switch over to PyTorch because there is no risk involved and learning something new is fun. This is a big change since I have years of TF experience and perhaps four or five evenings spent with PyTorch.
I think it's a matter of time. New things get adopted first in research, and I think PyTorch will take over TensorFlow. I was also a TensorFlow user, and when I switched to PyTorch I never looked back. I was also participating in a Kaggle competition, and the top 20 models were all implemented in PyTorch.
This was for Computer Vision and NLP conferences, but would the same be true if AutoML were thrown into the mix? I care mostly about efficiency and optimization, and the author wasn't able to show that PyTorch is any better or worse, save for the two anecdotes.
To be fair I think Matlab is still heavily used in industry, which I view as a direct result of being so dominant among students and researchers. Maybe not for machine learning, but for controls engineering, signal processing, stuff like that it feels unavoidable.
Going from doing ML research to data ingestion and analysis to web frameworks to API design, blogging and static site generation is very powerful. The trend seems to be that Python will dominate all of these.
Can someone provide a tldr on the differences? I know enough to implement models in tensorflow (via keras) and have a decent understanding of parameter tweaking, but really don't understand the fundamental difference between these two libraries. Thanks!
Keras still is the very best in terms of expressive models and end to end workflows. It leans heavily on the design idea that you should deliberately design for end to end use cases and all intermediate abstractions should be building blocks that serve precisely that purpose. This is discussed in [0] which IMO is something that deserves to be more widely talked about in software engineering. Lots of other disciplines of software engineering _say_ you should design this way, but in my experience it’s very rare no matter what discipline you’re in. Take TensorFlow itself. It’s a huge mess with no clear abstractions useful for end to end solutions. Just a hodge podge of disparate APIs and way too many underlying engineering concepts were elevated to abstractions for engineer (instead of user) convenience.
In terms of model expressiveness, I made a functional NN building API for PyTorch (just like keras'), which offers the optimal balance of flexibility and expressiveness:
https://github.com/blue-season/pywarm
Researchers' code is historically not very clean or reusable. It may work fine if you want to hack something together to get data for a paper, but if you want to run a real production service and don't want to drown in tech debt in a year, you need more structure; and that often means more code, as you imply above. I don't think it's unnecessarily verbose, it's just that it's more structured and scalable.
PyTorch is simpler, easier to use, consumes less memory, and allows for dynamic computational graphs (dynamic operations during the forward pass).
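To make "dynamic computational graph" concrete: the graph is recorded while ordinary Python executes the forward pass. Here is a toy pure-Python sketch of that idea (this is an illustration of the concept, nothing like PyTorch's actual implementation):

```python
# Toy reverse-mode autodiff: the "graph" is just the chain of parent links
# recorded while the forward pass runs, ordinary Python control flow included.
class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents  # (parent_var, local_gradient) pairs

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def backward(self, seed=1.0):
        # Note: gradients accumulate (+=), just as in PyTorch.
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

x = Var(2.0)
y = x * x + x * Var(3.0)   # the graph is built as this line executes
y.backward()
print(x.grad)  # d/dx (x^2 + 3x) at x=2 -> 7.0
```

Because the graph is rebuilt on every forward pass, each iteration can take a different path (loops, branches, recursion), which is exactly what "dynamic operations during the forward pass" buys you.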
rtkaratekid|6 years ago
Despite the absolute nightmare of getting it installed and running on a GPU, I managed it and had a fantastic model. It was doing so well that the company wanted to expand the project and build out a multi-GPU rig as part of it. So I set about building that environment, installed the latest CUDA, cuDNN, and NVIDIA driver, used TensorFlow 2.0, aaaaaand it wouldn't work. I actually spent a long time hacking on it until I read on a forum that it was just a bug that hadn't been fixed yet.
At this point I decided to see what Pytorch was like. In literally one day I installed everything and migrated my project completely over to pytorch. Same speed, same accuracy, works perfectly on a multi-gpu rig when I set it to. It was like a breath of fresh air.
The next day I wrote some C++ to import a saved PyTorch model so it could run in a deployment environment. The C++ API is also great. The docs are lacking a little bit, but a Facebook researcher mentioned to me on the forums that they're hoping to have it all done by next month.
It's unlikely that I'll be going back to tensorflow.
bigred100|6 years ago
nextos|6 years ago
ackbar03|6 years ago
p1esk|6 years ago
eliashaddad|6 years ago
dchichkov|6 years ago
Because of that, it doesn't make much sense to judge a "differentiable programming" framework like TensorFlow or PyTorch by the ease of installation. It'd be like saying "I prefer C# over C++" because it is easier to install.
_coveredInBees|6 years ago
1. Extremely easy to debug and work with. Being able to debug effortlessly in PyCharm makes life very easy.
2. The API is quite clean and nice and fits in really well with Python and nothing feels hacky. I've developed my own Keras-like framework for experimentation, training and evaluating models quickly and easily and the entire experience has been really enjoyable.
3. The nicest thing though is that, as the article points out, a huge percentage of researchers have moved to PyTorch, and this allows us to more easily look at other researchers' code, experiment with things, and incorporate ideas and cutting-edge research into our own work. Even for things that are released in TensorFlow, if it is an important publication that gains attention and traction in the community, you will likely have implementations in PyTorch pop up soon enough.
I do think that TensorFlow still has an edge on the deployment at scale/mobile side of things as pointed out by the article. But Pytorch is a lot younger and they are making a lot of progress with every release in that space.
UncleOxidant|6 years ago
I'll add that it was much easier to install PyTorch with GPU support than it was to install TensorFlow with GPU support - at least that's how it was around November of last year. The PyTorch install was painless, whereas we ended up having to build TF from source to work with our setup. Could be different now as I haven't looked at TF since then.
ChefboyOG|6 years ago
mrfusion|6 years ago
arugulum|6 years ago
There're maybe all of two "surprises" I've encountered in all my time using it, if even (1. Gradients are accumulated in state, 2. nn.Module does funky things with attributes, so use something like nn.ModuleDict if you're going to be dynamically setting modules). Everything else works like a dream, and works almost exactly how you expected.
Model parameters? .parameters() gives you a dict-friendly generator of tensors. Model state? .state_dict() is a dictionary. Loading model state? load_state_dict(state_dict)... just loads a dictionary. Reusing modules across different modules? Just assign them! Determining what parameters to optimize? Just ... give the list of parameters to the optimizer.
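The dict-centric API described above fits in a few lines. A hedged sketch (assumes a working torch install; `TinyNet` is an invented toy module, not from any real codebase):

```python
# Minimal sketch of the dict-centric PyTorch API described above.
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

net = TinyNet()
params = list(net.parameters())       # plain iterable of tensors
state = net.state_dict()              # an ordinary (ordered) dict of tensors
clone = TinyNet()
clone.load_state_dict(state)          # ...just loads the dictionary
opt = torch.optim.SGD(net.parameters(), lr=0.1)  # hand params to the optimizer
```

Nothing here is framework-specific ceremony: it's lists, dicts, and attribute assignment all the way down.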
You can use all your usual Python development and debugging tools, and it feels 100% natural. I can fit it into other Python workflows without making the whole program centered around TensorFlow.
TensorFlow is undoubtedly powerful, and if you have the time/resources to put into a static-ish TensorFlow-centric workflow, it could pay off many times over. But it definitely feels like learning an entirely new language, with an entirely different debugging pattern. And furthermore, a language that is constantly changing patterns and best practices, other than super-standard Keras examples.
To put this into context, even running the official TensorFlow models repository produces deprecation warnings. Whereas torchvision works seamlessly and reads like a reference for writing PyTorch model code.
There is just a developer-centric focus to PyTorch that makes it a joy to use.
_coveredInBees|6 years ago
I was able to create a custom detection network for a 3-class problem, load up the COCO pretrained weights for the network, strip out all the other weights at the "head" for all the other COCO classes except for the "person" class and then fine-tune the model on my custom 3-class dataset. The resulting model generalized exceptionally well on people as it was still able to retain a lot of its performance from the COCO pre-training. It was so easy to do all of this. Literally, maybe 10 lines of code, and so easy to figure out since I could introspect the state_dict and the weights file directly in my PyCharm interpreter while working out how to do this.
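Mechanically, the weight-stripping described above is just dictionary filtering, since a checkpoint is a dict of name → tensor. A hedged sketch with plain Python values standing in for tensors (all key names here are invented for illustration):

```python
# A PyTorch checkpoint is essentially a dict of name -> tensor.
# Plain lists stand in for tensors here; the key names are hypothetical.
pretrained = {
    "backbone.conv1.weight": [0.1, 0.2],
    "head.cls.weight":       [[0.3], [0.4], [0.5]],  # 80+ COCO classes in reality
    "head.cls.bias":         [0.0, 0.0, 0.0],
}

# Keep the backbone, drop every head weight so the new 3-class head starts fresh.
kept = {k: v for k, v in pretrained.items() if not k.startswith("head.")}

# In real PyTorch you would then call
#   model.load_state_dict(kept, strict=False)
# and fine-tune on the custom dataset.
print(sorted(kept))  # ['backbone.conv1.weight']
```

Being able to poke at that dict in a debugger, as the commenter describes, is what makes the whole workflow a few lines of code.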
microtonal|6 years ago
I am a bit saddened by all of this, because I really liked how easy it is to define a graph in TensorFlow in Python, serialize it, and then use its minimalistic C API to run the graph from Go, Rust, or wherever you need it.
How is your experience with PyTorch and backwards API compatibility (I know that they only reached 1.0 fairly recently)?
arugulum|6 years ago
Other than that, I've had next to no issues, and the API has only gotten better over time, with more convenient ways to do things.
PyTorch has a much smaller footprint, and is happy to delegate code to separate libraries (e.g. torchvision), so you run into "all-or-nothing" dilemmas less frequently.
_coveredInBees|6 years ago
option|6 years ago
For its successor we chose PyTorch instead of TF 2 and have been very happy with this decision.
king_magic|6 years ago
tedivm|6 years ago
oli5679|6 years ago
The LightGBM library has consistently performed well. I've been interested by how many colleagues instantly jump to neural nets, when in my experience this often doesn't beat LightGBM on medium-sized datasets not related to text/images.
suresk|6 years ago
One area where I wonder if neural nets would be a more useful option is using something like an LSTM to predict defaults based on a sequence of data? I've tried this a handful of times and doing a bit of feature engineering to aggregate data in a handful of fixed buckets has usually been better and easier, but I'm far from an expert in that area.
I know Jeremy Howard has shown decent results with fastai/pytorch for tabular data and I've seen some Kaggle teams do well with neural nets for tabular data. I've also had decent results with gbdt/nn ensembles. But I think in most situations where you just have tabular data, you'll get better results with less effort if you use lightgbm or the like.
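For context on what gradient-boosted trees are doing: each round fits a weak learner to the current residuals and adds a shrunken copy of it to the ensemble. A toy sketch with depth-1 "stumps" on a single feature (purely illustrative; nothing like LightGBM's optimized histogram implementation):

```python
def fit_stump(x, residual):
    """Best single-threshold split minimizing squared error on residuals."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residual) if xi <= t]
        right = [r for xi, r in zip(x, residual) if xi > t]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda xi: lmean if xi <= t else rmean

def boost(x, y, rounds=50, lr=0.3):
    """Gradient boosting for squared error: fit stumps to residuals."""
    pred = [0.0] * len(y)
    stumps = []
    for _ in range(rounds):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(x, residual)
        stumps.append(s)
        pred = [pi + lr * s(xi) for pi, xi in zip(pred, x)]
    return lambda xi: sum(lr * s(xi) for s in stumps)

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]   # a step function: easy for trees
model = boost(x, y)
print(round(model(2), 2), round(model(5), 2))  # 1.0 5.0
```

Axis-aligned splits like this are why tree ensembles do so well on tabular data with little feature engineering, which is the point the comments above are making.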
jimfleming|6 years ago
I call them “tricks” but really they’re just design decisions based on what current research indicates about certain problems. This is largely where the “art” part of neural networks that many people refer to comes from. The search space is simply too big to try everything and hope for the best. Therefore, how a problem is approached and how solutions are narrowed and applied really matter. Even simple things like which optimizer you use, how you leverage learning rate schedules, how the loss function is formulated, how weights are updated, feature engineering (often neglected in neural networks), and architectural priors make a big difference in both sample efficiency and overall performance. Most people, if they’re not just fine-tuning an existing model, simply load up a neural network framework, stack some layers together and throw data at it, expecting better results than other approaches. But there’s a huge spectrum from that naive approach to architecting a custom model.
This is why neural networks are so powerful and why we tend to favor it (though not for every problem). It’s much easier to design a model from the ground up with neural networks than it is for e.g. xgboost because not only are the components more easily composable thanks to the available frameworks but there’s a ton more research on the specific interactions between those components.
That doesn’t mean that every problem is appropriate for neural networks. I completely agree with you that no matter what the problem is, you should never jump to an approach just because it’s popular. Neural networks are a tool, and for many problems you need to be comfortable with every one of those decision points to get the best results; even if you’re comfortable it can take time, and that isn’t always appropriate for every problem. My other point is that I wouldn’t draw too many conclusions about a particular algorithm being better or worse than another. I’m not saying that was the intention with your comment, but I know many people in the ML industry tend to take a similar position. It really depends on current experience with the applied algorithms, not just experience with ML in general.
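As one concrete instance of those design decisions, a learning-rate schedule with linear warmup and cosine decay is only a few lines of math (the constants here are arbitrary examples, not recommendations):

```python
import math

def lr_at(step, base_lr=0.1, warmup=100, total=1000):
    """Linear warmup to base_lr, then cosine decay to zero."""
    if step < warmup:
        return base_lr * (step + 1) / warmup
    progress = (step - warmup) / (total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(round(lr_at(0), 4), round(lr_at(99), 4), round(lr_at(1000), 4))
# 0.001 0.1 0.0
```

Small choices like this (warmup length, decay shape) are exactly the kind of thing that separates "stack some layers and throw data at it" from a carefully architected model.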
abhgh|6 years ago
hobofromabroad|6 years ago
And if you have any kind of seasonality, you need a dataset with a large enough timeframe. (At least more than a year.)
Nonetheless, LightGBM and xgboost are also commonly used in the insurance sector.
They are still somewhat problematic for conversion rates in a highly dynamic market though.
elmalto|6 years ago
acgan|6 years ago
Code: https://github.com/Chillee/pytorch-vs-tensorflow
Ablation of claims: https://thegradient.pub/p/cef6dd26-f952-4265-a2bc-f8bfb9eb1e...
JS interactive charts: https://chillee.github.io/pytorch-vs-tensorflow/
ankeshanand|6 years ago
amrrs|6 years ago
With fastai module that's built on Pytorch learning and developing Deep Learning solutions have become a lot easier. So there's a real game on now
bonoboTP|6 years ago
pillefitz|6 years ago
hhtoyou|6 years ago
Common people want to appreciate and adopt the things from researchers that seem to fit the present sphere of knowledge. TensorFlow's approaches are better known and respected by each and every one in the community, and in exchange they have enlightened us with new ways of understanding ML solutions. It has turned into a family: “If you want to go fast, go alone. If you want to go far, go together.” And given the assets Alphabet has, a common man can turn into a researcher! Take https://learn.grasshopper.app for example: "Learn to code anywhere. Grasshopper is available on iOS, Android, and all web browsers. Your progress syncs seamlessly between devices." This is the status quo! It's a gift of a lifetime for generations!
snendroid-ai|6 years ago
ericd|6 years ago
onlyyimte|6 years ago
It's always like this.
Think about how Ubuntu took over the server market because amateurs preferred it to Red Hat/CentOS. And when they became professionals or were in a position to decide, they also put Ubuntu on the server, because that was what they knew best.
barbecue_sauce|6 years ago
jniedrauer|6 years ago
__sy__|6 years ago
A quick observation that may not be 100% accurate but still worth mentioning: in some ways TF feels like it was written to solve large scale issues on day one. For example, when I started playing with the new TF 2.0 distribution strategies and dataset pipeline, I quickly got the sense that this thing was meant to move and ingest bucketloads of data across hundreds/thousands of vm instances. In a way, I suppose it's a reflection of Google culture where there's a strong emphasis on not doing things that don't scale to Google Scale.
As a result of this, I sort of feel that you should start with PyTorch and eventually graduate to TF if/when the scale requires it. This is sort of like starting with Rails/Django/Node, and migrating to a Go/JVM/[Insert Your Favorite Static Language Here] stack when the traffic load warrants it.
kcolford|6 years ago
ddragon|6 years ago
Though you can already use very clean Pytorch style libraries like Flux and Knet or the Tensorflow bindings to leverage the benefits of Julia for high performance numerical processing on the adjacent tasks such as data preprocessing.
drewm1980|6 years ago
chillee|6 years ago
You can take a look at https://discourse.julialang.org/t/where-does-julia-provide-t... for some of my questions.
Essentially, the biggest advantage imo is that Julia offers a single cohesive language, where compilers can do anything at the language level. I don't think this will allow for a single killer application - almost anything Julia can do can be simulated by some combination of Python/C++.
However, what might be true is that using a single language allows for much faster development and iteration than a combination of Python/C++. I think the way that'll manifest is in more and more libraries coming out for Julia that are higher quality than the Python ones.
Maybe wait 5 years, and we'll see what happens :)
logicchains|6 years ago
samcodes|6 years ago
xvilka|6 years ago
[1] https://github.com/FluxML/Flux.jl/issues/625
fspeech|6 years ago
Dzugaru|6 years ago
Q6T46nT668w6i3m|6 years ago
rayalez|6 years ago
https://www.youtube.com/watch?v=EqWsPO8DVXk
sails|6 years ago
[0] https://www.fast.ai/about/ [1] https://www.youtube.com/watch?v=J6XcP4JOHmk&t=4152s
lalaland1125|6 years ago
phillipcarter|6 years ago
* Higher-order automatic differentiation being important, and how there's clearly room to disrupt there
* Increasing hardware diversity seems to mean that both frameworks will run into a brick wall as-is
Exciting space. It'll be fascinating to see how dramatically, or not, things change in the coming years.
cs702|6 years ago
Back when we were using TensorFlow, whenever we wanted to try something new, sooner or later we would find ourselves wrestling with its computational graph abstraction, which is non-intuitive, especially for models with more complex control flow.
That said, we are keeping an eye on Swift + MLIR + TensorFlow. We think it could unseat PyTorch for R&D and eventually, production, due to (a) the promise of automatic creation of high-performance GPU/TPU kernels without hassle, (b) Swift's easy learning curve, and (c) Swift's fast performance and type safety. Jeremy Howard has a good post about this: https://www.fast.ai/2019/03/06/fastai-swift/
jeffshek|6 years ago
It feels a bit too early to tell. I don't believe many researchers will switch to Swift though.
thanatropism|6 years ago
Perhaps it’s the nature of the game that changed with many new kinds of architectures and so on. But maybe Keras is already overengineered for someone who just wants to make thumbnail sized GAN stuff at home.
nmca|6 years ago
mcbuilder|6 years ago
chillee|6 years ago
mlevental|6 years ago
franciscop|6 years ago
> Great API. Most researchers prefer PyTorch’s API to TensorFlow’s API. This is partially because PyTorch is better designed and partially because TensorFlow has handicapped itself by switching APIs so many times (e.g. ‘layers’ -> ‘slim’ -> ‘estimators’ -> ‘tf.keras’).
Arguably, one of the biggest issues Google had with Angular was the switch from 1.x to 2.x. You'd have thought they'd learned how not to make major changes on OSS projects.
Facebook, on React for instance, does an amazing job here: they prefix anything they don't want to support with "UNSTABLE_" and show warnings forever when they actually plan to make something small obsolete.
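The "prefix it and warn forever" pattern is cheap to implement in any language. A minimal Python sketch using the stdlib `warnings` module (the `UNSTABLE_` naming just mirrors React's convention here; `fancy_layer` is a made-up example):

```python
import functools
import warnings

def unstable(fn):
    """Mark an API as unstable: callers get a warning on every use."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        warnings.warn(f"UNSTABLE_{fn.__name__} may change without notice",
                      FutureWarning, stacklevel=2)
        return fn(*args, **kwargs)
    return wrapper

@unstable
def fancy_layer(x):
    return x * 2

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = fancy_layer(21)

print(result, caught[0].category.__name__)  # 42 FutureWarning
```

The key design point is that the warning fires on every call, so downstream users can never claim they weren't told before a breaking change lands.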
I tried to learn from both, so in some of my bigger personal OSS projects (in terms of the amount of work involved), like npm's "server", I purposefully made some APIs a bit more limited than I could have, to leave more flexibility later on if I didn't like the direction. Of course at a different level; I am a single dev doing OSS in my free time, after all.
But I understand in a project of the size of e.g. Tensorflow it's not an individual dev learning, it's more about the company learning how to do things better.
chips2001|6 years ago
elwell|6 years ago
o10449366|6 years ago
chillee|6 years ago
Admittedly, I've never used MXNet so it might have more issues that I'm not aware of. Judging from the benchmarks I've seen, however, MXNet got a lot of things right.
Unfortunately, I just don't think it added enough on top of PyTorch or TensorFlow for people to consider switching. People switched from TensorFlow to PyTorch because eager mode was just so much easier to use.
theferalrobot|6 years ago
alfalfasprout|6 years ago
rossdavidh|6 years ago
pmiller2|6 years ago
TickleSteve|6 years ago
nbeleski|6 years ago
I haven't had success doing so using frameworks such as Torch and TF, even if their toolkits are better for developing new solutions.
Also we get to write code in C++, which can be a big positive when developing machine learning SDKs. I personally still do most of the prototyping in Python though.
I'll be checking the link on the post that mentions that pytorch allows models to be converted to c++, looks promising actually.
[0] http://dlib.net/
IshKebab|6 years ago
And then the title is "PyTorch vs Tensorflow", but it never says whether the Y axis is unique mentions of PyTorch or Tensorflow? From the context I guess PyTorch, but come on!
The Y axis should be "Fraction mentioning PyTorch", and the title should be "Papers that only mention PyTorch or Tensorflow" (assuming I have understood this correctly).
Shame it was labelled so badly because it's an amazing graph otherwise!
chillee|6 years ago
I had fixed these properly at some point, but then I made some last-minute modifications to the text size and such.
These interactive figures are probably a bit better overall too: https://chillee.github.io/pytorch-vs-tensorflow/
I'll change that ASAP. Thanks for the heads up!
EDIT: Fixed! Lemme know if that addressed your issues.
stefan_|6 years ago
p-morais|6 years ago
unknown|6 years ago
[deleted]
swampthinker|6 years ago
probably_wrong|6 years ago
Once your model is running, and if/when you start hitting performance bottlenecks, then you consider migrating your model to TensorFlow.
hprotagonist|6 years ago
0-_-0|6 years ago
dmix|6 years ago
Der_Einzige|6 years ago
chillee|6 years ago
Ok, admittedly, there are a couple reasons. The fact that most papers don't mention the framework they use is a big one. So if users of one framework disproportionately mentioned that framework in their paper, it would be overrepresented.
I did cover this concern though, in the Appendix. Check out the "Biased Sample" section.(https://thegradient.pub/p/cef6dd26-f952-4265-a2bc-f8bfb9eb1e...)
Basically, some conferences have encouraged researchers to submit code. Instead of checking the papers, I checked their code instead. The results are pretty much the same. So I think that mentions in top conferences probably correlates well with uses in code.
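The methodology described boils down to string matching over paper or code text. A toy sketch of the counting step (the sample strings are made up; the real analysis scanned actual conference papers):

```python
import re
from collections import Counter

def framework_mentions(text):
    """Count PyTorch vs TensorFlow mentions, case-insensitively."""
    counts = Counter()
    counts["pytorch"] = len(re.findall(r"\bpytorch\b", text, re.I))
    counts["tensorflow"] = len(re.findall(r"\btensorflow\b", text, re.I))
    return counts

papers = [
    "We implement our model in PyTorch.",
    "Experiments use TensorFlow 2.0 and tensorflow-datasets.",
    "Our code builds on the authors' PyTorch release of BERT.",
]

# The headline metric: papers that mention exactly one of the two frameworks.
only_one = [p for p in papers
            if bool(framework_mentions(p)["pytorch"])
            != bool(framework_mentions(p)["tensorflow"])]
print(len(only_one))
```

Checking submitted code instead of paper text, as described above, just swaps the input corpus; the counting logic stays the same.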
buildbot|6 years ago
PyTorch is easy to use and modify, but Chainer, and by extension cupy (a separate awesome project!) are really, really easy to work with.
minimaxir|6 years ago
Grimm1|6 years ago
Additionally, I think TensorFlow making eager execution the default is fine, maybe even good. Many models are relatively simple, and I doubt the gains from rewriting them to utilize the execution graph will be worth it when, with the Keras frontend, you can just dump the model to HDF5 (via h5py) and run it from there, which many companies already do.
Rewriting will only be an issue for sufficiently complex models and at that point I imagine competent ML professionals will have baked the time for that into the estimate of the engineering costs.
ausjke|6 years ago
coderheed|6 years ago
[0] https://trends.google.com/trends/explore?date=2017-01-11%202...
33MHz-i486|6 years ago
I work on a team that does the latter, and lately DS have been handing off PyTorch models that we can't scale or make performant, because TorchScript doesn't really work with any realistic code complexity and authors include all sorts of random Python libraries. So we can't load models in C++ or get them under 50ms.
So the framework divide very much feels like dynamic vs. statically typed languages. People that don't have real production demands love dynamic languages for the productivity.
haolez|6 years ago
I’m working in a Go code base and I’m thinking of using it instead of creating a separate service in Python.
https://gorgonia.org/
woah|6 years ago
chewxy|6 years ago
I definitely prefer using it to deploy services than PyTorch, MXNet or TF
mark_l_watson|6 years ago
Since I am just keeping up with deep learning in particular and AI in general for my own interests, I will likely switch over to PyTorch because there is no risk involved and learning something new is fun. This is a big change since I have years of TF experience and perhaps four or five evenings spent with PyTorch.
krastanov|6 years ago
pytorch bug tracker: https://github.com/pytorch/pytorch/issues/755
sairahul82|6 years ago
goliathDown|6 years ago
patagurbon|6 years ago
I much prefer PyTorch, effectively all graph frameworks are there. Very nice to see TPU support with 1.3 as well.
anirudhgarg|6 years ago
lettergram|6 years ago
My point is researchers using a framework (matlab) does not mean it’s used heavily in industries or even all industries.
sgillen|6 years ago
Personally I think all these deep learning frameworks just haven't had as much time to mature, I have a feeling once they do that the one that dominates academia will eventually dominate industry.
zitterbewegung|6 years ago
Dude2029|6 years ago
chadmeister|6 years ago
https://news.ycombinator.com/item?id=21217169
unknown|6 years ago
[deleted]
gryffin|6 years ago
Or even Karma & Jest,
Facebook seems to be late to market, but learns from Google's mistakes, to create simpler and more elegant tools.
chewxy|6 years ago
clatan|6 years ago
boringg|6 years ago
ineedasername|6 years ago
unknown|6 years ago
[deleted]
mlthoughts2018|6 years ago
Constraining design by end to end use cases is a remarkably robust and useful process.
PyTorch is way better at having clean engineering abstractions than TensorFlow, but still falls short when things like “forward” or maintaining your own training loop and gradient metadata are necessary concepts for a practitioner’s end to end workflow.
[0]: https://blog.keras.io/user-experience-design-for-apis.html
abakus|6 years ago
eanzenberg|6 years ago
chillee|6 years ago
https://thegradient.pub/p/cef6dd26-f952-4265-a2bc-f8bfb9eb1e...
mruts|6 years ago
[deleted]
xtat|6 years ago
[deleted]
qwerty456127|6 years ago
chillee|6 years ago
As a summary, though:
PyTorch has become dominant in research because of its API (both its stability + having eager mode).
TF has become dominant in industry because A. it came out several years before PyTorch and industry is slow to move, B. It supported a lot of production use cases (mobile, serving, removing Python overhead) that PyTorch didn't for a long time.
nknealk|6 years ago
The Keras interface for tensorflow makes it easy & fast to make "good enough" models. That is often a driving factor
hprotagonist|6 years ago
industry: tensorRT.
machinelearning|6 years ago
Why engineers like tensorflow:
- More code to check-in (Looks more productive)
- More infrastructure, e.g. checkpoints, exporters etc. (Looks like they're doing more work)
- Fancy visualizations (Allows them to look impressive while presenting loss plots)
- Easier to reuse things others have implemented and still get credit for it (TF model zoo, research repo etc.)
Why researchers like pytorch:
- Way easier to hack together their novel idea
- Looks scrappier (which somehow makes the individual look like a better researcher instead of an ordinary programmer)
- Lots of other researchers release code in pytorch so if you're working off of their idea, you use pytorch to avoid re-producing their results.
Open to debate on these ideas, let me know if you have a counterpoint or any other reasons to add
stinos|6 years ago
With those bullet points, looks like you didn't talk to actual engineers, but rather middle-layer management people.
mlthoughts2018|6 years ago
Not even bad engineers try to pretend like this is true.
ehsankia|6 years ago
andbberger|6 years ago
Why I use tensorflow:
- keras
- I used tensorflow yesterday
prostodata|6 years ago