top | item 6859371

Why I hate virtualenv and pip

105 points | dissent | 12 years ago | pythonrants.wordpress.com

113 comments

[+] ubernostrum|12 years ago|reply
A lot of this seems to boil down to a combination of seeing others misuse tools and assuming that's what they're for (which is a communication/teaching failure, not a failure of the tool), and looking at stepping-stone solutions.

Take binary packages, for example. Sure, eggs did that. Sort of. But they also introduced a weird parallel universe where you had to stop doing normal Python things and start doing egg things. So pip eschewed eggs.

And meanwhile the community banded together to find a way to separate the build and install processes, the result of which is the wheel format:

http://www.python.org/dev/peps/pep-0427/

Similarly, virtual environments are still being improved (with the improvements now being integrated directly into Python itself).

And yes, you can use pip's requirements files as a duplicate way to specify dependencies. And people do that, and it's unfortunate. Because the thing requirements files are really useful for is specifying a particular environment that you want to replicate. That might be a known-good set of stuff you've tested and now want to deploy on, it might be an experimental combination of things you want others to test, etc., but that's something setup.py's dependency system isn't good at. And repeatable/replicable environments are certainly an important thing.
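A minimal sketch of that replicate-an-environment workflow (assuming a python3 with pip on the PATH; the install step is commented out since it needs network access):

```shell
# Snapshot the exact versions installed in the current environment:
python3 -m pip freeze > requirements.txt
cat requirements.txt
# Later, on the environment you want to replicate:
# python3 -m pip install -r requirements.txt
```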

etc., etc.

[+] venantius|12 years ago|reply
Yeah, I think that was largely my reaction as well. Attacking virtualenv in particular because you've seen others use it as a replacement for a completely isolated environment just says that at the very least they didn't understand its intended use case, and perhaps OP doesn't either.

Also, while I support his offering a full-isolation alternative, realistically not everybody is going to want to develop in a Vagrant environment. It's a great solution if you're willing to run with that sort of overhead, but not everybody is.

[+] shoo|12 years ago|reply
i was pleased to see wheel mentioned. here is a wheel anecdote:

suppose you wanted to automate the installation of a python package to a windows machine, but the package has binary dependencies, is irritating to build from source, and is distributed as a pre-built .exe interactive installer (click next, next, ...). you can `wheel convert` the exe installer to get a binary wheel archive, then automate the installation of that wheel archive with pip. hopefully this isn't a common scenario, but the fact that pip + wheel make this kind of thing possible at all is very helpful.

[+] skrebbel|12 years ago|reply
As a programmer who uses a lot of Java and .NET, I was surprised by how complex pip and virtualenv are. The complexity is hidden well enough if you just want to get down to business, so it's not really a practical concern, but still - copy an entire Python environment over? Why? Why can't I just have the pip'ed libraries in a subdir somewhere and tell Python to look there?

Plain old JARs get this right, Maven gets this right, NuGet gets this right, NPM gets this right. Why is it so complex on Python and Ruby? Some technological aspect of the (flexibility of) the languages that need you to basically copy over the entire world? Or just legacy of unfortunate bad design choices in earlier days?

[+] calpaterson|12 years ago|reply
Well, I would not want to copy a virtualenv from one machine onto another - that approach sounds like it's fraught with trouble.

I think the sane way to do it is to package your app using setuptools (and list dependencies) and then use pip install to install it in a new virtualenv on the production machines.
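Roughly like this, hedging that it's just a sketch ("myapp" is a placeholder, and the install line is commented out because it needs your package index):

```shell
# Create a fresh virtual environment on the production machine:
python3 -m venv ./venv
./venv/bin/python -c "import sys; print(sys.prefix)"   # confirms the new prefix
# Then install the setuptools-packaged app, pulling in its install_requires:
# ./venv/bin/pip install myapp
```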

Here's how I do it on the project I'm currently on:

https://github.com/InsolvencyService/rps-alpha/blob/master/s...

[+] regularfry|12 years ago|reply
Ruby gets it right, too. You can have a very simple gem setup if you want it - set GEM_HOME and away you go. I'm sure it's possible to get the same simplicity with Python.
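For what it's worth, the bare-bones Python analogue of GEM_HOME is just a directory on PYTHONPATH. A toy demonstration (with a stand-in module, since nothing real gets installed here):

```shell
mkdir -p ./deps
# Fake an "installed library" with a one-line module:
printf 'GREETING = "hello"\n' > ./deps/mylib.py
# Point Python at the directory and import from it:
PYTHONPATH=./deps python3 -c "import mylib; print(mylib.GREETING)"
```

pip can populate such a directory too, via `pip install --target=./deps <pkg>`.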

The issue I see is that the default tools people turn to hide this simplicity behind a shiny interface, and encourage people to think of the management tools as magic black boxes. The tools then become the only interface people know, and they're inevitably more complicated and fragile than the underlying mechanism.

[+] lmm|12 years ago|reply
Python predates Java and .NET (never mind NPM), and was originally more systems-oriented; the idea is that you install libraries globally, using your system package manager, just like with C libraries. pip/virtualenv had to be retrofitted on afterwards.
[+] lifeisstillgood|12 years ago|reply
because (and correct me if I am wrong) Jars (and certainly Wars) contain bits other than the part you wrote.

Let's say you are packaging up a java and a python "program". Both print Hello world to stdout, but use a third-party spell-checking package. Both the venv and the jar will contain that third-party package.

All python needs to go from venv to jar is a tarball process.

That is roughly where I see us going anyway. PS: anyone with a good understanding of jar files, please jump in - I would love to map out the python parts against the jar parts (i.e. pip install is what part of the java setup process?)

[+] Anderkent|12 years ago|reply
>Why? Why can't I just have the pip'ed libraries in a subdir somewhere and tell Python to look there?

Because that ties you to a single system wide python. Why would you want that?

[+] thwarted|12 years ago|reply
python actually has reasonably good support for this via a single environment variable (PYTHONHOME, I think; it's been a while since I looked at this). The problem is that it overrides all the default search/path locations, and thus is missing the complete stdlib unless you copy everything into it. There's also the matter of binary modules that cannot be shared between some interpreter versions (which feels like it begat Ubuntu's questionable python-dist symlink nightmare).
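The variable being described sounds like PYTHONHOME, and the stdlib problem is easy to demonstrate: point it at an empty directory and the interpreter can't even start (a sketch, assuming python3):

```shell
mkdir -p /tmp/fakehome
# PYTHONHOME replaces the default prefix wholesale, stdlib and all,
# so the interpreter aborts during startup:
PYTHONHOME=/tmp/fakehome python3 -c 'pass' 2>/dev/null || echo "no stdlib found"
```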
[+] darkarmani|12 years ago|reply
> NPM gets this right.

Until you want to actually deploy to production -- then good luck, write your own tools.

[+] coldtea|12 years ago|reply
>Provided the developer is fully conscious that this is python-level-only isolation, then it is useful.

Of course we know it's python level only isolation. We're still running in an OS with a filesystem and such. If we wanted something more we'd use jails or something similar.

>Full methods of isolation make virtualenv redundant

So what? They are too heavy handed, and 99% of the time, I don't want them anyway.

>It is very, very easy to install something as large as a Django application into a prefix. Easier, I would argue, than indirectly driving virtualenv and messing with python shebangs.

You'd argue, but you'd lose the argument.

>You need to preserve this behaviour right down the line if you want to run things in this virtualenv from the outside, like a cron job. You will need to effectively hardcode the path of the virtualenv to run the correct python. This is at least as fiddly as manually setting up your PATH/PYTHONPATH.

Yes, if only they gave you OTHER BENEFITS in exchange. Oh, wait.

In general: move along, nothing to see here...

[+] dissent|12 years ago|reply
> You'd argue, but you'd lose the argument.

Done it already, in production. The virtualenv fanboys didn't even notice. It's simple and elegant and works perfectly.

[+] kevcampb|12 years ago|reply
Why I love virtualenv and pip.

We use virtualenv and pip extensively here, with virtualenvwrapper.

  mkvirtualenv <project>
  workon <project>
  pip install -r requirements.txt

It just works. I don't spend any time on it. Our developers don't have any problems with it. All the other considerations in the article we either handle as you're meant to, or understand the limitations.

Still, looking forward to some interesting comments on here.

[+] jvdh|12 years ago|reply
Did you actually read the entire article, or did you just come here to say that?
[+] yukkurishite|12 years ago|reply
Thanks for your pointless blog post
[+] lmm|12 years ago|reply
Sure, pip's imperfect; I have to install the mysql header files, woe is me. But the cost/benefit tradeoff is better than LXC; pip gets me most of the isolation with much, much less overhead.

Is the author really claiming that it's easier to script a non-virtualenv deployment than a virtualenv one? If so, great, do that - the only reason I deploy with virtualenv is because, guess what, that's easier to script.

Why default to --no-site-packages? Because it helps and it's easy. No, I'm not perfectly isolated from my host system - but then the host system could have a broken libc and then nothing, not even LXC, is going to make your isolation system work. Just because you can't isolate perfectly doesn't mean there's no point isolating as much as you can.
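(For reference, python3's built-in venv behaves like --no-site-packages by default; a quick check that the environment really is its own prefix:)

```shell
python3 -m venv ./iso
# Prints True: the venv's prefix is detached from the base installation:
./iso/bin/python -c "import sys; print(sys.prefix != sys.base_prefix)"
```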

Yes, pip builds from source. That's a lot more reliable than the alternative. The Java guys certainly aren't mocking you if they've ever done the actual equivalent thing, i.e. deployed a library that uses JNI, which is a complete clusterfuck.

(URLs as dependencies are indeed a bad idea; don't do that. The complaint about pip freeze is purely derivative of the other complaints; it's wrong because they're wrong).

[+] dissent|12 years ago|reply
Yeah, the lxc package gives you command line tools that are a close analogue to virtualenv. Combine it with a filesystem like btrfs for copy-on-write, and it's FAR quicker too.

I'm glad you mentioned JNI. In Java, native is the exception. In python, it's much closer to the rule. A hell of a lot of python libraries rely on C components which leak out of a virtualenv.

Building from source isn't reliable. It's quite hard, not to mention relatively slow. See the great success of RPM and APT based Linux distributions as proof of this.

[+] secstate|12 years ago|reply
Why pip?

pip uninstall psycopg2

or

pip install --upgrade psycopg2

But I guess with easy_install you can fake it by running with -m and then deleting the errant egg files in lib and bin. That's pretty easy, I guess.

Oh, but hey, you know what you can do instead? Set up a virtualenv, easy_install everything, and when it gets hopelessly out of date or munged, you can just delete the virtualenv directory and start again.

Snark aside, I would agree with the OP that the "feature" of installing via arbitrary URLs is an anti-pattern and encourages lazy development. Of course, not every package we build can be posted to a public package library, so there's always that issue with easy_install too. Sigh, what a mess we have. Good thing I'm still able to get work done with these tools :)

[+] mattdeboard|12 years ago|reply
Good luck pip installing psycopg2 on Windows :)
[+] lifeisstillgood|12 years ago|reply
I think we should look back to the rant of Python Core Committer Hynek Schlawack. (https://hynek.me/articles/python-app-deployment-with-native-...)

In short: build your system and its dependencies once and once only, then pass them around through test into live.

We have three competing needs: a reliable deployment process that can move a binary-like blob across multiple test environments; exactly reproducible environments, but without dd/ghosting; and a desire to keep things simple.

Isolation is good - whether through the mostly-isolated approach of venv, the almost total isolation of jails/LXC, or the Vagrant approach. But they focus almost entirely on binary builds - how does one pass around a python environment without rebuilding it and its dependencies each time à la pip?

Well, by taking the running, built python environments and passing them into a package manager like apt and calling that a binary. That might mean tarballing a venv or tarballing /usr/local/python, but in the end what matters is that we pass around the same basic bits.
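A crude sketch of that "build once, pass the bits around" idea (names and paths are placeholders; the pip step is commented out because it needs network access):

```shell
# Build the environment once:
python3 -m venv ./build/myapp
# ./build/myapp/bin/pip install myapp   # app + all dependencies, one time
# Tar it up and ship the same artifact through test into live:
tar -C ./build -czf myapp-venv.tar.gz myapp
ls -lh myapp-venv.tar.gz
```

One caveat: a venv hardcodes absolute paths in its scripts' shebangs and in its config, so the tarball generally has to be unpacked at the same path it was built at.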

I am working this out in pyholodeck.mikadosoftware.com and in my head - when I have a good answer I will shout

[+] viraptor|12 years ago|reply
While I see where he's coming from, I really can't agree with many things he's saying:

"For python packages that depend on system libraries, only the python-level part of those packages are isolated."

And there's nothing really bad about that. Well-written python libraries will work with any previous version of the library they're wrapping. They will also report incompatibilities. It's ok to use system libraries - especially if you're advocating getting rid of virtualenv, as the author does.

"Full methods of isolation make virtualenv redundant"

Well... no. There are times when installing a local version of some library is required and it cannot be installed system-wide, or it will break the system's yum, for example. You're not only isolating your app from the system, but also the system tools from the app.

"virtualenv’s value lies only in conveniently allowing a user to _interactively_ create a python sandbox"

There's nothing interactive about what `tox` does, for example, and it's a perfect example of why virtualenv is useful. You can have not only a virtualenv for testing your app, but also multiple configurations for different selected extras - all living side by side.
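For anyone who hasn't used it, a minimal tox.ini looks something like this (the env list, deps and commands are illustrative); tox builds one virtualenv per listed environment, completely non-interactively:

```ini
[tox]
envlist = py27, py33

[testenv]
deps = -rrequirements.txt
commands = python -m pytest
```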

"Clearly virtualenv advocates don’t want any hidden dependencies or incorrect versions leaking into their environment. However their virtualenv will always be on the path first, so there’s little real danger"

Until you want the same package that's available in the system, but your app's version constraint isn't consulted when the system's package is upgraded. Or you want different extras selected. Or your deps are incompatible with some system application's deps, but you're calling it via subprocess (this is also where changing the python path in the shebang comes in useful).

Venvs are definitely not perfect, but for testing and installation of apps, they're amazingly useful. The binary libs issue is definitely annoying, but there's a different solution for it and I'm happy to see it used more often - don't compile extensions, but use cffi/ctypes.
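A tiny ctypes example of that approach - calling into the system C math library without compiling anything (assumes a Unix-ish system where `find_library` can locate libm):

```python
import ctypes
import ctypes.util

# Load the system math library at runtime instead of building an extension:
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(9.0))  # → 3.0
```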

[+] kyzyl|12 years ago|reply
Has anybody here used conda[1] for anything significant/serious yet? I've been using it in passing for my side and small projects but I'm still not convinced I want to go whole hog yet.

Regardless, my experience with it so far has been... ideal. It really makes building environments and linking/unlinking packages a breeze. I haven't needed it for building my own packages yet, so we'll see how that goes.

[1] http://docs.continuum.io/conda/

[+] INTPenis|12 years ago|reply
To both commenters so far I would like to say PEP-370, a new way since 2.7 to create virtual environments.

I started using it recently and I see no need for virtualenv anymore.
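(For the curious: that's PEP 370, http://www.python.org/dev/peps/pep-0370/ - the per-user site-packages directory. The moving parts, as a sketch - the install line is commented out since it needs network access:)

```shell
# Where per-user packages land (the command may exit non-zero if the
# directory doesn't exist yet, hence the || true):
python3 -m site --user-site || true
# python3 -m pip install --user requests   # installs there: no root, no virtualenv
```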

I have nothing to say about the pip issue though, never had an issue with pip myself.

[+] erikb|12 years ago|reply
Please add a link to the PEP in your post. Thanks.
[+] puller|12 years ago|reply
What has the author built to replace pip?
[+] midas007|12 years ago|reply
Amen.

Generic algorithm of making things better:

0. Give it a go to fix it oneself first. Really.

1. Failing the previous, raise the perceived deficiency with a specific and workable proposed solution.

2. Failing the previous, indicate what's undesirable and how, and what behavior would be desirable.

3. Failing the previous, put a monetary bounty on the feature, fork the project or live with it. Rewriting from scratch has a 99.99% probability of being several times more work than it seems.

[+] knappador|12 years ago|reply
The lightweight part is pretty useful. LXC is definitely overkill. I don't want to have to bridge my graphics and networking over so that I can run programs against different versions of libraries. Going more lightweight, if I'm doing PYTHONPATH and PYTHONHOME, I would start scripting them, then script the installation of libraries into my environment - and end up recreating virtualenv, badly...

--no-site-packages has been default for a while. http://www.virtualenv.org/en/latest/virtualenv.html#the-syst...

I don't really see the argument about compiling against system headers and libs. Generally I do want to isolate my Python modules that are calling into other binary libs, but I don't care about isolating those binary libs themselves, because their interface isn't changing when the Python wrapper for them changes. The exception is when they are part of what I want to develop/deploy, in which case the source will be in the virtualenv and will install into the virtualenv using the config script at worst. A frequent example is how Pygame will compile against some system libvideo.so, the behavior of which I never change, but different Pygame versions might have their own APIs etc., and so the many compiled versions do have their own use.

Virtualenv is actually pretty noob friendly because one of the mistakes I see far more frequently than the others is that users will install things using pip system-wide that conflict with the system package manager. This can become pretty difficult to unscrew for inexperienced Linux users.

I've been meaning to actually add some virtualenv docs, because of the frequency with which inexperienced Python and Linux users will waltz in and not be able to compile something because only an old version of Cython etc. is on Ubuntu 11.blah. And thus we start bringing distribution-specific package managers into the realm Python package management was intended for, and people try to figure out what version of Ubuntu they need - instead of figuring out that they can install everything in one place, maintain an entire slew of projects without conflicts, and skip calling on IRC when synaptic clobbers things.

[+] dissent|12 years ago|reply
There's a level between virtualenv and LXC, and that's schroot. Combine it with a CoW filesystem and that will cover everything you mention. Although personally, I find LXC very lightweight. Note that in my article, I did point out that I do still use virtualenv sometimes :)
[+] jhull|12 years ago|reply
Interesting take on this. I think everyone just accepts pip + virtualenv as the only way to be without questioning it. You have definitely convinced me to reexamine that on my next project.
[+] d0m|12 years ago|reply
Well, I really don't agree with any of the arguments in the article. I'll use the experience card and just say that I've been using virtualenv/pip for years and it has always served me very well. It has made development, testing, and deployment easier. Even if it's hackish and there are more robust solutions, this one strikes a good-enough balance of quality versus time versus complexity.
[+] aidos|12 years ago|reply
Here's an idea, if you want to convince me there's a better way of doing something, don't belittle me for the way I'm currently doing it.

I know pip isn't perfect. I know venv isn't perfect. They do work pretty well though. And when you find something that works well for you in your process, use it.

Some valid points (many of which have been on articles featured on HN before). Shame about the tone.

[+] rdtsc|12 years ago|reply
Please take a look at using and building real packages for your system: RPM and APT. These are battle-tested; they handle dependencies, transitive dependencies, multiple repo sources, and {pre/post}{install/uninstall/upgrade} scripts. They provide a transactional way to add software to your systems.

You can use pip and virtualenv as well perhaps by creating a parallel Python install in /opt or something like that if needed. And then install that in an RPM if needed.

But if you are installing hundreds of binary files, dlls and using requirements.txt as the main way to specify dependencies you are probably going to end up with a mess.

It is much harder if you have multiple OSes to support; installing on Windows, RHEL/CentOS, Ubuntu and Mac OS X in an easy and clean way is hard. But if you target a specific server platform, like say CentOS 6, take a look at RPMs.

[+] erikb|12 years ago|reply
This is pretty much what I often discuss with a friend of mine. Like you, he always tells me to use RPM/APT instead of the Python stuff. But my goal is not that the stuff runs as cleanly as possible on one specific system; it's that it runs on as many OSes as possible in a relatively clean manner. If virtualenv doesn't tell Ubuntu's APT that it installed some Python packages in ~/.virtualenv/project_one, and an Ubuntu sysadmin therefore cries and tries to kill me, I really don't care. On the other hand, with virtualenv I can make sure that my team leader can run the same command lines as myself and get my Python project running on his Suse box, and the QA department can run my code on their Fedora as well. This is why I use Python and this is why I use virtualenv. I'm not happy with how things are in Python, but if I compare the problems between both solution paths, currently I still think I'm better off with virtualenv and co.
[+] dissent|12 years ago|reply
This happens to be just where some of my issues come from. When building a monolithic python app into an RPM, virtualenv was somewhere between annoying and pointless. It's the belief that it actually does something other than set paths that annoys me about it. Just listen to the rants on here. If it had been named Python Pathsetter, I don't think it would have gotten a following!
[+] natrius|12 years ago|reply
Isolating an entire environment is a better idea than isolating a python environment, but isolating a language's environment is an easier problem to solve, so the tools for it are currently better. I doubt we'll all be doing it this way in 2020, but it works pretty well right now.
[+] ivansavz|12 years ago|reply
My friend who is a python guru uses buildout for all his apps.

Some of the recipes I have seen go more into configuration-management-like stuff but it is cool to see a single buildout script deploy nginx, DB, deps, and app in one go, on any linux box.

[+] sillysaurus2|12 years ago|reply
> it is cool to see a single buildout script deploy nginx, DB, deps, and app in one go, on any linux box.

Would anyone please link to a practical, working example of this? I want to use buildouts, but I learn from example, and there seem to be very few examples of how to deploy a production configuration. How do the pros do it? What are the gotchas? Is there a book I can buy? Will someone please put together a PDF explaining all this, so that I can throw money at you?

EDIT: Arg, that's exactly what I mean... https://github.com/elbart/nginx-buildout is an ok example just to learn the basics of buildout, but making it "production ready" (i.e. extending it to build postgres, etc) is left as an exercise for the reader. I was really hoping to find a production buildout example... (But thank you, rithi! I appreciate you took the time to dig that one up for me.)

[+] pushedx|12 years ago|reply
I agree that developers using virtualenv should be aware that it does not provide isolation for system dependencies. On two different projects, I've had to track down a PIL bug on another developer's machine only to find that the wrong version of libjpeg or libpng was installed. Solution? Install the right version. I've never needed different versions of a system library, but if I did, LXC with btrfs sounds like an option worth trying to avoid the overhead of Vagrant.