I was intrigued by the author's library "envoy", which is intended to provide a more intuitive interface to running processes from python. (https://github.com/kennethreitz/envoy)
The back story is that the older APIs that Python comes with -- os.popen and os.system -- are deprecated. Programmers are urged to use the "subprocess" module instead. Although this doesn't have the problems of the original functions, it has a rather arcane interface, in particular if you want to read the output (stdout or stderr) of a subprocess.
"envoy" seems to aim at fixing this, by providing sane defaults and being optimized for the common case. However, these defaults have drawbacks of their own.
1. envoy defaults to keeping the process output in memory, as a giant string. This can be a bad choice with regard to memory usage and performance.
2. You can run several processes in a pipe, as in run("cat foo | grep bla"). But otherwise, as far as I can see, run() ignores regular shell semantics, such as quoting. I imagine this can lead to unexpected results. Also, the amount of data passed from one process to the next is capped at 10 MB -- a recipe for hard-to-find bugs.
3. subprocess.call() accepts an array in the style of ["ls", "-l", "/mnt/My SD card"]. This has obvious advantages over having to deal with escaping shell characters. A good API should preserve this advantage over os.system().
4. The defaults cannot be overridden, and no preparations have been made to allow changing them. Of course this can change in the future. However, one of the reasons the subprocess.* API is convoluted is that it allows all kinds of flexibility, much of which is genuinely needed in serious programs. It may be difficult to bolt that flexibility onto envoy at a later stage. The point is that a flexible API is hard to design.
None of this is to discourage this initiative, which seems to me a much-needed improvement over Python's built-in API. Also, with a version number as low as 0.0.2, there is probably little need to worry about API compatibility.
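The list-vs-string distinction from point 3 can be sketched with the standard library alone. This is a minimal illustration, using `sys.executable` as a portable stand-in for an external command:

```python
import subprocess
import sys

# List form: each element is passed to the child verbatim, so spaces and
# shell metacharacters in arguments need no escaping at all.
out = subprocess.check_output(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", "/mnt/My SD card"]
)
print(out.decode().strip())  # /mnt/My SD card

# The shell-string form would have to quote the path by hand, and any
# unquoted metacharacter (;, |, $) is interpreted by the shell -- the
# classic injection hazard os.system() suffers from.
```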
> subprocess.call() accepts an array in the style of ["ls", "-l", "/mnt/My SD card"]. This has obvious advantages over having to deal with escaping shell characters.
Unless you're running on Windows, in which case IME it will corrupt your carefully constructed parameters in completely inappropriate ways that can be debugged only at the cost of (a) changing the call() to execute a script that dumps the actual parameters supplied verbatim, and (b) at least an hour of your life that you're never getting back.
This "feature" is about one step above MS Word's default autoreplace behaviour in irritation level. What happened to "Explicit is better than implicit" and "Special cases aren't special enough to break the rules"?
Great presentation on a library I've loved (and used) for a while. However, according to slide 42, I need to rewrite the regex module? I'm so busy this week, though.
The slides don't fit vertically on my screen, so some of the content is cut off. There's no scroll bar, so initially it was difficult to figure out how to see the info cut off at the bottom. Chrome's zoom-out didn't work either.
I had to highlight the text and drag downwards in order to see the content. But it was annoying having to do this for every slide with a lot of content.
Otherwise, these libraries seem really useful. Thanks for this.
I had the same problem too. I had no idea it was supposed to be a slide show, then once I gave my browser as much screen real estate as the site wanted I had to figure out how to navigate the damn thing.
What's so wrong with just sticking a bunch of static slides on a page, one after another?
While python has always been 'batteries included' I think some of the batteries should not have been included.
Libraries tend to move more quickly than the language and interpreter/compiler. Tying them together, while convenient, often leads to rot, clunky libraries, slow-moving updates, and libraries being built to the interpreter/compiler instead of to the needs of the users.
I would like to see instead a somewhat canonical (widely accepted) list of the highest quality libraries for a given set of needs, with information and pro/con/caveats listed for each, instead of them being included in the mainline trunk.
I really applaud what Kenneth Reitz has been doing lately.
The portion at the beginning that explains how subprocess puts off dev/ops folks is so true. Perl/Bash colleagues at work would basically ask me how to perform output=`command`. Once they'd seen subprocess, they would just continue writing their script in Bash/Perl.
Very true. I spent quite a while trying to learn subprocess, then gave up and just use os.popen() now. It's a shame -- there are certain subprocess features I really would like to have, but it's too hard to remember how to use it.
If backticks are good enough for them, then they don't need the more complex use cases that Popen allows, so just tell them to use check_call or check_output. As far as they should be concerned, the subprocess module has two functions that are straightforward to use.
And those functions are more convenient than the default behavior of backticks, because they handle for you raising an exception if the subprocess fails.
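A sketch of that two-function story, with `sys.executable` standing in for an arbitrary command:

```python
import subprocess
import sys

# The backtick equivalent: capture stdout, and raise if the command fails.
output = subprocess.check_output([sys.executable, "-c", "print('hello')"])
print(output.decode().strip())  # hello

# Unlike backticks' default behavior, a non-zero exit cannot go unnoticed:
try:
    subprocess.check_call([sys.executable, "-c", "import sys; sys.exit(3)"])
except subprocess.CalledProcessError as exc:
    status = exc.returncode
    print("command failed with status", status)  # command failed with status 3
```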
I blame the GoF for making the Python standard library hard. The patterns they described were for OO systems where functions were not first-class. Python didn't need to be complicated.
If you have a look at the older libraries, most of them were written in a procedural style. Not only that, that style is very amenable to testing in the REPL.
Note the absence of "doer" names like "Adapter", "Handler", "Manager", "Factory".
If you have a look at the XML library, from roughly when "patterns" became popular, this style of thinking had infested standard library contributions. It also coincides with the time when camelCased function names crept into the Python standard library.
Here's one in xml/dom/pulldom.py:
self.documentFactory = documentFactory
Once you see this, you know you are in for some subclassing. You can no longer REPL your way to figure out how things work, and you now have to consult the manual.
Here's more pain from libraries of the same era; some of these, I'd argue, are un-Pythonic:
#xml/sax/xmlreader.py:
def setContentHandler(self, handler):
#wsgiref/simple_server.py:
class ServerHandler(SimpleHandler):
#urllib2.py:
class HTTPDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler):
The last example is especially jarring. Abstract classes have a place in a strongly typed world to declare interfaces and help with vtable-style dispatch. In Python, where you have duck typing and monkey patching, a class that does virtually nothing on its own stands out like a guy in a tux at a beach party.
Even logging is infected by the same over-patterning. logging/__init__.py:
class StreamHandler(Handler)
LoggerAdapter(someLogger, dict(p1=v1, p2="v2"))
"Managers" - what a pain when plain function handles would have done the job. Does this name even tell you what task the class performs?
#multiprocessing/managers.py:
class BaseManager(object)
If anyone remembers, Java had to do OO big-style, with OO everywhere -- there were no alternatives.
Initially, buttons had to be subclassed just to handle click events, since functions were not first-class objects. Then someone came up with the MouseListener interface, which proved too unwieldy for handling a single click. So the MouseEventAdapters came into being.
Therefore, to handle a click in a "pattern" manner involves
an anonymous class
which subclasses MouseAdapter
which implements MouseListener,
which overrides MouseClick.
Publishing how industry solves this problem of "MouseClick" over and over as a pattern [design pattern is a general reusable solution to a commonly occurring problem within a given context in software design] only gives legitimacy to an approach that has dubious wider applicability.
Heaven help the future developers who are forced to do it because it is now recognized as industry "good practice" and codified in a renowned book.
It isn't!
It was a style that was forced by the constraints of a language.
This is neither pythonic nor necessary:
panel.addMouseListener(new MouseAdapter() {
    public void mouseEntered(MouseEvent e) {
        System.out.println(e.toString());
    }
});
Embracing "foolish, unschooled" thinking, this would be rendered in Python with plain functions as handlers -- or a list of them, for multiple event handlers. This style of API again allows effective exploration on the REPL.
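A sketch of what that might look like -- `Panel` here is a made-up stand-in widget, not a real library class:

```python
class Panel(object):
    """Stand-in widget class, invented for this sketch."""
    def __init__(self):
        self.mouse_entered_handlers = []  # plain callables, no Adapter needed

    def _fire_mouse_entered(self, event):
        for handler in self.mouse_entered_handlers:
            handler(event)

panel = Panel()
seen = []

# a handler is just a function...
panel.mouse_entered_handlers.append(lambda e: seen.append(("log", e)))
# ...and "multiple event handlers" are just more functions in the list.
panel.mouse_entered_handlers.append(lambda e: seen.append(("audit", e)))

panel._fire_mouse_entered("event-1")
print(seen)  # [('log', 'event-1'), ('audit', 'event-1')]
```

No anonymous class, no MouseAdapter, no MouseListener -- and every piece can be poked at interactively in the REPL.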
I've been programming Python for nearly 10 years, but your comment just helped me clarify a thought that I've had for ages but have never been able to put into words before: a well-designed Python API is one that can be effortlessly used within the REPL. And that's why urllib2 sucks.
> If anyone remembers, Java had to do OO in a
> big-style with OO everywhere -- there were no
> alternatives.
You can write Java that isn't heavily OO, but you have to implement alternatives to sections of the stdlib that most people assume or take for granted.
Related to what you're saying about the GUI, I'd be interested to see a detailed summary of what the Lighthouse people did, and how it was different from Java. I've found that NeXT-tradition stuff -- despite claims that it's heavily OO -- in fact tends to err away from subclassing, towards composition. I suspect the Lighthouse interface patterns did too.
Your example with "MouseClick" succinctly explains what I feel is wrong with much of software development today that tries to follow "modern" OO practices. Blame it on the language, or on people who try to mold the world into familiar casts at any cost by overusing patterns?
This presentation brings up a tangential point that has always confused me: how error-prone is starting a subprocess, really?
I agree with the author's goals of making common tasks easier and more obvious. urllib2 is an easy target, as it was added to the standard library over a decade ago, long before REST was something people talked about. The best tools for packaging, versioning, and testing have always been a bit ambiguous in any language, including Python.
However, the author points out something that has always bothered me about Python: it is way harder to start a subprocess with an external command in Python than almost any other language. This has been true whether using sys or os or even subprocess, which is quite recent.
I always felt that this had something to do with the constant warnings in the documentation about how a pipe between the subprocess and the Python process might fill and cause the subprocess to block. Or how running the program through shell rather than exec or something might cause some sort of security issue. Are these real issues that other languages ignore in the name of user convenience, or has Python just never been able to make the right API (as the author seems to argue)?
Creating a subprocess can be complex, at least if you expose all the different subtleties. If you've ever used Java's APIs to run processes, you know that Python's aren't the worst ;-)
There are lots of interesting corner cases, for example how to join stdout and stderr properly without blocking on one stream while the other is overflowing.
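For that particular corner case the standard library's answer is `Popen.communicate()`, which services both pipes concurrently so neither can fill up and deadlock the parent. A small sketch:

```python
import subprocess
import sys

# A child that writes to both streams. Naively reading stdout to EOF while
# the stderr pipe buffer fills up is the classic deadlock; communicate()
# reads both pipes concurrently and avoids it.
child = (
    "import sys\n"
    "sys.stdout.write('out data')\n"
    "sys.stderr.write('err data')\n"
)
p = subprocess.Popen(
    [sys.executable, "-c", child],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
out, err = p.communicate()
print(out.decode(), err.decode())  # out data err data
```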
On the other hand, almost nobody ever needs this. Ruby's "output = `command`" probably covers 90% of the use cases with the most trivial API imaginable. The hard part obviously is exposing the advanced functionality without compromising on the simplicity.
Almost all programming communities can learn a lot from Ruby's "if it's too hard, you're not cheating enough" approach (dhh quote I believe). Yes, the process could return an exabyte of stdout data, but do you really care? Is that really the problem this API should try to solve, with all special cases? That's not good computer science practice, but surprisingly effective.
There's no fundamental problem that's stopped Python from doing this before. For some reason, all of the ways to spawn a subprocess in Python have tried to map almost directly to the underlying C API... which is pretty awful.
To get around the problem of child processes spewing too much output and blocking the parent, one can pass an open file handle to the stdout/stderr arguments of the Popen call. I've run into this many times, and this solution has reliably worked for me every time. It could be documented better in the Python docs.
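A sketch of that approach -- redirecting a deliberately chatty child into a temporary file, so no pipe buffer is ever in play:

```python
import subprocess
import sys
import tempfile

# Pointing stdout at a real file means the OS pipe buffer can never fill,
# so even a very chatty child cannot block -- the data just lands on disk.
with tempfile.TemporaryFile() as outfile:
    code = "print('x' * 100000)"  # far more than a pipe buffer holds
    subprocess.check_call(
        [sys.executable, "-c", code],
        stdout=outfile,
        stderr=subprocess.STDOUT,  # merge stderr into the same file
    )
    outfile.seek(0)
    data = outfile.read()

print(len(data))  # all 100000 x's, plus the trailing newline
```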
For quick tasks and scripts, I've found subprocess.check_call, and subprocess.check_output with shell=True are great tools for spawning subprocesses and quickly grabbing output. They're pretty straightforward to use.
I have never been able to figure out how - in Python - to be able to stream asynchronously both stdout and stderr from the subprocess, both printing both of them as well as writing the data to a file.
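One way to do it (a sketch, not the only pattern): a reader thread per stream, with each line echoed to the console and tee'd into its own log file. The file names here are arbitrary:

```python
import subprocess
import sys
import threading

def pump(stream, label, log):
    # Read the pipe line by line as data arrives; print each line and
    # simultaneously write it to the log file.
    for line in iter(stream.readline, b""):
        text = line.decode().rstrip("\n")
        print("[%s] %s" % (label, text))
        log.write(text + "\n")
    stream.close()

# A child that interleaves writes to stdout and stderr.
child = (
    "import sys\n"
    "for i in range(3):\n"
    "    sys.stdout.write('out %d\\n' % i)\n"
    "    sys.stderr.write('err %d\\n' % i)\n"
)

p = subprocess.Popen(
    [sys.executable, "-c", child],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

with open("stdout.log", "w") as out_log, open("stderr.log", "w") as err_log:
    threads = [
        threading.Thread(target=pump, args=(p.stdout, "out", out_log)),
        threading.Thread(target=pump, args=(p.stderr, "err", err_log)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
p.wait()
```

Because each stream has its own reader, neither pipe can fill up and block the child while the parent waits on the other one.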
On installing Python -- the most practical way to work with it is to have a moderately recent OS-level Python install, and then build any other Python versions from source if required: https://github.com/collective/buildout.python
After that, use virtualenv with virtualenvwrapper.
I wish the developers of the requests module would stop changing its API -- code that was working just fine with 0.6.4 suddenly began failing with missing methods in version 0.8.5.
What if your package manager doesn't support the version of Python you're targeting? The most common place this happens is if you have some old RHEL 5 boxen that haven't been upgraded (and who that uses RHEL doesn't?). Or suppose you use Python 2.7, which isn't supported by very many (any?) "stable" linuces (aka Debian stable, RHEL, Ubuntu LTS, etc.).
And installing packages into the system python (if that's what you're suggesting) is the path to madness. It's much better to use virtualenvs you can throw away at will. All in all, it's usually best just to leave the system Python alone to avoid causing problems with any other packages that may depend on it being in a consistent state.
I agree that people should use a good package manager, but the reality of the corporate env world is that you'll be trying to use the latest Python on some ridiculously old unix install without root access, and you'll never be able to get IT to install the latest for all users.
Mostly I agree with you, but installing python on windows is a PITA compared to *nix.
PIP works until you need something that requires compiling extensions which is also a PITA on windows.
IIRC easy_install can install binaries, but I could be wrong about that since I don't do windows development anymore and thus pip works for me 99% of the time.
Regardless, my point is, though I think we all wish it was as simple as using your package manager and pip, it simply isn't that way for everyone.
And it probably never will be, but we can always make it simpler. :)
The Python standard library has gotten worse over time, as it got loaded up with more and more features, obfuscating the common use cases. The irony now is that to do simple, everyday things (like http requests) you are now better off installing a third party package like "requests" than using the standard library. So much for "batteries included."
The standard library needs a reboot. Why not do it in Python 3? Nobody's using it yet anyway ;-)
I can confirm that the Python subprocess API is a pain to use and also poorly documented. I recently had to use (no choice) Python 2.5.x to write a script that extensively called external programs, and ran into several problems. It's strange that a language like Python, which I find so easy to use in many cases, does not already have a good -- as in simple, safe, and well-documented -- subprocess API.
I guess I agree things could be simpler, although the cries of "garbage!" were a bit much. I wrote a wrapper function around urllib2 about 5 years ago and haven't looked back.
Wrappers are handy, but as soon as you need something beyond the basic use case they become useless. What’s great about Requests is that it seems to have minimal leakiness as an abstraction over HTTP.
It's really inexcusable that in 2012 (or even 1992) a language that is otherwise well-suited for internet programming does not come with a first-class HTTP client.
edit: If anyone wants to see a real-world refactor from HTTPLib/2 to requests, I did so with Pysolr here: https://github.com/mattdeboard/pysolr/commit/db63d8910dec42d...
-- a very happy user of the `requests` library
I'd like to see some community effort to build a collection of similar "better than the standard" libs.
Which, at some point, could replace the standard libs. Or be the de facto standard, a pip install call away...
https://github.com/kennethreitz/python-for-humans
Esp. the "installing python" one. Just use your package manager to install all the versions you need.
And for "Packaging and Dependencies", just use pip.