I was intrigued by the author's library "envoy", which is intended to provide a more intuitive interface to running processes from python. (https://github.com/kennethreitz/envoy)
The back story is that the older APIs that Python comes with -- os.popen and os.system -- are deprecated. Programmers are urged to use the "subprocess" module instead. Although this doesn't have the problems of the original functions, it has a rather arcane interface, in particular if you want to read the output (stdout or stderr) of a subprocess.
"envoy" seems to aim at fixing this, by providing sane defaults and being optimized for the common case. However, these defaults have drawbacks of their own.
1. envoy defaults to keeping the process output in memory, as a giant string. This can be a bad choice with regard to memory usage and performance.
2. You can run several processes in a pipe, as in run("cat foo | grep bla"). But otherwise, as far as I can see, run() ignores regular shell semantics, such as quoting. I imagine this can lead to unexpected results. Also, the amount of data passed from one process to the next is capped at 10 MB -- a recipe for hard-to-find bugs.
3. subprocess.call() accepts an array in the style of ["ls", "-l", "/mnt/My SD card"]. This has obvious advantages over having to deal with escaping shell characters. A good API should preserve this advantage over os.system().
4. The defaults cannot be overridden, and no preparations have been made to allow changing them. Of course this can change in the future. However, one of the reasons the subprocess.* API is convoluted is that it allows all kinds of flexibility, much of which is genuinely needed in serious programs. It may be difficult to bolt that flexibility onto envoy at a later stage. The point is that a flexible API is hard to design.
None of this is to discourage this initiative, which seems to me a much-needed improvement over Python's built-in API. Also, with a version number as low as 0.0.2, there is probably little need to worry about API compatibility.
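The list-vs-string distinction from point 3 can be sketched with the standard library alone. This is a minimal illustration, using `sys.executable` as a portable stand-in for an external command:

```python
import subprocess
import sys

# List form: each element is passed to the child verbatim, so spaces and
# shell metacharacters in arguments need no escaping at all.
out = subprocess.check_output(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", "/mnt/My SD card"]
)
print(out.decode().strip())  # /mnt/My SD card

# The shell-string form would have to quote the path by hand, and any
# unquoted metacharacter (;, |, $) is interpreted by the shell -- the
# classic injection hazard os.system() suffers from.
```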
> subprocess.call() accepts an array in the style of ["ls", "-l", "/mnt/My SD card"]. This has obvious advantages over having to deal with escaping shell characters.
Unless you're running on Windows, in which case IME it will corrupt your carefully constructed parameters in completely inappropriate ways that can be debugged only at the cost of (a) changing the call() to execute a script that dumps the actual parameters supplied verbatim, and (b) at least an hour of your life that you're never getting back.
This "feature" is about one step above MS Word's default autoreplace behaviour in irritation level. What happened to "Explicit is better than implicit" and "Special cases aren't special enough to break the rules"?
Great presentation on a library I've loved (and used) for a while. However, according to slide 42, I need to rewrite the regex module? I'm so busy this week, though.
The slides don't fit vertically on my screen, so some of the content is cut off. There's no scroll bar, so initially it was difficult to figure out how to see the info cut off at the bottom. Chrome's zoom-out didn't work either.
I had to highlight the text and drag downwards in order to see the content. But it was annoying having to do this for every slide with a lot of content.
Otherwise, these libraries seem really useful. Thanks for this.
I had the same problem too. I had no idea it was supposed to be a slide show, then once I gave my browser as much screen real estate as the site wanted I had to figure out how to navigate the damn thing.
What's so wrong with just sticking a bunch of static slides on a page, one after another?
While python has always been 'batteries included' I think some of the batteries should not have been included.
Libraries tend to move more quickly than the language and interpreter/compiler. Tying them together, while convenient, often leads to rot, clunky libraries, slow-moving updates, and libraries being built to the interpreter/compiler instead of to the needs of the users.
I would like to see instead a somewhat canonical (widely accepted) list of the highest quality libraries for a given set of needs, with information and pro/con/caveats listed for each, instead of them being included in the mainline trunk.
I really applaud what Kenneth Reitz has been doing lately.
The portion at the beginning that explains how subprocess puts off dev/ops folks is so true. Perl/Bash colleagues at work would basically ask me how to perform output=`command`. Once they'd seen subprocess, they would just continue writing their script in Bash/Perl.
Very true. I spent quite a while trying to learn subprocess, then gave up and just use os.popen() now. It's a shame -- there are certain subprocess features I really would like to have, but it's too hard to remember how to use it.
If backticks are good enough for them, then they don't need the more complex use cases that Popen allows, so just tell them to use check_call or check_output. As far as they should be concerned, the subprocess module has two functions that are straightforward to use.
And those functions are more convenient than the default behavior of backticks, because they handle for you raising an exception if the subprocess fails.
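A sketch of that two-function story, with `sys.executable` standing in for an arbitrary command:

```python
import subprocess
import sys

# The backtick equivalent: capture stdout, and raise if the command fails.
output = subprocess.check_output([sys.executable, "-c", "print('hello')"])
print(output.decode().strip())  # hello

# Unlike backticks' default behavior, a non-zero exit cannot go unnoticed:
try:
    subprocess.check_call([sys.executable, "-c", "import sys; sys.exit(3)"])
except subprocess.CalledProcessError as exc:
    status = exc.returncode
    print("command failed with status", status)  # command failed with status 3
```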
I blame the GoF for making the Python standard library hard. The patterns they described were for OO systems where functions were not first-class. Python didn't need to be complicated.
If you have a look at the older libraries, most of them were written in a procedural style. Not only that, that style is very amenable to testing in the REPL.
Note the absence of "doer" names like "Adapter", "Handler", "Manager", "Factory".
If you have a look at the XML library, from roughly when "patterns" became popular, this style of thinking had infested standard library contributions. It also coincides with the time when camelCased function names crept into the Python standard library.
Here's one in xml/dom/pulldom.py:
self.documentFactory = documentFactory
Once you see this, you know you are in for some subclassing. You can no longer REPL your way to figure out how things work, and you now have to consult the manual.
Here's more pain from libraries of the same era; some of these, I'd argue, are un-Pythonic:
#xml/sax/xmlreader.py:
def setContentHandler(self, handler):
#wsgiref/simple_server.py:
class ServerHandler(SimpleHandler):
#urllib2.py:
class HTTPDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler):
The last example is especially jarring. Abstract classes have a place in a strongly typed world to declare interfaces and help with vtable-style dispatch. In Python, where you have duck typing and monkey patching, a class that does virtually nothing on its own stands out like a guy in a tux at a beach party.
Even logging is infected by the same over-patterning. logging/__init__.py:
class StreamHandler(Handler)
LoggerAdapter(someLogger, dict(p1=v1, p2="v2"))
"Managers" - what a pain when plain function handles would have done the job. Does this name even tell you what task the class performs?
#multiprocessing/managers.py:
class BaseManager(object)
If anyone remembers, Java had to do OO big-style, with OO everywhere -- there were no alternatives.
Initially, buttons had to be subclassed just to handle click events, since functions were not first-class objects. Then someone came up with the MouseListener interface, which proved too unwieldy for handling a single click. So the MouseEventAdapters came into being.
Therefore, to handle a click in a "pattern" manner involves
an anonymous class
which subclasses MouseAdapter
which implements MouseListener,
which overrides MouseClick.
Publishing how industry solves this problem of "MouseClick" over and over as a pattern [design pattern is a general reusable solution to a commonly occurring problem within a given context in software design] only gives legitimacy to an approach that has dubious wider applicability.
Heaven help the future developers who are forced to do it because it is now recognized as industry "good practice" and codified in a renowned book.
It isn't!
It was a style that was forced by the constraints of a language.
This is neither pythonic nor necessary:
panel.addMouseListener(new MouseAdapter() {
    public void mouseEntered(MouseEvent e) {
        System.out.println(e.toString());
    }
});
Embracing "foolish, unschooled" thinking, this would be rendered in Python with plain functions as handlers -- or a list of them, for multiple event handlers. This style of API again allows effective exploration on the REPL.
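A sketch of what that might look like -- `Panel` here is a made-up stand-in widget, not a real library class:

```python
class Panel(object):
    """Stand-in widget class, invented for this sketch."""
    def __init__(self):
        self.mouse_entered_handlers = []  # plain callables, no Adapter needed

    def _fire_mouse_entered(self, event):
        for handler in self.mouse_entered_handlers:
            handler(event)

panel = Panel()
seen = []

# a handler is just a function...
panel.mouse_entered_handlers.append(lambda e: seen.append(("log", e)))
# ...and "multiple event handlers" are just more functions in the list.
panel.mouse_entered_handlers.append(lambda e: seen.append(("audit", e)))

panel._fire_mouse_entered("event-1")
print(seen)  # [('log', 'event-1'), ('audit', 'event-1')]
```

No anonymous class, no MouseAdapter, no MouseListener -- and every piece can be poked at interactively in the REPL.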
I've been programming Python for nearly 10 years, but your comment just helped me clarify a thought that I've had for ages but have never been able to put into words before: a well-designed Python API is one that can be effortlessly used within the REPL. And that's why urllib2 sucks.
> If anyone remembers, Java had to do OO in a
> big-style with OO everywhere -- there were no
> alternatives.
You can write Java that isn't heavily OO, but you have to implement alternatives to sections of the stdlib that most people assume or take for granted.
Related to what you're saying about the GUI, I'd be interested to see a detailed summary of what the Lighthouse people did, and how it was different from Java. I've found that NeXT-tradition stuff -- despite claims that it's heavily OO -- in fact tends to err away from subclassing, towards composition. I suspect the Lighthouse interface patterns did too.
Your example with "MouseClick" succinctly explains what I feel is wrong with much of software development today that tries to follow "modern" OO practices. Blame it on the language, or on people who try to mold the world into familiar casts at any cost by overusing patterns?
This presentation brings up a tangential point that has always confused me: how error-prone is starting a subprocess, really?
I agree with the author's goals of making common tasks easier and more obvious. urllib2 is an easy target, as it was added to the standard library over a decade ago, long before REST was something people talked about. The best tools for packaging, versioning, and testing have always been a bit ambiguous in any language, including Python.
However, the author points out something that has always bothered me about Python: it is way harder to start a subprocess with an external command in Python than almost any other language. This has been true whether using sys or os or even subprocess, which is quite recent.
I always felt that this had something to do with the constant warnings in the documentation about how a pipe between the subprocess and the Python process might fill and cause the subprocess to block. Or how running the program through shell rather than exec or something might cause some sort of security issue. Are these real issues that other languages ignore in the name of user convenience, or has Python just never been able to make the right API (as the author seems to argue)?
Creating a subprocess can be complex, at least if you expose all the different subtleties. If you've ever used Java's APIs to run processes, you know that Python's aren't the worst ;-)
There are lots of interesting corner cases, for example how to join stdout and stderr properly without blocking on one stream while the other is overflowing.
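For that particular corner case the standard library's answer is `Popen.communicate()`, which services both pipes concurrently so neither can fill up and deadlock the parent. A small sketch:

```python
import subprocess
import sys

# A child that writes to both streams. Naively reading stdout to EOF while
# the stderr pipe buffer fills up is the classic deadlock; communicate()
# reads both pipes concurrently and avoids it.
child = (
    "import sys\n"
    "sys.stdout.write('out data')\n"
    "sys.stderr.write('err data')\n"
)
p = subprocess.Popen(
    [sys.executable, "-c", child],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
out, err = p.communicate()
print(out.decode(), err.decode())  # out data err data
```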
On the other hand, almost nobody ever needs this. Ruby's "output = `command`" probably covers 90% of the use cases with the most trivial API imaginable. The hard part obviously is exposing the advanced functionality without compromising on the simplicity.
Almost all programming communities can learn a lot from Ruby's "if it's too hard, you're not cheating enough" approach (dhh quote I believe). Yes, the process could return an exabyte of stdout data, but do you really care? Is that really the problem this API should try to solve, with all special cases? That's not good computer science practice, but surprisingly effective.
There's no fundamental problem that's stopped Python from doing this before. For some reason, all of the ways to spawn a subprocess in Python have tried to map almost directly to the underlying C API... which is pretty awful.
To get around the problem of child processes spewing too much output and blocking the parent, one can pass an open file handle to the stdout/stderr arguments of the Popen call. I've run into this many times, and this solution has reliably worked for me every time. It could be documented better in the Python docs.
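A sketch of that approach -- redirecting a deliberately chatty child into a temporary file, so no pipe buffer is ever in play:

```python
import subprocess
import sys
import tempfile

# Pointing stdout at a real file means the OS pipe buffer can never fill,
# so even a very chatty child cannot block -- the data just lands on disk.
with tempfile.TemporaryFile() as outfile:
    code = "print('x' * 100000)"  # far more than a pipe buffer holds
    subprocess.check_call(
        [sys.executable, "-c", code],
        stdout=outfile,
        stderr=subprocess.STDOUT,  # merge stderr into the same file
    )
    outfile.seek(0)
    data = outfile.read()

print(len(data))  # all 100000 x's, plus the trailing newline
```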
For quick tasks and scripts, I've found subprocess.check_call, and subprocess.check_output with shell=True are great tools for spawning subprocesses and quickly grabbing output. They're pretty straightforward to use.
I have never been able to figure out how - in Python - to be able to stream asynchronously both stdout and stderr from the subprocess, both printing both of them as well as writing the data to a file.
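One way to do it (a sketch, not the only pattern): a reader thread per stream, with each line echoed to the console and tee'd into its own log file. The file names here are arbitrary:

```python
import subprocess
import sys
import threading

def pump(stream, label, log):
    # Read the pipe line by line as data arrives; print each line and
    # simultaneously write it to the log file.
    for line in iter(stream.readline, b""):
        text = line.decode().rstrip("\n")
        print("[%s] %s" % (label, text))
        log.write(text + "\n")
    stream.close()

# A child that interleaves writes to stdout and stderr.
child = (
    "import sys\n"
    "for i in range(3):\n"
    "    sys.stdout.write('out %d\\n' % i)\n"
    "    sys.stderr.write('err %d\\n' % i)\n"
)

p = subprocess.Popen(
    [sys.executable, "-c", child],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

with open("stdout.log", "w") as out_log, open("stderr.log", "w") as err_log:
    threads = [
        threading.Thread(target=pump, args=(p.stdout, "out", out_log)),
        threading.Thread(target=pump, args=(p.stderr, "err", err_log)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
p.wait()
```

Because each stream has its own reader, neither pipe can fill up and block the child while the parent waits on the other one.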
On installing Python -- the most practical way to work with it is to have a moderately recent OS-level Python install, and then build any other Python versions from source if required: https://github.com/collective/buildout.python
After that, use virtualenv with virtualenvwrapper.
I wish the developers of the requests module would stop changing its API -- code that was working just fine with 0.6.4 suddenly began failing with missing methods in version 0.8.5.
What if your package manager doesn't support the version of Python you're targeting? The most common place this happens is if you have some old RHEL 5 boxen that haven't been upgraded (and who that uses RHEL doesn't?). Or suppose you use Python 2.7, which isn't supported by very many (any?) "stable" linuces (aka Debian stable, RHEL, Ubuntu LTS, etc.).
And installing packages into the system python (if that's what you're suggesting) is the path to madness. It's much better to use virtualenvs you can throw away at will. All in all, it's usually best just to leave the system Python alone to avoid causing problems with any other packages that may depend on it being in a consistent state.
I agree that people should use a good package manager, but the reality of the corporate env world is that you'll be trying to use the latest Python on some ridiculously old unix install without root access, and you'll never be able to get IT to install the latest for all users.
Mostly I agree with you, but installing python on windows is a PITA compared to *nix.
PIP works until you need something that requires compiling extensions which is also a PITA on windows.
IIRC easy_install can install binaries, but I could be wrong about that since I don't do windows development anymore and thus pip works for me 99% of the time.
Regardless, my point is, though I think we all wish it was as simple as using your package manager and pip, it simply isn't that way for everyone.
And it probably never will be, but we can always make it simpler. :)
The Python standard library has gotten worse over time, as it got loaded up with more and more features, obfuscating the common use cases. The irony now is that to do simple, everyday things (like http requests) you are now better off installing a third party package like "requests" than using the standard library. So much for "batteries included."
The standard library needs a reboot. Why not do it in Python 3? Nobody's using it yet anyway ;-)
I can confirm that the Python subprocess API is a pain to use and also poorly documented. I recently had to use (no choice) Python 2.5.x to write a script that extensively called external programs, and ran into several problems. It's strange that a language like Python, which I find so easy to use in many cases, does not already have a good -- as in simple, safe, and well-documented -- subprocess API.
I guess I agree things could be simpler, although the cries of "garbage!" were a bit much. I wrote a wrapper function around urllib2 about 5 years ago and haven't looked back.
Wrappers are handy, but as soon as you need something beyond the basic use case they become useless. What’s great about Requests is that it seems to have minimal leakiness as an abstraction over HTTP.
It's really inexcusable that in 2012 (or even 1992) a language that is otherwise well-suited for internet programming does not come with a first-class HTTP client.
edit: If anyone wants to see a real-world refactor from HTTPLib/2 to requests, I did so with Pysolr here: https://github.com/mattdeboard/pysolr/commit/db63d8910dec42d...
-- a very happy user of the `requests` library
I'd like to see some community effort to build a collection of similar "better than the standard" libs.
Which, at some point, could replace the standard libs. Or be the de facto standard, a pip install call away...
https://github.com/kennethreitz/python-for-humans
Esp. the "installing python" one. Just use your package manager to install all the versions you need.
And for "Packaging and Dependencies", just use pip.