
A bite of Python

334 points | ashitlerferad | 9 years ago | access.redhat.com

162 comments

michaelfeathers | 9 years ago
The behavior of 'assert' is not an anomaly. It comes from 'design by contract.' Assert is primarily meant to be documentation of constraints in code and secondarily a way of catching errors during development.

"Contract conditions should never be violated during execution of a bug-free program. Contracts are therefore typically only checked in debug mode during software development. Later at release, the contract checks are disabled to maximize performance." - https://en.wikipedia.org/wiki/Design_by_contract

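A minimal sketch of that release-mode behaviour in CPython, assuming you can spawn a fresh interpreter: the same `assert` fails normally but is skipped entirely when the interpreter runs with `-O`. The helper `run_snippet` and the snippet text are made up for illustration.

```python
import subprocess
import sys

# Hypothetical two-line program: a false assertion followed by a print
SNIPPET = "assert 1 + 1 == 3, 'contract violated'\nprint('reached end')"

def run_snippet(*flags):
    """Run SNIPPET in a fresh interpreter with the given flags; return (exit code, stdout)."""
    proc = subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout.strip()

if __name__ == "__main__":
    print(run_snippet())      # assertion checked: non-zero exit, nothing printed
    print(run_snippet("-O"))  # assertion stripped: exit 0, 'reached end' printed
```

That is exactly the contract-checking model described above, and also why an `assert` used as input validation silently disappears in an optimised deployment.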
humanrebar | 9 years ago
That is certainly one approach, and the article agrees.

> The root cause of this weakness is that the assert mechanism is designed purely for testing purposes, as is done in C++.

However, C and C++ are perhaps unique in how much undefined behavior is possible and in how simple it is to create. Inserting into a vector while iterating through it, for instance. Or an uninitialized pointer.

That's why many C++ experts believe in runtime assertions in production. Crashing the application with a core dump is generally preferable to trashing memory, corrupting your database, or launching the missiles.

blowski | 9 years ago
It's not an anomaly, but it can be a surprise for people who don't understand what it does. For example, they use asserts for validation and then the validation doesn't work in production. It's absolutely right the way it works, but it's still a gotcha for the audience this blog post is aimed at.
tolmasky | 9 years ago
Couldn't the assert message just say something like ", warning: this is only checked in development"? I don't know; requiring people to know how something works is always a bit tough, since a lot of people's first interaction with something is in the code (say, they've just joined a new project). They may assume they understand the functionality, and their assumptions may initially seem correct when they test it themselves. It's one of those "don't know what you don't know" scenarios, and "look up every function you ever see, just in case what you think it does isn't what it actually does!" can be a bit impractical. So if this is a known gotcha, making the function itself state that gotcha might be useful.
catnaroek | 9 years ago
> Assert is primarily meant to be documentation of constraints in code

Real or imagined constraints? AFAICT, an assert only tells me what you wish your program did, but that has absolutely no bearing on what it will actually do.

pvdebbe | 9 years ago
One Python gotcha that has bitten people in my company a lot:

    fun_call('string1',
             'string2'
             'string3')
That is, missing commas and the resulting implicit string concatenation can lead to nasty errors. I wish Python hadn't nicked this from C and had just enforced the use of + to concatenate over-length strings that need to be split across multiple lines.
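A small sketch of what the parser actually does with that call; `fun_call` here is a hypothetical stand-in that just returns the arguments it received.

```python
def fun_call(*args):
    """Hypothetical stand-in for the real function: report what it received."""
    return args

# Missing comma: 'string2' and 'string3' are adjacent literals, so they are joined at compile time
buggy = fun_call('string1',
                 'string2'
                 'string3')

# With the comma, three separate arguments arrive
fixed = fun_call('string1',
                 'string2',
                 'string3')

print(len(buggy), buggy)  # 2 ('string1', 'string2string3')
print(len(fixed), fixed)  # 3 ('string1', 'string2', 'string3')
```

Nothing fails at the call site; the bug only shows up later, when the function sees one argument fewer than expected.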
xapata | 9 years ago
They considered dropping that for Python 3. I forget the reason why they changed their minds, but there's probably a PEP about it. You may find that their discussion will change your mind as well.
dec0dedab0de | 9 years ago
I love Python, but I have to agree on this one.
viraptor | 9 years ago
If you're interested in reviewing Python code for potential security issues, here's a related project: https://github.com/openstack/bandit (I'm one of the devs)

It will actually pick up a number of the security issues listed in the post. It's useful in the real world too: it has led to a number of CVEs being reported.

ubernostrum | 9 years ago
I looked at Bandit earlier this year, but had to put it down when I discovered it didn't have a way to fill in default config -- the instant I specified anything in the config file, I had to supply a complete config, including literally every single check it's capable of doing, because Bandit would discard all its defaults on encountering that single line of custom config.

Don't suppose you know if that's gotten better?

alex-yo | 9 years ago
I wouldn't call these 'traps'. I would call it 'read and understand the documentation before writing code': what the 'is' operator does, how floats behave in EVERY programming language, or why you should sanitize EVERY user input.

So, basically, I can write such a list for every language I know.

blowski | 9 years ago
Relying on developers to read and remember every bit of documentation for every bit of code is more likely to end up with insecure code compared to introducing sane defaults with an explicit, expressive API.
baq | 9 years ago
which is great and you totally should. not everyone knows about those things.
hacker42 | 9 years ago
That's like saying people shouldn't read FAQs because they should rather read the documentation. These things aren't actually mutually exclusive.
skywhopper | 9 years ago
I would never accuse Python of "language clarity and friendliness". Far from it. As someone who came up through C, Java, Perl, and Ruby, and who has wrangled with Python, JavaScript, Go, and even Haskell in recent years, I still find Python, far more than other languages, mysterious, self-contradictory, and full of implicit rules and assumptions that are entirely unintuitive to me. And yet people seem to like it; this article certainly does. It's an interesting effect.
kstrauser | 9 years ago
I find that with Python that's almost always caused by not quite understanding the underlying rules. Once understood, they're very consistent.

For example: does Python pass function arguments by value or by reference? Neither! It passes them by object reference - not by variable reference like C/C++.

Check out:

  >>> def foo(a):
  ...     a = 2
  ...
  >>> value = 1
  >>> value
  1
  >>> foo(value)
  >>> value
  1
and:

  >>> def mutate(dct):
  ...     dct['foo'] = 'bar'
  ...
  >>> value = {}
  >>> value
  {}
  >>> mutate(value)
  >>> value
  {'foo': 'bar'}
This apparent contradiction confuses a lot of people. The first example would imply that Python is pass-by-value, but the second looks a lot like pass-by-reference. If you don't know the actual answer, it looks magical and inconsistent, and I've heard all sorts of explanations like "mutable arguments are passed by reference while immutable objects are passed by value".

In reality, the object itself - not the variable name referring to the object - is what gets passed as the function argument. In the first example we're passing in the object `int(1)`, not the variable `value`, and creating a new variable `a` to refer to it. When we then run `a = 2`, we're making `a` refer to a different object, `int(2)`, instead of the old one. Nothing happens to the old `int(1)` object. It's still there, and the top-level `value` variable still points to it. `a` is just a symlink: it doesn't have a value of its own. Neither does `value` or any other Python variable name. That's why the second example works: we're passing in the actual dictionary object and then mutating it. We're not passing in the variable `value`; we're passing in the object that `value` refers to.

The point of this long-windedness is that Python's rules tend to be very, very simple and consistent. Its behavior can be unexpected if you don't truly understand the details or if you try to infer parallels to other languages by observing it and hoping you're right.
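A small sketch that makes the two examples concrete with `id()`, which returns an object's identity; `rebind` is just a renamed `foo`, and the extra return values exist only to expose the identities.

```python
def rebind(a):
    before = id(a)  # `a` starts out naming the caller's object...
    a = 2           # ...then is pointed at a different object; the caller's name is untouched
    return before, id(a)

def mutate(dct):
    # `dct` names the very same dict the caller holds, so the change is visible outside
    dct['foo'] = 'bar'
    return id(dct)

value = 1
before, after = rebind(value)
print(before == id(value), after == id(value))  # True False
print(value)                                    # 1

d = {}
print(mutate(d) == id(d), d)                    # True {'foo': 'bar'}
```

Both functions receive the caller's object; the only difference is whether the body rebinds the local name or mutates the object it refers to.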

noobiemcfoob | 9 years ago
In my experience, and as implied by the rising popularity of Python, you would be in the minority. Personally, I find Python the clearest of any language I've worked with; it most closely resembles natural language in the way I typically speak. Do you have some examples of how you find it self-contradictory?

Here's an example of its expressiveness that a colleague and I were discussing the other day:

  Python: [os.remove(i.local_path) for i in old_q if i not in self.queue]
  Java:   old_q.stream().filter(i -> !self.queue.contains(i)).map(i -> new Path(i.local_path)).forEach(Files::delete);

I've programmed in both languages but joked I could only understand the Java line by using the Python line as documentation!
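One aside on the Python line: it builds a throwaway list purely for `os.remove`'s side effect, and a plain loop is the more conventional spelling. A sketch, with `Item` and `remove_stale` as hypothetical stand-ins for the colleague's code and `self.queue` passed in as a parameter:

```python
import os
import tempfile
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    local_path: str  # hypothetical stand-in for the queue entries in the example

def remove_stale(old_q, queue):
    """Delete the files of items that have dropped out of the queue."""
    # A plain loop: no list of None values is built just to trigger a side effect.
    for item in old_q:
        if item not in queue:
            os.remove(item.local_path)

with tempfile.TemporaryDirectory() as tmp:
    paths = [os.path.join(tmp, name) for name in ("a.bin", "b.bin", "c.bin")]
    for p in paths:
        open(p, "w").close()
    old_q = [Item(p) for p in paths]
    remove_stale(old_q, queue=old_q[1:])  # 'a.bin' is no longer queued
    print(sorted(os.listdir(tmp)))        # ['b.bin', 'c.bin']
```

It is a line or two longer, but it reads as the statement it is rather than as an expression whose value is discarded.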

drauh | 9 years ago
I'm afraid I disagree. I programmed in a variety of languages in grad school (physics): C, C++, Fortran 77, Tcl, Perl, Matlab, Maple, Mathematica, IDL, Emacs LISP, etc. Not to mention the stuff I started on in high school.

When I switched my analysis to Python, I became so much more productive. And other science researchers I have known have echoed this sentiment. Even writing a C module to speed up my Python was pretty straightforward, if tedious.

Python had the fewest surprises. And debugging other people's Python is exponentially less annoying than debugging other people's Fortran or C. It's still my go-to language to get stuff done without fuss.

acomjean | 9 years ago
I'm inclined to agree that python isn't the "clearest and friendliest". I've been using it a while, and I still find myself looking up how to do X, when it should be obvious. I like python, but I'm amazed at how people love it.

I have to maintain a codebase of php/perl/java/python. "Pythonistic" programming seems to encourage finding the shortest/fastest way to code things at the expense of clarity.

Plus dependencies can get headachy. This might just be the code I have to work with, but while it's better than Perl, in my case it's harder to maintain than Java or PHP (the global scope thing in Python seems to get me).
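It isn't clear which scoping issue is meant; assuming it is the common one, here is a sketch: any assignment to a name inside a function makes that name local for the whole function body, even on lines before the assignment. The names `counter`, `bump_broken`, and `bump` are illustrative.

```python
counter = 0

def bump_broken():
    # The assignment makes `counter` local to this entire function, so the
    # read on the right-hand side happens before any local value exists.
    counter = counter + 1
    return counter

def bump():
    global counter  # explicitly opt in to rebinding the module-level name
    counter = counter + 1
    return counter

try:
    bump_broken()
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)

print(bump(), bump())  # 1 2
```

The rule is decided at compile time from the function body as a whole, which is why it can feel like action at a distance.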

JoachimSchipper | 9 years ago
I usually find it fairly easy to figure out what Python code does... as long as no errors occur. The fact that basically no Python code documents what errors it can throw/generate is really annoying.
metaphorm | 9 years ago
every language has warts and gotchas. I judge a language by how nice its _idiomatic_ patterns are, rather than by how bad its warts are (unless there are an overwhelming number of warts that can't be avoided even in idiomatic code).
commenter23 | 9 years ago
The point the article makes on comparing floating point values and the floating point type is true, but it's not because of any rounding error.

It's because the comparison operators are defined for every value. That is, "True < []" is valid in Python 2.7, along with any other 2 values, regardless of type. This is a surprising instance of weak typing in Python, which is otherwise strongly typed, which is why this was fixed in Python 3 (https://docs.python.org/3.0/whatsnew/3.0.html#ordering-compa...).

This is also not a case of Python doing something useful, like with '"foo"*2'. The result of the comparison is defined, but it's not useful. I suppose it was useful for making sure that you can always sort a list, but there are better ways to do that.
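A sketch of the Python 3 behaviour from the linked "what's new" page, plus one explicit way to sort a heterogeneous list. The key function is just one choice (group by type name, then compare within a type), not a standard recipe.

```python
mixed = [3, 'b', 1, 'a']

# Python 3 refuses to order unrelated types instead of inventing an answer
try:
    sorted(mixed)
except TypeError as exc:
    print("TypeError:", exc)

# Explicit alternative: compare type names first, values only within the same type
print(sorted(mixed, key=lambda x: (type(x).__name__, x)))  # [1, 3, 'a', 'b']

# The same applies to comparing a float with a non-number
try:
    float('inf') < []
except TypeError:
    print("float vs list: TypeError")
```

The ordering across types is now something you state in the key, rather than something the interpreter makes up.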

viraptor | 9 years ago
> The point the article makes on comparing floating point values and the floating point type is true, but it's not because of any rounding error.

Do you mean this example? (it's the only one I can find about floating point comparison)

> 2.2 * 3.0 == 3.3 * 2.0

It's definitely due to floating-point precision error rather than type comparison. How would you explain it otherwise?
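A quick sketch of that rounding effect, and of the usual tolerance-based comparison via `math.isclose` (stdlib, Python 3.5+):

```python
import math

lhs = 2.2 * 3.0
rhs = 3.3 * 2.0

print(lhs == rhs)              # False: the two products round to different doubles
print(repr(lhs), repr(rhs))    # shows the difference in the last digits
print(math.isclose(lhs, rhs))  # True: equal within the default relative tolerance (1e-09)
```

Both operands are floats throughout, so no cross-type comparison is involved; the mismatch comes entirely from binary rounding.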

Pikago | 9 years ago
The documentation of most modules cited in the article starts with a paragraph in bold red warning the reader about the same danger the author explains. So this is a nice compilation, but there's nothing new here, and nothing that somebody reading the documentation of the module they're using would miss.

There are nonetheless good remarks about poor design choices in Python that can mislead newbies, such as naming `input` the function that does `eval(raw_input(prompt))` (as casually documented[0]), and the existence of such a function in the first place.

[0] https://docs.python.org/2/library/functions.html?highlight=i...
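For reference, Python 3's `input()` returns a plain string (like Python 2's `raw_input()`). If you want the "type 2, get an int" convenience without `eval`, `ast.literal_eval` accepts only literals. A sketch; `parse_literal` is a hypothetical helper, and strings are passed in directly so it runs non-interactively:

```python
import ast

def parse_literal(text):
    """Hypothetical helper: turn user text into a Python literal, never running code."""
    return ast.literal_eval(text)

print(parse_literal("2"))         # 2 (an int, as Python 2's input() would give)
print(parse_literal("[1, 2.5]"))  # [1, 2.5]

try:
    parse_literal("__import__('os').getcwd()")
except ValueError:
    print("rejected: function calls are not literals")
```

In real code the argument would come from `input()`; anything beyond numbers, strings, and containers of them is refused with `ValueError`.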

santiagobasulto | 9 years ago
Completely off-topic, sorry, but I couldn't help noticing this:

"Being easy to pick up and progress quickly towards developing larger and more complicated applications, Python is becoming increasingly ubiquitous in computing environments".

Why would you reorder the sentence in such an unreadable way? Isn't it much easier to say:

"Python is becoming increasingly ubiquitous in computing environments, as it's easy to pick up and progress quickly towards developing larger and more complicated applications"

I'm not an expert in writing; it just sounded weird. If anyone can explain what's going on there, I'd really appreciate it.

amyjess | 9 years ago
On the float behavior: I really wish Python 3 had the sense to do what Perl 6 did and interpret all literals with decimal points (except those that use scientific notation) as Fractions instead of floats. That would solve all these floating-point errors without requiring significant modification of code, plus Python 3 would be the perfect time to do it because they're already throwing out backwards compatibility because of the str/bytes thing.
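Python 3 doesn't do this for literals, but the `fractions` module gives the same exact arithmetic as an explicit opt-in. A sketch of the article's comparison done with `Fraction`:

```python
from fractions import Fraction

# Building Fractions from decimal *strings* keeps the exact decimal value;
# Fraction(2.2) would instead capture the float's binary approximation.
a = Fraction("2.2") * 3
b = Fraction("3.3") * 2

print(a == b)    # True
print(a)         # 33/5
print(float(a))  # 6.6

print(Fraction(2.2) == Fraction("2.2"))  # False: the float literal was already rounded
```

The last line is the catch with retrofitting: once a value has been parsed as a float, the exact decimal it came from is gone.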
mrswag | 9 years ago
Some points are valid, but come on, if an attacker has write access to your code, you can't recover from that, ever.
guyzmo | 9 years ago
The part of the article about an issue with name mangling of private fields is somewhat misleading.

The feature is just some syntactic sugar.

When within a class, private fields such as:

    class Foo:
        def __init__(self):
            self.__bar = 'hello'  # stored on the instance as _Foo__bar
are accessible from within other methods of class Foo as `self.__bar`. But that's just syntactic sugar, the real name of `self.__bar` is `self._Foo__bar`.

So from the outside "world", including `hasattr()`, you can still access `self.__bar` as `Foo()._Foo__bar`.

    >>> class Foo():
    ...   def __init__(self):
    ...     self.__bar = 'hello'
    ...   def show(self):
    ...     print(1, self.__bar)
    ...     print(2, getattr(self, '__bar'))
    ... 
    >>> foo = Foo()
    >>> foo._Foo__bar
    'hello'
    >>> foo.show()
    1 hello
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 6, in show
    AttributeError: 'Foo' object has no attribute '__bar'
    >>> foo.__bar = 'world'
    >>> foo.show()
    1 hello
    2 world
In the end, when `x.__private` is set up outside of the class definition, it is obviously a new member, as its name differs from the internal name `__private` (which really is `_X__private`).

From within the code, `getattr(x, '__private')` will return the `__private` set up from outside the class, and `getattr(x, '_X__private')` the one defined within the class.

The whole point of that feature is to ensure that members defined within a class that are not part of the public API are left untouched when that class gets subclassed, to avoid unexpected behaviours.

Here's an example of why this has been designed:

  >>> class A:
  ...   def __init__(self):
  ...     self.__internal = "this is a"
  ...   def show(self):
  ...     print(1, "A", self.__internal)
  ... 
  >>> class B(A):
  ...   def __init__(self):
  ...     super(B, self).__init__()
  ...     self.__internal = "this is b"
  ...   def show(self):
  ...     super(B, self).show()
  ...     print(2, "B", self._A__internal)
  ...     print(3, "B", self.__internal)
  ... 
  >>> B().show()
  1 A this is a
  2 B this is a
  3 B this is b
  >>> 
There's nothing here that should be surprising or asymmetrical to anybody who has read the Python documentation and uses that feature appropriately. It may be a weird feature, but its behaviour is coherent and consistent, and it actually adds safety to code.

Documentation references:

  * https://docs.python.org/3/faq/programming.html#i-try-to-use-spam-and-i-get-an-error-about-someclassname-spam
  * https://docs.python.org/3/reference/expressions.html#atom-identifiers
mkesper | 9 years ago
"Private" fields and methods should use one underscore. Two underscores are for name mangling issues and __method__ is for "magic" methods.
marklgr | 9 years ago
> The feature is just some syntactic sugar.

I would not call it "syntactic sugar", but rather a leak of implementation details. It could be deliberate, like Perl did for its OO (showing the entrails of all its objects), but it's not particularly sugary-sweet-yummy.

xapata | 9 years ago
And name-mangling is less about privacy than about preventing accidental overrides.
majewsky | 9 years ago
Indenting the lines by two spaces will give you proper rendering for code snippets.
aikah | 9 years ago
The last one (script injection) isn't limited to Python; it applies to any language that makes use of a template engine. Escaping variables should be the default behavior.

Now, I like Python: it has many useful libraries, and in fact it is one of the languages with the most libraries for any purpose. Still, I sometimes wish it were stricter, even for a dynamically typed language.

wallunit | 9 years ago
I'm sorry but that whole article is just FUD...

> Input function

Yes, in Python 2, input() is a shortcut for eval(raw_input(...)), and documented as such. Obviously that is not a safe way to parse user input, and therefore it has been changed in Python 3. So this has been fixed, but if you don't read the documentation you probably will keep introducing security issues with whatever programming language.

> Assert statement

If you want to effectively protect against a certain condition, raise an exception! Asserts, on the other hand, exist to help debugging (and documenting) conditions that should never occur by proper API usage. Stripping debugging code when optimizing is common practice, not only with Python.
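A minimal sketch of "raise an exception" for a check that must survive `python -O`; `withdraw` is a hypothetical example function, not something from the article.

```python
def withdraw(balance, amount):
    """Hypothetical example: validation that must still run under `python -O`."""
    # Explicit checks survive optimisation; `assert amount > 0` would be stripped.
    if amount <= 0:
        raise ValueError(f"amount must be positive, got {amount!r}")
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount

print(withdraw(100, 30))  # 70
try:
    withdraw(100, -5)
except ValueError as exc:
    print("rejected:", exc)
```

An `assert` would still be fine for internal invariants that a correct caller can never violate; input coming from outside gets an exception.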

> Reusable integers

First of all, this behavior isn't part of the Python language but an implementation detail, and a feature, since it reduces memory footprint. But even if small integers weren't cached, you would still have the same situation when using the `is` operator on variables holding the same int object. On the other hand, caching all integers could easily cause a notable memory leak, particularly considering that ints in Python 3 (like longs in Python 2) can be as large as available memory allows. Either way, there is no good reason to check for identity if you want to compare values.
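A short sketch of that distinction. Whether the two runtime-built ints below share an object is deliberately left unasserted, because that is exactly the implementation detail in question (CPython pre-allocates only the small range -5 through 256).

```python
a = int("100000")  # built at runtime, so object sharing is up to the implementation
b = int("100000")

print(a == b)  # True: value comparison, the right tool for numbers
print(a is b)  # implementation detail: do not rely on either answer

c = a
print(c is a)  # True: both names refer to one object, whatever its value
```

`==` gives the same answer on every implementation; `is` answers a different question (same object?) that is rarely the one you mean for numbers.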

> Floats comparison

Floats in Python essentially use the native C "double" type. Hence they have whatever precision your platform provides for double-precision floating-point numbers, which in practice is specified by IEEE 754. That way floating-point numbers are reasonably fast while being as precise as in most other programming languages. However, if that still isn't enough for your use case, Python also comes with the decimal module (for decimal arithmetic with user-adjustable precision) and the fractions module (for exact rational numbers).

And as for infinity, while one would expect float('infinity') to be larger than any numerical value, in Python 2 the result of comparing a numerical value with a non-numerical type is arbitrary (consistent, but not meaningful). Python 3 is stricter and raises a TypeError.
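A sketch of the decimal option mentioned above. Decimal works in base 10, so decimal literals given as strings are exact, and precision is a property of the (here temporarily adjusted) context:

```python
from decimal import Decimal, localcontext

print(0.1 + 0.2 == 0.3)                          # False with binary floats
print(Decimal("0.1") + Decimal("0.2"))           # 0.3
print(Decimal("2.2") * 3 == Decimal("3.3") * 2)  # True

# Precision is set on a context (28 significant digits by default);
# localcontext() changes it only inside the with-block.
with localcontext() as ctx:
    ctx.prec = 6
    print(Decimal(1) / Decimal(7))               # 0.142857
```

Decimal still rounds (1/7 has no finite decimal form); it just rounds in base 10 at a precision you choose.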

> Private attributes

Class-private attributes (those starting with __) exist to avoid conflicts with class-private attributes of other classes in the class hierarchy, or similar accidents. In my experience that feature is rarely needed, and even more rarely in combination with getattr()/setattr()/delattr(). But if you need to look up class-private attributes dynamically, you can still do so, e.g. hasattr(obj, '_classname__attrname'). After all, self.__attrname is just syntactic sugar for self._classname__attrname.
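A sketch of such a dynamic lookup. `mangled` is a hypothetical helper, and it assumes the attribute name starts with two underscores and doesn't end with two (the cases where mangling applies). Note that it needs the class that *defined* the attribute, not the instance's own type:

```python
def mangled(cls, name):
    """Hypothetical helper: the real attribute name for `__name` written inside `cls`."""
    # Python strips leading underscores from the class name, then prepends one.
    return "_" + cls.__name__.lstrip("_") + name

class Account:
    def __init__(self):
        self.__pin = 1234  # stored as _Account__pin

class Savings(Account):
    pass

s = Savings()
print(hasattr(s, "__pin"))                    # False: no attribute with that literal name
print(hasattr(s, mangled(Account, "__pin")))  # True
print(getattr(s, mangled(Account, "__pin")))  # 1234
# mangled(type(s), "__pin") would give '_Savings__pin', which does not exist
```

Passing the defining class explicitly is the whole point: mangling is tied to where the name was written, not to the runtime type.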

Also note that private attributes aren't meant as a security mechanism, merely as a way to avoid accidents. That's not specific to Python; in most object-oriented languages it is possible to access private attributes one way or another. Python just tries to be transparent about that fact by keeping it simple.

> Module injection

Yes, Python looks in a few places for modules to be imported. That mechanism is quite useful for a couple of reasons, most notably because it's necessary for using modules without installing them system-wide. It can only become a security hole if a malicious user has write access to some location in sys.path but not to the script importing the modules itself. I can hardly think of a scenario like that, and even then I'd rather blame the server's misconfiguration.
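If you want to check where an import would actually come from, `importlib.util.find_spec` reports the file without importing (and so without executing) the module. A sketch; `where_from` is a hypothetical helper name.

```python
import importlib.util
import sys

# By default, sys.path[0] is the running script's directory ('' for `python -c`
# and the interactive prompt), so a same-named file there can shadow a library module.
print(sys.path[0])

def where_from(module_name):
    """Hypothetical helper: the file an import of `module_name` would load, or None."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin

print(where_from("json"))                # the path of the json package's __init__.py
print(where_from("no_such_module_xyz"))  # None
```

This is a diagnostic, not a defence; the fix for the scenario above is still not letting untrusted users write to any directory on sys.path.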

> Code execution on import

Yes, just like in every other scripting language, Python modules can execute arbitrary code on import. That is expected, necessary, and not limited to Python. Even if module injection is an issue, this doesn't make anything worse, as the malicious code doesn't have to run on import; it could just as well run in whatever API is being called. But as outlined above, this is a rather theoretical scenario.
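A sketch showing that top-level statements run at import time, while code behind the usual `if __name__ == "__main__":` guard does not. The module `demo_side_effects` is a hypothetical throwaway file written to a temporary directory just for this demo.

```python
import importlib
import os
import sys
import tempfile

MODULE_SOURCE = '''
print("top-level code ran on import")  # executes during `import`
LOADED = True

def main():
    print("only runs when executed as a script")

if __name__ == "__main__":  # guard: skipped when imported
    main()
'''

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "demo_side_effects.py"), "w") as fh:
        fh.write(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()  # make sure the freshly written file is seen
        mod = importlib.import_module("demo_side_effects")
    finally:
        sys.path.remove(tmp)

print(mod.LOADED)  # True
```

Importing prints the first message but not the second, which is the whole contract: importing a module means running it.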

> Shell injection via subprocess

Yes, executing untrusted input is insecure. That is why the functions in Python's subprocess module expect, by default, a sequence of arguments rather than a string parsed by the system's shell. The documentation clearly explains the consequences of using shell=True. So introducing a shell injection vulnerability by accident seems less likely in Python than in most other programming languages.
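A sketch of the default argument-list form: the hostile-looking string below reaches the child process as one literal argument and is never interpreted by a shell. The child here is just another Python interpreter that echoes its argument.

```python
import shlex
import subprocess
import sys

untrusted = "hello; echo injected"  # would run a second command if handed to a shell

# Argument list (shell=False, the default): the string is passed as a single argv entry
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # hello; echo injected  (echoed back, not executed)

# If a POSIX shell really is required, quote each untrusted piece first
print(shlex.quote(untrusted))  # 'hello; echo injected'
```

`shlex.quote` is only meant for POSIX shells; the list form avoids the question altogether.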

> Temporary files

If anything, Python is as insecure as the underlying system, and therefore as insecure as most other programming languages too. But CWE-377, the issue the author is talking about, isn't particularly easy to exploit in a meaningful way, and it requires the attacker to already have access to the local temporary directory. Moreover, Python's tempfile module encourages the use of high-level APIs that aren't affected.
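A sketch of the unaffected APIs. The racy pattern is `tempfile.mktemp()`, which only returns a *name* that someone else could create first; `NamedTemporaryFile` and `mkstemp` create the file securely (readable and writable only by the creating user) and hand it back already open.

```python
import os
import tempfile

# High level: created securely, deleted automatically when the with-block exits
with tempfile.NamedTemporaryFile(mode="w+", suffix=".txt") as fh:
    fh.write("scratch data")
    fh.flush()
    fh.seek(0)
    contents = fh.read()
    scratch_name = fh.name
print(contents, os.path.exists(scratch_name))  # scratch data False

# Lower level: an open OS-level descriptor plus the path; cleanup is up to you
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "w") as out:
        out.write("more data")
finally:
    os.remove(path)
```

In both cases there is no gap between choosing the name and creating the file, which is the window CWE-377 is about.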

> Templating engines

The reason jinja2 doesn't escape HTML markup by default is that it is not an HTML template engine, but a general purpose template engine, which is meant to generate any text-based format. Of course, it is highly recommended to turn on autoescaping when generating HTML/XML output. But enforcing autoescaping would break other formats.
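In jinja2 itself the switch is the environment's `autoescape` setting (e.g. `Environment(autoescape=True)`). To keep this runnable without third-party packages, here is a stdlib-only sketch of the same escape-by-default idea; `render_html` is a hypothetical helper, not a jinja2 API.

```python
import html
from string import Template

def render_html(template, **values):
    """Hypothetical helper: substitute values, escaping each one for HTML by default."""
    return Template(template).substitute({k: html.escape(str(v)) for k, v in values.items()})

page = render_html("<p>Hello, $name!</p>", name="<script>alert(1)</script>")
print(page)  # <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;!</p>
```

Escaping happens on the substituted values only, so the template's own markup is untouched; that is also why it only makes sense when you know the output format is HTML.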

tehwalrus | 9 years ago
Having written code in Python for a few years, I've come across most of these (some of the ways to hack builtins/modify the code on a function reference were new to me).

However, it had also never occurred to me to make anything I cared about the security of in python. Perhaps this article is aimed at people who are writing system utilities for linux distributions, and are considering Python? Presumably some such utilities are written that way already.

It comes down to doing a proper security analysis before you define the requirements of the software: specifically, which attack vectors you want to defend against. A valid conclusion for some types of software, given the list of "bugs" in the post, would be: don't write it in Python. (Indeed, I have done exactly this before, writing 200 lines of C instead of 20 lines of Python.)

sitkack | 9 years ago
So many people chiming in with their dismissive comments and superior Python knowledge. The article is excellent and should be required reading for Python devs. Having it all in one place is a valuable resource.
swampthinker | 9 years ago
I love that the header and navbar is responsive, but the content itself is not.

Also input is truly baffling to me. Such a small mistake that could allow write access to your code.

ekimekim | 9 years ago
it's historical - python was originally conceived as a toy language for teaching, in which context being able to do x = input(), type 2, and get an integer, is a desirable property.

Then we got stuck with it because of backwards compatibility.

baq | 9 years ago
one of those little things that python 3 fixed.
dschiptsov | 9 years ago
"Reusable integers" is a real fail - it violates the principle of least surprise and introduces a nasty inconsistency - all integers should logically be (refer to) the same integer object, not just the first 100.

Assert is a statement, not an expression, so do not use it as an expression.

One should never compare floats for exact equality. This is taught in any freshman CS course. The limitation is due to the standard encoding of floats - IEEE 754 - not Python's fault.

Everything else is a feature of a truly dynamic language designed for really quick prototyping. Python 3.x got rid of many inconsistencies and caveats of 2.x.

Shall we re-read the classic now?

https://eev.ee/blog/2012/04/09/php-a-fractal-of-bad-design/

in_the_sticks | 9 years ago
If your code relies on two integers having the same object ID, I daresay you may be doing something wrong.
kstrauser | 9 years ago
> "Reusable integers" is a real fail - it violates the principle of least surprise and introduces a nasty inconsistency - all integers should logically be (refer to) the same integer object, not just the first 100.

I find the concept of special-casing ints to behave that way to be surprising and inconsistent. If ints act that way, shouldn't strings? And if they (very much unexpectedly) did, why not every other type?

"is" is very useful on its own. "variable is None" is a common and powerful idiom entirely distinct from "variable == None". There are many cases when you want to compare object identity. None of those use cases apply to ints where "==" is always the correct way to compare them, so the fact that "a == b" and "a is b" might occasionally be the same or different doesn't affect anything at all in practice.

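A small sketch of why `variable is None` is the idiom: `==` defers to the object's own `__eq__`, which a class can define however it likes, while identity can't be overridden. `AlwaysEqual` is a deliberately pathological, hypothetical class.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""
    def __eq__(self, other):
        return True

x = AlwaysEqual()
print(x == None)  # True: == asks the object, and this one always says yes
print(x is None)  # False: identity is decided by the interpreter, not the class
```

For ints there is no such ambiguity in meaning, which is why `==` is always the right comparison there.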
ViktorasM | 9 years ago
Not readable on a phone. How can any tech company afford this in 2016...