Common Python Mistakes

404 points | djug | 12 years ago | toptal.com

141 comments

agf | 12 years ago
This is a pretty good list of gotchas, but it's important when writing something targeted at beginners to be as precise and clear as possible. Nearly every section here either uses terminology poorly, is slightly incorrect, or has difficult examples.

  Python supports optional function arguments and allows default values to be 
  specified for any optional argument.
No, specifying a default is what causes an argument to be optional.

  it can lead to some confusion when specifying an expression as the default value
  for an optional function argument.
Anything you specify as a default value is an expression. The problem is when the default is mutable.

  the bar argument is initialized to its default (i.e., an empty list)
  only the first time that foo() is called
No, rather it's when the function is defined.
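
A short sketch of that timing (the names `foo` and `bar` follow the article; the function bodies are assumed). The default list already exists in `foo.__defaults__` before the first call and is shared by every call that omits `bar`. The usual fix uses `None` as a sentinel:

```python
def foo(bar=[]):
    # The [] above was evaluated once, when this def statement ran.
    bar.append("baz")
    return bar


def foo_fixed(bar=None):
    # None as a sentinel: a fresh list is built on every call that omits bar.
    if bar is None:
        bar = []
    bar.append("baz")
    return bar


# The default object exists before foo() has ever been called.
default_before_call = foo.__defaults__[0]
print(default_before_call)           # []
print(foo())                         # ['baz']
print(foo())                         # ['baz', 'baz']
print(foo() is default_before_call)  # True
print(foo_fixed())                   # ['baz']
print(foo_fixed())                   # ['baz']
```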

  class variables are internally handled as dictionaries
As dictionary keys, and that's still only roughly correct.

In "Common Mistake #5", he uses both a lambda and index-based looping, neither of which is particularly Pythonic. A better example of where this is a problem in otherwise Pythonic code would be good.
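
For instance, #5 (modifying a list while iterating over it) shows up without any lambdas or index arithmetic when you remove items from the list you are looping over. A sketch; the data and the `is_odd` helper are illustrative:

```python
def is_odd(n):
    return n % 2 == 1


buggy = [1, 3, 5, 6, 7]
for n in buggy:
    if is_odd(n):
        # Removing shifts later items left, so the loop skips the next one.
        buggy.remove(n)
print(buggy)  # [3, 6]: 3 was skipped, not removed

# Safer: build a new list instead of mutating the one being iterated.
kept = [n for n in [1, 3, 5, 6, 7] if not is_odd(n)]
print(kept)  # [6]
```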

In "Common Mistake #6" he uses a lambda in a list comprehension -- for an article of mistakes mostly made by Python beginners, this is going to make it tough to follow the example.

In "Common Mistake #7", he describes "recursive imports" where he means "circular imports".

In "Common Mistake #8" he refers repeatedly to "stdlib" where he means the Python Standard Library. Someone is going to read that and try to "import stdlib".

hsinger | 12 years ago
Hey, thanks for the great feedback! We agreed with (almost :-) ) all of your comments and have made corresponding mods/corrections to the post. Thanks again!

[Toptal blog editor]

nine_k | 12 years ago
Just to note: in my opinion, this text is not targeted at beginners. These are upper-intermediate to advanced level gotchas.
Goosey | 12 years ago
Slightly off topic, but does anyone know of a resource that has 'most common mistakes' for different languages all in one place? It's certainly possible to google for blog posts and stack overflow questions to assemble such a list, but it would be handy to have them all in one place.

My use case is when interviewing candidates I often ask them to rate themselves on a scale of 1-5 in the languages they know, and then ask them increasingly 'tricky' questions in each language to get a feel for how their "personal" scale aligns to their real knowledge. This works fine if we have an overlap of several languages, but in the case where I know nothing or very little of one of the languages they know I lose that data point.

I find it valuable to know what an "I am a 1 at X" vs. "I am a 3 at X" vs. "I am a 5 at X" means to them, since I've found little correlation between how harshly someone rates themselves and their true ability. Sometimes self-rated 5s are really 5s by my book, sometimes self-rated 3s are really 5s by my book, and sometimes self-rated 5s are really 2s by my book. So I want to know how "my scale" translates to "their scale". If it were more formalized, I'd go as far as computing a "confidence quotient" for a person, since both self-critical and self-confident people can be fantastic engineers or horrible engineers.

Does anyone else do this process when interviewing?

mark-r | 12 years ago
While such a resource would make your job easier, it would make the interviewee's job easier still. They'd just have to memorize all the points in the reference.
toddkaufmann | 12 years ago
I have thought about this, but in the context of understanding the root causes of these problems: 1) is it a language design problem, 2) a misunderstanding or misconception on the part of the programmer, 3) due to or related to bad coding/smells (e.g. method body too long), 4) high complexity code (could be related to (3), could reflect the domain), 5) reduced programmer cognitive capacity (distraction, stress, sleep deficit, lack of motivation, etc.).

These would be interesting research areas for instrumenting IDEs / other eco-system tools to collect some of this data. (I'm sure there is already some work in some of these areas and would appreciate names or links to high-quality reviews.)

famousactress | 12 years ago
This list is an excellent summary. If tasked with a #11 I'd probably add the slightly more obscure, but still super painful (when you do run into it) implicit string concatenation:

    >>> l = ["a",
    ...      "b",
    ...      "c"
    ...      "d"]
    >>> l
    ['a', 'b', 'cd']
nine_k | 12 years ago
Still hugely useful when you want to write long string constants.

IMHO, adding a comma after each list element is good practice. You can easily reorder elements, add more, and never run into the issue you describe:

    foo = [
      "a",
      "bc",
      "def",  # comma here, too
    ]
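
For the long-string case mentioned above, the deliberate form is adjacent literals inside parentheses, which the compiler joins into a single string. A small sketch; the message text is made up:

```python
# Adjacent string literals are concatenated at compile time.
message = (
    "This is a long message that "
    "spans several source lines "
    "without any + operators."
)
print(message)
```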
yeukhon | 12 years ago
I never knew you could do that. You don't need the continuation though.

    >>> l = ["A", "c""d"]
    >>> l
    ['A', 'cd']

tomp | 12 years ago
Another one:

    class A():
      def __init__(self):
        self._x = 0
    
      @property
      def x(self):
        return self._x

      @x.setter
      def x(self, new_value):
        self._x = new_value
Using it:

    a = A()
    print a._x    # 0
    print a.x     # 0
    a.x = 4
    print a.x     # 4
    print a._x    # 0 wait WTF?!
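The bug was not having `A` inherit from `object`. With old-style classes, properties only half work: the getter runs, but assignment bypasses the setter and creates an instance attribute `x` that shadows the property.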
Spittie | 12 years ago
For anyone wondering, this works just fine in Python 3 :)
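
In Python 3 every class is new-style, so the same code behaves as intended even without an explicit `object` base. A Python 3 port of tomp's example, as a sketch:

```python
class A:  # implicitly inherits from object in Python 3
    def __init__(self):
        self._x = 0

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, new_value):
        self._x = new_value


a = A()
a.x = 4
print(a.x)             # 4
print(a._x)            # 4: the setter ran
print("x" in vars(a))  # False: no shadowing instance attribute was created
```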
Redoubts | 12 years ago
is

      def x(self, new_value):
        self._x = x
really what you mean? not _x = new_value?
herge | 12 years ago
My pet peeve with python is the classic:

    return x,
I have never wanted to declare a tuple without surrounding it with (). Too bad it's not a syntax error in Python 3.
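
The trap is that the comma, not the parentheses, is what makes a tuple, so a stray trailing comma silently changes the return type. A minimal sketch; both function names are hypothetical:

```python
def get_value():
    x = 42
    return x,  # trailing comma: this returns the 1-tuple (42,)


def get_value_fixed():
    x = 42
    return x


print(get_value())        # (42,)
print(get_value_fixed())  # 42
print((42) == 42)         # True: parentheses alone do not make a tuple
```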

Also, contrary to one of his examples, if you are using Python 2.7, declare your exception blocks as:

    except (FooException, BarException) as e:
It's forward compatible with Python 3, it's easier to read, and the syntax errors are clearer.
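
A runnable sketch of that form; `FooException` and `BarException` are stand-in classes defined here, not from any library:

```python
class FooException(Exception):
    pass


class BarException(Exception):
    pass


def describe(exc):
    try:
        raise exc
    except (FooException, BarException) as e:
        # The parenthesized tuple catches either type; "as e" binds the instance.
        return type(e).__name__


print(describe(FooException()))  # FooException
print(describe(BarException()))  # BarException
```

The tuple is what matters: in Python 2, `except FooException, BarException:` catches only `FooException` and binds it to the name `BarException`, and in Python 3 that comma form is a syntax error.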
icebraining | 12 years ago
> I have never wanted to declare a tuple without surrounding it with ().

No? I do that occasionally, e.g.:

  x, y = 5, 6
deckiedan | 12 years ago
You should be able to easily add an extra syntax rule to your editor or a git pre-commit hook (a /return.*,$/ type of thing) if this bothers you too much.
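
A rough sketch of such a check in Python, applying the same `return.*,$` idea line by line. The function name `find_trailing_comma_returns` is hypothetical, and the pattern will also flag legitimate lines such as the first line of a multi-line `return (a,` expression:

```python
import re

# Same idea as /return.*,$/: a return line whose last non-space character is a comma.
TRAILING_COMMA_RETURN = re.compile(r"^\s*return\b.*,\s*$")


def find_trailing_comma_returns(source):
    """Return (line_number, line) pairs that look like accidental 1-tuples."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if TRAILING_COMMA_RETURN.match(line):
            hits.append((lineno, line.strip()))
    return hits


sample = "def f(x):\n    return x,\n\ndef g(x):\n    return x\n"
print(find_trailing_comma_returns(sample))  # [(2, 'return x,')]
```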
gejjaxxita | 12 years ago
#6 is really confusing. Whenever I encounter something like this, my first reaction is that such obscure corners of a language should be avoided when possible, in favor of more verbose, clearer code.

Programming languages are meant to be read as well as written, and someone relatively new to Python (and many who have used the language for a long time) is certain to get confused about the difference between:

   return [lambda x, i=i : i * x for i in range(5)]
and

   return [lambda x : i * x for i in range(5)]
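
The difference between those two lines is when `i` is looked up. A Python 3 sketch that runs both; the wrapper function names are hypothetical:

```python
def late_binding():
    # Each lambda looks up i when it is called; by then the loop has
    # finished and i is 4 for all of them.
    return [lambda x: i * x for i in range(5)]


def early_binding():
    # i=i evaluates the current i once per lambda and stores it as a default.
    return [lambda x, i=i: i * x for i in range(5)]


print([f(2) for f in late_binding()])   # [8, 8, 8, 8, 8]
print([f(2) for f in early_binding()])  # [0, 2, 4, 6, 8]
```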
mguillech | 12 years ago
Agreed 100%. This type of construction should be avoided in the first place in favor of more readable ones, but it shows up in a fair amount of code that I've seen (and keep seeing).
outworlder | 12 years ago
> "Python is an interpreted, object-oriented, high-level programming language with dynamic semantics."

I have an issue with that statement. No languages are inherently "compiled" or "interpreted", that's a property of the implementation.

If we are talking about CPython here, Python code is compiled to bytecode, which is then interpreted. That is not unlike Java, except that Java's main implementation has a JIT and, AFAIK, CPython does not.

But that's CPython. What about PyPy? It has a JIT.
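
CPython's compile step is easy to observe from the standard library: the built-in `compile()` turns source text into a code object, and `dis` prints the bytecode that the interpreter loop executes. Opcode names differ between CPython versions, so the listing is not reproduced here:

```python
import dis

source = "total = 1 + 2"
code = compile(source, "<example>", "exec")

print(type(code).__name__)  # code
dis.dis(code)  # bytecode listing; exact opcodes vary by CPython version
```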

FigBug | 12 years ago
> No languages are inherently "compiled" or "interpreted", that's a property of the implementation.

A language and its implementation are usually designed at the same time. Whether it is compiled or interpreted will affect the design choices that go into the language. While additional implementations may follow, it can be hard or impossible to design a compiler (to machine code, not bytecode) for a language that was designed to be interpreted without dropping features (e.g., eval).

It may be more correct to say 'Python was designed to be interpreted' than 'Python is interpreted'.

baddox | 12 years ago
That's technically true, but I still think it's reasonable to casually refer to Python as "interpreted," since it conveys useful ideas, although those ideas are more accurately conveyed with different phrasing.

That said, for more precise discussions, what you pointed out is valid and important. One of the early questions I ask in a programming interview is to explain some high-level differences between two languages they're familiar with, which is often Java and Python. One of the common responses I get is that Java compiles to bytecode which is executed by a VM, while Python is interpreted. Of course, I point out that CPython is also compiled to bytecode and executed by a VM.

metaphorm | 12 years ago
I think it's reasonable to refer to the reference implementation of Python (CPython) as just "Python". The compilation to bytecode is an intermediate step because Python specifies a virtual machine. The language is definitely interpreted, though. It's completely accurate to call Python an interpreted language. There's no version of Python that I know of (including PyPy) that ever produces compiled machine-executable binaries prior to runtime.
mark-r | 12 years ago
I've always thought that #1 is a sign of an incorrect operation altogether. If you want to always modify the passed parameter, it doesn't make sense to have a default. If you want to return a modified version of the input, you should make a copy immediately and then you don't get this problem. Doing both an in-place modification and returning a modified object at the same time is just wrong.
ajanuary | 12 years ago
A slightly more realistic example:

    class Bag(object):
        def __init__(self, items=[]):
            self.items = items

        def add_item(self, item):
            # check the item is valid
            self.items.append(item)

    bag1 = Bag()
    bag1 = Bag()
    bag1.add_item('an item')

    bag2 = Bag()
    print(bag2.items)  # ['an item']: both bags share the one default list
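
A sketch of the conventional fix, using `None` as a sentinel. The class name `FixedBag` is hypothetical, and copying a caller-supplied list is a design choice (in the spirit of mark-r's comment) so the bag never mutates an argument it was handed:

```python
class FixedBag(object):
    def __init__(self, items=None):
        # A fresh list per instance; a caller's list is copied, not aliased.
        self.items = [] if items is None else list(items)

    def add_item(self, item):
        # check the item is valid
        self.items.append(item)


bag1 = FixedBag()
bag1.add_item('an item')

bag2 = FixedBag()
print(bag2.items)  # []
```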
Hovertruck | 12 years ago
"Thus, the bar argument is initialized to its default (i.e., an empty list) only the first time that foo() is called, but then subsequent calls to foo() (i.e., without a bar argument specified) will continue to use the same list to which bar was originally initialized."

This actually happens when the function is defined, not when it's called the first time.

codezero | 12 years ago
OT but this page hard crashes Safari on iPhone.
peter_l_downs | 12 years ago
Also crashes Safari on my OS 10.5 Mac, and is unusably laggy in Firefox on the same computer. All sorts of thrashy javascript nonsense seems to be going on.
japaget | 12 years ago
It crashes with Chrome also, but I was able to read the page with Opera Mini. iPhone 5, iOS 7.1.
jordigh | 12 years ago
File a bug report with Apple? No input should crash a program.
mctx | 12 years ago
Also crashes Chrome on iOS 7.1.1 on an iPhone 5S every time.
cefstat | 12 years ago
I have been bitten by #6 in a similar situation in the past. My solution was the analogue of the rather convoluted

    def create_multipliers():
      def multiplier(i):
        return lambda x: i*x
      return [multiplier(i) for i in range(5)]

    for multiplier in create_multipliers():
      print multiplier(2)
I would still prefer that Python doesn't do this.
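
Another way to freeze the loop value, swapped in here rather than taken from the thread, is `functools.partial`, which stores its arguments at the moment it is created. A Python 3 sketch:

```python
from functools import partial
from operator import mul


def create_multipliers():
    # partial(mul, i) captures the current value of i immediately.
    return [partial(mul, i) for i in range(5)]


for multiplier in create_multipliers():
    print(multiplier(2))  # prints 0, 2, 4, 6, 8 on successive lines
```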
vram22 | 12 years ago
In "Common Mistake #2", I'd say that the mistake is fairly obvious to anyone who understands even a little bit about OOP and inheritance. Since class C doesn't define its own variable x, it has to be that it inherits the x in class A, so there's no reason to be surprised that C.x changes when A.x does.
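
A Python 3 reconstruction along the lines of the article's example (classes `A` and `C` as in the comment; sibling class `B` is assumed). Each class keeps its own attributes as keys in its `__dict__`, and lookup on `C` falls back through the MRO to `A` only because `C` has no `x` key of its own:

```python
class A(object):
    x = 1


class B(A):
    pass


class C(A):
    pass


B.x = 2  # creates an x entry in B.__dict__; A and C are untouched
A.x = 3  # rebinds A's entry; C has no x of its own, so C.x follows it

print(A.x, B.x, C.x)                         # 3 2 3
print("x" in vars(B), "x" in vars(C))        # True False
print([cls.__name__ for cls in C.__mro__])   # ['C', 'A', 'object']
```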
evincarofautumn | 12 years ago
Here’s a thought experiment for you—think of “common mistakes in language X” as “design flaws in language X” or “ways in which language X is surprising” and what could have been done to mitigate that.
pekk | 12 years ago
Whether you choose 0-based indexing or 1-based indexing, somebody is going to be confused about indexing sometime.
udioron | 12 years ago
Regarding circular imports and #7: the main problem arises when using the `from mymodule import mysymbol` form.

The example solved this by properly using `import mymodule`, although this can still cause problems if your design is wrong, as seen in the example. Calling f() from the module ("library") code itself is a very bad idea. Instead, one should do this:

a.py:

    import b

    def f():
        return b.x
b.py:

    import a

    x = 1

    def g():
        print a.f()
main.py:

    import a

    a.f()
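
One self-contained way to compare both import styles is to write two throwaway modules into a temporary directory and import them. A Python 3 sketch; the module names `circ_a`/`circ_b` and the helper `import_and_call_f` are hypothetical stand-ins for the `a.py`/`b.py` above:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_and_call_f(a_source, b_source):
    """Write circ_a.py and circ_b.py to a temp dir, import circ_a, call f()."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "circ_a.py").write_text(a_source)
        Path(tmp, "circ_b.py").write_text(b_source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            return importlib.import_module("circ_a").f()
        finally:
            sys.path.remove(tmp)
            for name in ("circ_a", "circ_b"):
                sys.modules.pop(name, None)


# "import module" style: circ_b.x is looked up only when f() runs, so this works.
MODULE_A = "import circ_b\n\ndef f():\n    return circ_b.x\n"
MODULE_B = "import circ_a\n\nx = 1\n"

# "from module import name" style: circ_b asks for f before circ_a defines it.
FROM_A = "from circ_b import x\n\ndef f():\n    return x\n"
FROM_B = "from circ_a import f\n\nx = 1\n"

print(import_and_call_f(MODULE_A, MODULE_B))  # 1
try:
    import_and_call_f(FROM_A, FROM_B)
except ImportError as exc:
    print("ImportError:", exc)
```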
michaelmior | 12 years ago
For the first gotcha, using None as a default argument solves the problem, but checking `if not bar` instead of `if bar is None` can produce different results if bar evaluates to False in a boolean context (for example, an empty list passed in by the caller).

    >>> def foo(bar=None):
    ...    if not bar:
    ...        bar = []
    ...    bar.append("baz")
    ...    return bar
    ...
    >>> bar = []
    >>> foo(bar)
    ['baz']
    >>> bar
    []
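
With an `is None` check, only a missing argument gets a fresh list; a caller's empty list is used as given. A sketch mirroring the `foo` above:

```python
def foo(bar=None):
    if bar is None:  # only a missing argument triggers the fresh list
        bar = []
    bar.append("baz")
    return bar


bar = []
result = foo(bar)
print(result)         # ['baz']
print(bar)            # ['baz']: the caller's list was used (and mutated)
print(result is bar)  # True
```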
icebraining | 12 years ago
True, but foo shouldn't be modifying bar anyway; make a copy instead:

  bar = list(bar) if bar else []
dmritard96 | 12 years ago
There are more than a few interesting points in here, but this is funny to me coming from someone who is seemingly well versed in Python. Mistake #5:

    numbers = [n for n in range(10)]

this should be:

    numbers = range(10)

dragonwriter | 12 years ago
Unless it's Python 3.x.

Though, really, even there, while the list comprehension works, it's kind of awkward to use that instead of:

  numbers = list(range(10))
abaschin | 12 years ago
not in Python 3, where range() replaces xrange()
ygra | 12 years ago
And I would have thought that incorrect use of bytestrings for text, and then asking on Stack Overflow about the resulting UnicodeDecodeErrors, would be quite common as well...
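
A small Python 3 sketch of where that error comes from: bytes have to be decoded with the codec they were encoded with, and decoding UTF-8 data as ASCII fails on the first non-ASCII byte. The sample text is arbitrary:

```python
text = "café"
data = text.encode("utf-8")  # bytes: b'caf\xc3\xa9'

try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)  # ordinal not in range(128)

print(data.decode("utf-8") == text)  # True: decode with the matching codec
```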
gtaylor | 12 years ago
Yeah, bigtime. I'm not sure that would make for a quick, easy countdown list of a blog post, though.
cridenour | 12 years ago
For #7, now you have a performance problem of importing every time you run that function. Rather, you can place the import at the bottom of b.py and be okay.
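
On the cost side, an `import` statement inside a function does not re-execute the module on every call: after the first import, it finds the module in `sys.modules` and simply binds the name, so the per-call overhead is a cache lookup rather than a reload. A sketch; the function name is hypothetical and `json` is just a convenient stdlib module:

```python
import sys


def load_json_module():
    import json  # after the first call, this is a sys.modules lookup
    return json


first = load_json_module()
second = load_json_module()
print(first is second)               # True
print(first is sys.modules["json"])  # True
```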
tom_jones | 12 years ago
Any reason why you're using a slice here?

    >>> numbers[:] = [n for n in numbers if not odd(n)]

I'm thinking that doing

    >>> numbers = [n for n in numbers if not odd(n)]

wouldn't be a problem, since the assignment is executed after the list comprehension has been computed.
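
Both forms are safe with respect to iteration; the difference shows up when something else holds a reference to the same list. `numbers[:] = ...` replaces the contents of the existing list object, while `numbers = ...` only rebinds the name to a new list. A sketch, assuming `odd` is a simple parity test:

```python
def odd(n):
    return n % 2 == 1


numbers = [1, 2, 3, 4]
alias = numbers  # e.g., a caller or another object holding the same list
numbers[:] = [n for n in numbers if not odd(n)]
print(alias)  # [2, 4]: the shared list object was updated in place

numbers = [1, 2, 3, 4]
alias = numbers
numbers = [n for n in numbers if not odd(n)]
print(alias)  # [1, 2, 3, 4]: only the name numbers now points elsewhere
```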
andreif | 12 years ago
I have just been asked by a colleague about yet another gotcha when you have for example:

    # mypackage/__init__.py
    from .settings import settings
and then try to import settings from mypackage:

    from mypackage import settings
you get the module instead of the settings object.
kyro | 12 years ago
    from mypackage.settings import settings