This is a pretty good list of gotchas, but it's important when writing something targeted at beginners to be as precise and clear as possible. Nearly every section here either uses terminology poorly, is slightly incorrect, or has difficult examples.
> Python supports optional function arguments and allows default values to be specified for any optional argument.
No, specifying a default is what causes an argument to be optional.
> it can lead to some confusion when specifying an expression as the default value for an optional function argument.
Anything you specify as a default value is an expression. The problem is when the default is mutable.
> the bar argument is initialized to its default (i.e., an empty list) only the first time that foo() is called
No, rather it's when the function is defined.
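A minimal sketch (the foo/bar names follow the article's example) makes the definition-time evaluation visible:

```python
def foo(bar=[]):   # this list is created exactly once, when "def" executes
    bar.append("baz")
    return bar

first = foo()
second = foo()     # reuses the very same list object, which now holds two items
```

Both calls return the same object, and `foo.__defaults__[0]` is that list; it exists before `foo()` is ever called.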
> class variables are internally handled as dictionaries
As dictionary keys, and that's still only roughly correct.
In "Common Mistake #5", he uses both a lambda and index-based looping, neither of which is particularly Pythonic. A better example of where this is a problem in otherwise Pythonic code would be good.
In "Common Mistake #6" he uses a lambda in a list comprehension -- for an article of mistakes mostly made by Python beginners, this is going to make it tough to follow the example.
In "Common Mistake #7", he describes "recursive imports" where he means "circular imports".
In "Common Mistake #8" he refers repeatedly to "stdlib" where he means the Python Standard Library. Someone is going to read that and try to "import stdlib".
[Toptal blog editor] Hey, thanks for the great feedback! We agreed with (almost :-) ) all of your comments and have made corresponding mods/corrections to the post. Thanks again!
Slightly off topic, but does anyone know of a resource that has 'most common mistakes' for different languages all in one place? It's certainly possible to google for blog posts and stack overflow questions to assemble such a list, but it would be handy to have them all in one place.
My use case: when interviewing candidates, I often ask them to rate themselves on a scale of 1-5 in the languages they know, and then ask increasingly 'tricky' questions in each language to get a feel for how their "personal" scale aligns with their real knowledge. This works fine if we have several languages in common, but when I know little or nothing of one of the languages they know, I lose that data point.
I find it valuable to know what "I am a 1 at X" vs. "I am a 3 at X" vs. "I am a 5 at X" means to them, since I've found little correlation between how harshly someone rates themselves and their true ability. Sometimes self-rated 5s are really 5s by my book, sometimes self-rated 3s are really 5s, and sometimes self-rated 5s are really 2s. So I want to know how "my scale" translates to "their scale". If it were more formalized, I'd go as far as computing a "confidence quotient" for a person, since both self-critical and self-confident people can be fantastic engineers or horrible engineers.
Does anyone else do this process when interviewing?
While such a resource would make your job easier, it would make the interviewee's job easier still. They'd just have to memorize all the points in the reference.
I have thought about this, but in the context of understanding the root causes of these problems:
1) is it a language design problem,
2) a misunderstanding or misconception on the part of the programmer,
3) due to or related to bad coding/smells (e.g. method body too long),
4) high complexity code (could be related to (3), could reflect the domain),
5) reduced programmer cognitive capacity (distraction, stress, sleep deficit, lack of motivation, etc.).
These would be interesting research areas for instrumenting IDEs / other eco-system tools to collect some of this data. (I'm sure there is already some work in some of these areas and would appreciate names or links to high-quality reviews.)
This list is an excellent summary. If tasked with a #11 I'd probably add the slightly more obscure, but still super painful (when you do run into it) implicit string concatenation:
>>> l = ["a",
... "b",
... "c"
... "d"]
>>> l
['a', 'b', 'cd']
#6 is really confusing. Whenever I encounter something like this, my first reaction is that such obscure corners of a language should be avoided wherever possible, and more verbose/clear code used instead.
Programming languages are meant to be read as well as written, and someone relatively new to Python (and many who have used the language for a long time) is certain to get confused about the difference between:
Agreed 100%: this type of construction should be avoided in the first place in favor of more "readable" ones, but it shows up in a fair amount of code that I've seen (and keep seeing).
> "Python is an interpreted, object-oriented, high-level programming language with dynamic semantics."
I have an issue with that statement. No languages are inherently "compiled" or "interpreted", that's a property of the implementation.
If we are talking about CPython here, Python code is compiled to bytecode which is then interpreted. Not unlike Java - with the difference that the main implementation has a JIT and afaik, Python's does not.
But that's CPython. What about PyPy? It has a JIT.
> No languages are inherently "compiled" or "interpreted", that's a property of the implementation.
A language and its implementation are usually designed at the same time. Whether it will be compiled or interpreted affects design choices that go into the language. While additional implementations may follow, it can be hard or impossible to design a compiler (to machine code, not bytecode) for a language that was designed to be interpreted without dropping features (e.g., eval).
It may be more correct to say 'Python was designed to be interpreted' than 'Python is interpreted'.
That's technically true, but I still think it's reasonable to casually refer to Python as "interpreted," since it conveys useful ideas, although those ideas are more accurately conveyed with different phrasing.
That said, for more precise discussions, what you pointed out is valid and important. One of the early questions I ask in a programming interview is to explain some high-level differences between two languages they're familiar with, which is often Java and Python. One of the common responses I get is that Java compiles to bytecode which is executed by a VM, while Python is interpreted. Of course, I point out that CPython is also compiled to bytecode and executed by a VM.
I think it's reasonable to refer to the reference implementation of Python (CPython) as just "Python". The compilation to bytecode is an intermediate step because Python specifies a virtual machine. The language is definitely interpreted, though; it's completely accurate to call Python an interpreted language. There's no version of Python that I know of (including PyPy) that ever produces compiled machine-executable binaries prior to runtime.
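For what it's worth, the "CPython compiles to bytecode too" point is easy to demonstrate with the standard library's dis module:

```python
import dis

def add(a, b):
    return a + b

# CPython stores each function's compiled bytecode on __code__;
# dis renders it human-readable, much like javap does for Java class files.
ops = [instr.opname for instr in dis.Bytecode(add)]
```

The exact opcode names vary by CPython version, but the compiled bytecode is attached to the function before it ever runs.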
I've always thought that #1 is a sign of an incorrect operation altogether. If you want to always modify the passed parameter, it doesn't make sense to have a default. If you want to return a modified version of the input, you should make a copy immediately and then you don't get this problem. Doing both an in-place modification and returning a modified object at the same time is just wrong.
"Thus, the bar argument is initialized to its default (i.e., an empty list) only the first time that foo() is called, but then subsequent calls to foo() (i.e., without a bar argument specified) will continue to use the same list to which bar was originally initialized."
This actually happens when the function is defined, not when it's called the first time.
Also crashes Safari on my OS 10.5 Mac, and is unusably laggy in Firefox on the same computer. All sorts of thrashy javascript nonsense seems to be going on.
I have been bitten by #6 in a similar situation in the past. My solution was the analogue of the rather convoluted
def create_multipliers():
    def multiplier(i):
        return lambda x: i * x
    return [multiplier(i) for i in range(5)]

for multiplier in create_multipliers():
    print multiplier(2)
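A lighter-weight fix for the same late-binding problem (a sketch, not from the original comment) is to bind i as a default argument on the lambda itself:

```python
def create_multipliers():
    # i=i freezes the current loop value as each lambda's default,
    # so every multiplier keeps its own i instead of sharing the final one
    return [lambda x, i=i: i * x for i in range(5)]

results = [m(2) for m in create_multipliers()]
```

Since defaults are evaluated at definition time, the very gotcha from #1 is turned into the fix here.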
In "Common Mistake #2", I'd say that the mistake is fairly obvious to anyone who understands even a little bit about OOP and inheritance. Since class C doesn't define its own variable x, it has to be that it inherits the x in class A, so there's no reason to be surprised that C.x changes when A.x does.
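The behavior in question can be reproduced in a few lines (class names follow the article's example):

```python
class A:
    x = 1

class B(A):
    pass

class C(A):
    pass

# C defines no x of its own, so attribute lookup falls through to A
before = (A.x, B.x, C.x)   # all read A's value

B.x = 2                    # creates a new attribute on B, shadowing A.x
A.x = 3                    # C still has no x of its own, so C.x follows A
after = (A.x, B.x, C.x)
```

Once B gets its own x, only C continues to track changes to A.x.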
Here's a thought experiment for you: think of "common mistakes in language X" as "design flaws in language X" or "ways in which language X is surprising", and ask what could have been done to mitigate that.
Regarding circular imports and #7:
The main problem arises when using the from mymodule import mysymbol form.
The example solved this by properly using import mymodule, although this can still cause problems if your design is wrong, as seen in the example. Calling f() from the module ("library") code itself is a very bad idea. Instead one should do this:
For the first gotcha, using None as a default argument solves the problem, but checking `if not bar` instead of `if bar is None` can produce different results if bar evaluates to None in a boolean context.
>>> def foo(bar=None):
... if not bar:
... bar = []
... bar.append("baz")
... return bar
...
>>> bar = []
>>> foo(bar)
['baz']
>>> bar
[]
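For contrast, here is a sketch of the "bar is None" version, which appends to a caller's empty list in place instead of silently replacing it:

```python
def foo(bar=None):
    if bar is None:      # distinguishes "not passed" from falsy values like []
        bar = []
    bar.append("baz")
    return bar

caller_list = []
result = foo(caller_list)   # the caller's own list is used and mutated
```

Only an omitted argument now triggers the fresh default; an explicitly passed empty list is respected.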
And I would have thought that incorrect usage of bytestrings for text and then asking on Stack Overflow about the UnicodeDecodeErrors would be quite common as well ...
For #7, now you have a performance problem of importing every time you run that function. Rather, you can place the import at the bottom of b.py and be okay.
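To make the call-time resolution concrete, here is a self-contained sketch; the modules are built in memory with types.ModuleType purely so it fits in one snippet, and the a/b names follow the article:

```python
import sys
import types
from importlib.machinery import ModuleSpec

def make_module(name, source):
    # Build a module in memory and register it in sys.modules, so this
    # example is self-contained (normally a.py and b.py are separate files).
    mod = types.ModuleType(name)
    mod.__spec__ = ModuleSpec(name, None)
    sys.modules[name] = mod
    exec(source, mod.__dict__)
    return mod

# b needs a.f, but defers the import until g() is actually called
b = make_module("b", """
def g():
    from a import f   # resolved at call time, after module a has finished loading
    return f()
""")

# a imports b at the top level, as in the article's example
a = make_module("a", """
import b

def f():
    return "from a.f"

def run():
    return b.g()
""")

result = a.run()   # no circular-import error: a is complete by call time
```

The function-level import costs only a sys.modules lookup on each call after the first, which is why moving it to the bottom of b.py is mostly a stylistic choice.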
Any reason why you're using a slice here?
>>> numbers[:] = [n for n in numbers if not odd(n)]
I'm thinking that doing
>>> numbers = [n for n in numbers if not odd(n)]
wouldn't be a problem since the assignment is executed after the computation of the list comprehension.
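The difference only matters when something else holds a reference to the same list: slice assignment mutates the existing object, while plain assignment rebinds the name. A sketch (the odd helper is reconstructed here so the snippet stands alone):

```python
def odd(n):
    return n % 2 == 1   # helper assumed by the snippets above

numbers = list(range(10))
alias = numbers                               # second reference to the same list

numbers[:] = [n for n in numbers if not odd(n)]
same_object = alias is numbers                # in-place update: alias sees it

numbers = [n for n in numbers if not odd(n)]
rebound = alias is numbers                    # equal contents, different object
```

So inside the article's example, where the list is also being iterated or shared, the slice form is the one that updates every holder of the reference.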
Imho, adding a comma after each list element is a good practice. You can easily swap them, add more, and never run into an issue like the one you describe:
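In other words (a sketch of the practice being suggested):

```python
l = [
    "a",
    "b",
    "c",   # every element ends with a comma,
    "d",   # so reordering or appending lines can never merge two strings
]
```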
>>> l = ["A", "c""d"]
>>> l
['A', 'cd']
Also, as opposed to one of his examples, if you are using Python 2.7, declare your exception blocks as:
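Presumably the form being recommended is the as clause (a reconstruction, since the original snippet didn't survive):

```python
# Python 2-only spelling, a syntax error in Python 3:
#     except ValueError, e:
# Works in both Python 2.7 and Python 3:
try:
    int("not a number")
except ValueError as e:
    message = str(e)
```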
It's forward-compatible with Python 3, it's easier to read, and the syntax errors are clearer.
numbers = [n for n in range(10)]
This should just be: numbers = range(10)
Though, really, even there, while the list comprehension works, it's kind of an awkward construction to use that instead of: