I disagree with the implications. The main reasons Python is easier to use are independent of the type system. Not having to manually manage memory, for example! Overall it is a higher-level language, whereas C is designed for maximum performance.
Dynamic typing is probably preferable to a 40-year-old type system. But Python could be easier to use (catch bugs ahead of time!) and execute faster by adding a modern type system. Optional typing (like TypeScript or Facebook's new PHP implementation) would probably be appropriate.
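For concreteness, here is a sketch of what optional typing can look like using Python 3's function annotations (PEP 3107 supplies the syntax; enforcing it ahead of time would be the job of a separate, hypothetical checker — the `mean` function is a made-up example):

```python
# Annotations attach type information without changing runtime behavior.
# A TypeScript-style checker could verify call sites before the program runs.
def mean(values: list) -> float:
    return sum(values) / len(values)

# At runtime the annotations are just metadata on the function object:
assert mean.__annotations__ == {"values": list, "return": float}
print(mean([1, 2, 3]))  # 2.0
```

Since the interpreter ignores the annotations, code like this stays fully dynamic unless a checker is opted in — which is exactly the "optional" part.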
>> Dynamic typing makes Python easier to use than C
> Overall it is a higher-level language, whereas C is designed for maximum performance.
I agree with most of your points, but on this I think you are comparing apples to oranges. C is not designed for maximum performance, but as a systems language, and as such is only a slim abstraction over assembly. As a side effect it allows for maximum performance. A fairer comparison (pretty much every widely used language is easier to use than C) would be C++ vs. Python, where in C++ most - if not all - of the higher-level abstractions were explicitly designed not to worsen performance compared to low-level code. And in my opinion Python is so much easier to use than C++ because it values simplicity and consistency more than efficiency.
For example, the general loop constructions in Python, which are possible because of nice generators and the wonderfully straightforward `yield` construction, would be unacceptable in C++ because they lack the flexibility to be as efficient as possible in every case.
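As a concrete illustration of that `yield` style (a hypothetical helper, not from the thread):

```python
def window_sums(xs, k):
    """Lazily yield the sum of each length-k sliding window of xs."""
    for i in range(len(xs) - k + 1):
        yield sum(xs[i:i + k])

# The for-loop protocol drives the generator; nothing runs until asked for.
print(list(window_sums([1, 2, 3, 4], 2)))  # [3, 5, 7]
```

The generator keeps the looping logic in one simple place, at the cost of some per-step overhead — the trade-off the comment describes.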
The advantage of dynamic typing only becomes apparent in a third argumentative step: when you try to build a language with consistent and efficient abstractions, you end up with something like Haskell, whose type system complexity is a huge entry barrier. That's why I think dynamic typing is good for Python.
I'd love to see optional typing in Python; I wonder if there is any official reason why it never got introduced.
I very rarely change the type of a variable, so it would be essentially free speed for me.
I actually wonder, do people use dynamic typing that often? I mean, it's nice to do "variable = int(variable)" when I know that I'm getting an integer in a string, but that's probably the only use case I can think of that doesn't just reuse variables for something else.
Cython[1] is the closest thing I'm aware of to optional type annotation for Python. It works quite well, although it requires a separate compilation step and the source code is no longer Python.
I assume by "new PHP implementation", you're referring to Hack[2] which is a different language, although heavily based on PHP. (I believe Hack is a superset?)
The bigger picture is that CPython's core team simply does not care too much about performance. Performance has never been a fundamental requirement, but merely an afterthought.
The biggest reminder of this was Python 3, which for me was a complete disappointment. They could have limited Python's super-dynamic behavior (e.g. changing builtin functions, patching classes on the fly, etc.) or made it optional. They could've added optional typing annotations a la Cython, or even changed the builtins and language syntax to allow more in-place operations and preallocations, so that temporary results wouldn't have to be allocated on the heap over and over again. All of these changes would have made Python faster and more JIT-able. None of them happened. Performance-wise, Python 3 is no step forward.
Python+Cython is still a powerful combination, but eventually Julia or similar languages will eat python's lunch with respect to scientific computing.
One thing that bothers me - and has for a long time - is why Python (and Perl, Ruby, etc.) never really leveraged the work Common Lisp systems (CMUCL, SBCL, etc.) have done, which provides very good performance without sacrificing dynamic typing or the REPL.
Dynamically typed does not imply no types. A good CL implementation takes that to heart and produces really nice code out of the box. And the language spec provides for an API to allow program developers to specify types and hints to the compiler to optimize certain functions and code paths where it's needed.
The benefit from this is that you can rapidly prototype your application and harden it to the platform you deploy on when performance becomes an issue... and with built-in conditional compilation you can do that in a cross-platform way.
Languages like Python and Ruby have generally looked to the Smalltalk and Self literature (inline caches, dynamic deoptimization, hidden classes, etc.) rather than Common Lisp, so people aren't ignorant of the research; they're just looking in a different area. If you think there's something in Common Lisp we're missing, point it out!
And the easiest way to leverage that effort is to use CL as an implementation platform. Whatever you think of CL as a language to program in directly, it really shines as a language to implement dynamic languages in. You get a lot of useful pieces, including native-code compilation, some very powerful control-flow operators, multithreading (in most implementations), and of course garbage collection, for free.
There's a more subtle benefit as well. Because compilability was a key goal in the design of CL, using it as a translation target while you're designing a new dynamic language will help you make sure your language is also efficiently compilable.
EDITED to add: in CL, the whole tagged-pointer thing has been taken care of for you, so you don't have to think about it (cf. [0]). This means you get fixnums (unboxed immediate integers), with automatic overflow to bignums (arbitrary-precision integers), for free. In fact there's the whole CL numeric tower, including ratios and complex numbers. Other goodies: multiple inheritance. Multimethods (multiple dispatch). Class creation and redefinition at runtime.
It baffles me that anyone would want to reimplement all that stuff, or even part of it.
Python is a very different language than Lisp. It's much more dynamic. Every method call conceptually involves a hash table look up. Lisp on the other hand is much closer to the bare metal, and much easier to implement efficiently. For Python the implementation techniques of Self would be more appropriate.
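A simplified sketch of that "hash table lookup" point (the real CPython lookup also involves descriptors and the method resolution order, so this is illustrative only; `Greeter` is a made-up class):

```python
class Greeter:
    def hello(self):
        return "hi"

g = Greeter()

# Conceptually, g.hello() is a dictionary lookup followed by a call:
method = type(g).__dict__["hello"]
assert method(g) == "hi"

# Because the lookup is repeated at every call, it can change at any time,
# which is exactly what makes ahead-of-time optimization hard:
Greeter.hello = lambda self: "bonjour"
assert g.hello() == "bonjour"
```

A static language can resolve the call once at compile time; Python must allow for the patch in the last two lines on every single call.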
Python (and Ruby for that matter) are slow because there are no billion-dollar companies stuck with an initial choice of language that impedes their ability to grow. PHP and JavaScript used to be extremely slow, and now, after tens of millions of dollars thrown at JITs, rewrites, forks and redesigns, they're starting to get much, much faster.
Not so. Google uses Python extensively in-house (it's one of the four "blessed" languages) and, in fact, employed GVR until he was recently hired away by Dropbox--another billion-dollar company which relies heavily on Python. At one point Google even spearheaded an ambitious effort (https://code.google.com/p/unladen-swallow/) to make CPython much faster. It failed.
At the point where even Google can't make it happen, it really starts to look like Python performance is limited at some very fundamental level to what we see today. Personally I think this is fine. I use Python for everything day-to-day and offloading the intensive stuff to a C extension (a la Numpy) works just great. There are very few instances where I find myself wishing Python was faster.
Yes, Python is slower in execution than some other languages, but:
... efficient use of development time ...
That is the reason why Python counts (not only references). Python has many very good libraries, is a good OOP language, easy to learn, but still very, very powerful. You can express some things in Python in a single line where you would need hundreds of lines in C++ or other languages.
The few percent of running speed that you might lose are negligible in most cases against the win in development speed.
In many applications you don't need the full CPU power; often you are hindered by e.g. disk speed or other factors ... and then you don't lose anything by being a little bit slower in some minor tasks.
And they pretty much say that there is only one method that's been overridden from the normal dict class. And all it does is execute the anonymous function (lambda) you supplied instead of returning the default value you pass in.
Also, this depends on your usage, and what you have in the lambda expression. I'm curious about your code, care to paste/link some of it for us?
If you are predominately adding entries to a dictionary, the standard dict will be faster.
Where defaultdict shines is the case where you hit an entry a bunch of times, but you'd like to automate initializing an entry the first time it is hit.
The most common example is probably the frequency distribution, where the frequency needs to start out zero and be incremented each time a key is seen. Using int or lambda:0 as defaultdict's argument works fine for the simple case. If you want to tabulate more than just the count, e.g. a count and one or more sums, you can pass a constructor that produces an object with an update method, to which you pass whatever your key represents.
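For example, a frequency count and a slightly richer tally (the `Stats` class is a made-up illustration of the constructor-with-update pattern described above):

```python
from collections import defaultdict

# Simple case: missing keys start at int() == 0.
freq = defaultdict(int)
for word in ["spam", "egg", "spam", "spam"]:
    freq[word] += 1
assert dict(freq) == {"spam": 3, "egg": 1}

# Richer case: tabulate a count and a running sum per key.
class Stats:
    def __init__(self):
        self.count = 0
        self.total = 0

    def update(self, value):
        self.count += 1
        self.total += value

stats = defaultdict(Stats)
for key, n in [("spam", 2), ("spam", 5), ("egg", 1)]:
    stats[key].update(n)    # first hit per key auto-creates a Stats()
assert stats["spam"].count == 2 and stats["spam"].total == 7
```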
This is convenient, since it's almost exactly what I plan to talk about in the undergrad class I'm teaching today (https://github.com/williamstein/sage2014). I view this sort of article as good motivation for Cython, and the value of this article is merely that it aims at typical undergraduates not in computer science.
Good read. Whenever I read stuff like this, I always wonder: is it always true that dynamically typed languages are slower than statically typed languages? Also, do we have to take for granted that the higher-level the language is, the slower it will be? Or are there exceptions?
Also, it is worth asking whether, for the majority of use cases of Python for data/analysis, the ease and flexibility outweigh the slowness.
Always is a bit of a strong word but yes, it's always true.
Virtual functions in C++, which allow some form of dynamic behaviour, are slower than static function calls because they inherently involve another level of indirection. Static calls are known at compile time, they can be inlined by the compiler, they can be optimized in the context in which they're called. Now, nothing prevents the C++ run-time from trying to do the same thing at run-time, but you can relatively easily see that it'll have to make some other compromises to do so. Nothing prevents a C++ program from generating C++ code at run-time, compiling it, and loading it into the current process as a .so. Now that's pretty dynamic behaviour, but there's again an obvious price. You can also write self-modifying code. At any rate, static languages are capable of the same dynamic behaviour that dynamic languages are capable of, but you often have to implement that behaviour yourself (or embed an interpreter...).
Fundamentally, a dynamic language can't make the kinds of assumptions a more static language can make. It can try to determine things at run-time (a la JIT), but that takes time and still has to adapt to the dynamics of the language. The same line of code "a = b + c" in Python can mean something completely different every time it's executed, so the run-time has to figure out what the types are and invoke the right code. Now the real problem is that if you take advantage of that, then no one can actually tell what this code is doing and it is utterly unmaintainable.
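A minimal demonstration of the same expression meaning three different things, forcing the runtime to re-dispatch on every call:

```python
def add(b, c):
    # One expression; the operation is chosen at run time by the operand types.
    return b + c

assert add(1, 2) == 3            # integer addition
assert add("1", "2") == "12"     # string concatenation
assert add([1], [2]) == [1, 2]   # list concatenation
```

A static compiler could pick the right machine instruction for each call site once; here the type check and dispatch happen on every execution of `b + c`.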
To compound the problems facing dynamic languages is the fact that CPUs are optimized for executing "predictable" code. When your language is dynamic there are more dependencies in the instruction sequence and things like branch prediction may become more difficult. It also doesn't help that some of the dynamic languages we're discussing have poor locality in memory (that's an orthogonal issue though, you could give a dynamic language much better control over memory).
EDIT: One would think it should be possible to design a language that has both dynamic and static features, where, if you restrict yourself to the static portion, it runs just as fast as any other statically compiled language, and which still lets you switch to more dynamic constructs and pay the price only when you do.
The 'slow' part wasn't so new to me, but the `id` command and the attendant understanding that simply typing 110 in the interpreter creates an OBJECT, and when you assign a=110, `a` points at that object (and when you reassign a=30, a new object is created and a points at that), blew my mind. Thanks for this!
Previously I had thought that doing a=110 creates an object 'a' that stores the value 110 (and when we do b=a, b simply points to a). I had no idea there is a third object in play.
Actually, it's a bit more complicated: for small integers, the Python runtime has a stash of pre-made objects. For compiled code, the code may include a static reference to a constant object.
Which means that, if you execute "a = 110" a bunch of times, no new objects get created. If you execute "a = 110+300" a bunch of times, a whole bunch of new objects are created and destroyed.
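This is easy to observe with `id`. Note that the small-integer cache is a CPython implementation detail, not a language guarantee; `int("...")` is used below to keep the compiler from folding and sharing constants:

```python
# Integers in roughly -5..256 are pre-allocated and reused by CPython:
a = int("110")
b = int("110")
print(a is b, id(a) == id(b))   # True True -- both names point at the cached 110

# Larger values are freshly allocated each time they are computed:
big_a = int("1000")
big_b = int("1000")
print(big_a is big_b)           # False -- two distinct objects
```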
By the way, this boxing/unboxing logic is also at work when you stick numbers into a standard HashMap, since it stores Integer objects and not the primitive type. However, in Java you can use GNU Trove's IntObjectHashMap or ObjectIntHashMap and save some memory, whereas Python's "everything is an object" rules this out (unless you're using PyPy and its tracing JIT, which figures out that a variable always contains a number).
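The boxing cost is visible in CPython with `sys.getsizeof`; the stdlib `array` module is a rough analogue of Trove's primitive collections (the sizes below are typical for 64-bit CPython and may vary by version and platform):

```python
import sys
from array import array

boxed = [7] * 1000                # 1000 references to a boxed int object
packed = array("i", [7] * 1000)   # raw machine integers, no per-element object

# A small Python int is a full heap object with a header (around 28 bytes):
print(sys.getsizeof(7))
# The packed array pays only itemsize bytes per element (typically 4 for "i"):
print(packed.itemsize)
```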
That's pretty central to any OO language. "Everything's an object." Veering away from that only serves to turn the language into a mess of boxing and unboxing.
That's not to say that OO is necessarily the ideal language paradigm, but it has certainly been the most dominant in the era Python has existed.
chriswarbo | 12 years ago
I don't think ML users will agree ;)
[1] http://cython.org/ [2] http://hacklang.org/
nkozyra | 12 years ago
At the same time I hate it when syntax vacillates wildly from application to application.
lispm | 12 years ago
It wouldn't make Python easier to use. Just the opposite.
[0] https://mail.python.org/pipermail/python-dev/2004-July/04614...
njharman | 12 years ago
Also, extension modules - Pillow, Numpy, Scipy, Pandas - greatly reduce the need to make Python itself faster.
nly | 12 years ago
Citation needed.
rch | 12 years ago
This is pretty close to how I characterize my time, 'doing X with code', and Python yields great returns in these terms.
mrfusion | 12 years ago
I just switched my program from using a defaultdict to a regular dict.
I.e., from defaultdict(lambda: 'NA') to a regular dict using get(val, 'NA') for access.
And I got something like a 100x speedup. It runs in two minutes instead of two hours. I had no idea a defaultdict would be so much slower.
Unless there's something funny going on in my program and it's unusual behavior.
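One behavioral difference worth knowing when comparing the two (a guess at the cause, not a diagnosis of the program above): merely reading a missing key from a defaultdict inserts it, while dict.get has no side effect, so probing many missing keys makes a defaultdict grow without bound.

```python
from collections import defaultdict

dd = defaultdict(lambda: "NA")
plain = {}

value = dd["missing"]               # the miss is materialized as a real entry
assert value == "NA" and len(dd) == 1

value = plain.get("missing", "NA")  # no entry is created
assert value == "NA" and len(plain) == 0
```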
cynwoody | 12 years ago
Simple microbenchmark: http://pastebin.com/vkacByzp
It appears that defaultdict saves time as well as code for the normal use case.
phorese | 12 years ago
The very first part says that this is only a writeup for people who are not intimately familiar with why "dynamically typed" might slow down Python.