Python is fast enough, until it isn't, and then there are no simple alternatives.
If your problem is numerical in nature, you can call popular C modules (numpy, etc) or write your own.
If your functions and data are pickleable, you can use multiprocessing but run into Amdahl's Law.
Maybe you try Celery / Gearman introducing IO bottlenecks transferring data to workers.
Otherwise you might end up with PyPy (poor CPython extension module support) and still restricted by the GIL. Or you'll try Cython, a bastard of C and Python.
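The multiprocessing route mentioned above can be sketched like this; the worker function and sizes are made up for illustration. Everything crossing the process boundary must be picklable, and any serial portion of the program still caps the overall speedup (Amdahl's Law):

```python
# Hypothetical example: farm a picklable, CPU-bound function out to a
# pool of worker processes. The GIL doesn't bite here because each
# worker is a separate process with its own interpreter.
from multiprocessing import Pool

def cpu_bound(n):
    # toy stand-in for a real numerical kernel
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # arguments and results are pickled on the way in and out
        results = pool.map(cpu_bound, [100_000] * 8)
    print(len(results))  # 8
```

The pickling on both ends is exactly the IO cost the next paragraph complains about when the workers move to another machine.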
Python has been my primary language the past few years and it's great for exploratory coding, prototypes, or smaller projects. However it's starting to lose some of the charm. Julia is filling in as a great substitute for scientific coding in a single language stack, and Go / Rust / Haskell for the other stuff. I've switched back to the static language camp after working in a multi-MLOC Python codebase.
> Julia is filling in as a great substitute for scientific coding in a single language stack, and Go / Rust / Haskell for the other stuff.
I've been wondering about why so many python devs have migrated to using Go recently instead of Julia, given that Julia is a lot closer to python and has performed as good as, if not better than, Go in some benchmarks [1]. Granted I've really only toyed with Julia and Go a few times as I've never really needed the performance much myself, but I'm curious about your preference of Go/Rust over Julia for "the other stuff".
What would you say makes Julia less suitable (or Go more suitable) for nonscientific applications? Is it just the community/support aspect? Cause that seems like an easy tide to turn by simply raising more awareness about it (we see Go/Rust/Haskell blog posts on the front page of HN every week, but not too many Julia posts).
Just curious cause I'm not nearly experienced enough with any of these young languages yet to know any better, and have only recently started to consider taking at least one of them up more seriously.
If you think of CPU cycles as currency, "fast enough" shows its true colors: if you're profligate in your spending of CPU cycles, you simply don't have any left when you really do need them.
Living CPU paycheck to paycheck and on occasion taking performance payday loans (breaking out to C) is not an efficient way to manage resources.
Besides Julia, I think another alternative to Python for scientific computation would be Scala. Breeze (from the ScalaNLP project) is an effort to bring Numpy and Matlab syntax to Scala: https://github.com/scalanlp/breeze/wiki/Breeze-Linear-Algebr...
Do you have any specific, publicly available code which you claim is too slow in Python? Otherwise, I don't believe you.
For example, hg is now faster than git. Git is written by great C hackers (including Linus no less), and yet hg is faster than git. See the facebook benchmarks for evidence.
Python has static checking of interfaces, and types if you want. It also has IDEs which can check a lot of things for you. It turns out that dynamic typed languages can be checked for quite a lot of things.
Check out using mmap based datastructures to do shared memory between workers.
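A minimal sketch of that mmap idea (the file name and layout are invented for illustration): processes that map the same file see the same bytes, with no pickling of the data itself.

```python
# Sketch: a file-backed mmap as a shared array of 64-bit integers.
# Any worker process mapping the same file sees writes in place.
import mmap
import os
import struct

path = "shared_counters.bin"   # illustrative path
n = 1024                       # number of 8-byte slots

with open(path, "wb") as f:    # allocate the backing file
    f.write(b"\x00" * 8 * n)

with open(path, "r+b") as f:
    buf = mmap.mmap(f.fileno(), 8 * n)
    struct.pack_into("<q", buf, 0, 42)         # write slot 0 in place
    value, = struct.unpack_from("<q", buf, 0)  # read it back
    buf.close()

os.remove(path)
print(value)  # 42
```

For numerical data the same trick is usually done with `numpy.memmap`, which gives array semantics over the same kind of file-backed buffer.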
Using a Numpy-based program as an example of how Python can be fast is a bit strange.
It shows that Python can be fast enough if you leave the heavy computational parts to libraries written in other languages. Which is interesting, but doesn't say much about the speed of Python itself.
Yet, numpy exists for python, so why can't you use it in a comparison? Why can't you use libraries for languages? The python approach is all about gluing bits together, not staying within a pure walled garden.
Also, Python the language is not the same thing as any particular implementation of Python.
These benchmarks are always funny, because real systems use different components, yet the benchmarks stick to some fake, non-real-world way of measuring.
Oh, the garbage collection in Java pauses for multiple seconds sometimes... but it's not slow, because we ignore that in our benchmarks. Oh, it's not fast the first time, because the JIT hasn't warmed up? Let's ignore that in our benchmarks too. Um... yeah. Good one.
This benchmark is also flawed, since people would probably use numexpr in the real world, which is much faster than plain numpy. So python would be even faster than they say.
Mercurial(hg) is now faster than git. Git is written by great C hackers (including Linus no less), and yet hg is faster than git. See the facebook benchmarks for evidence.
Using the right tool for the job can mean using multiple languages together for where they are best. Want clarity and performance? Then C/asm + python is an ok combination.
Well, this particular example shows instead that the overhead of calling libraries written in other languages can be so large that a pure python solution can be faster. The current top answer to the question shows a pure python version that is about 20 times faster; commenters explain that the arrays are so small that the overhead in calling numpy outweighs any advantage numpy's speed gives.
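A rough way to see that overhead yourself (an illustrative microbenchmark, assuming numpy is installed; absolute numbers vary by machine, and the point is only that per-call overhead dominates for tiny arrays):

```python
# Time the same 3-element computation in pure Python and in numpy.
# For arrays this small, the cost of crossing into numpy's C code
# typically dwarfs the arithmetic itself.
import timeit

setup = "import numpy as np; xs = [1.0, 2.0, 3.0]; arr = np.array(xs)"
t_py = timeit.timeit("sum(x * x for x in xs)", setup=setup, number=100_000)
t_np = timeit.timeit("(arr * arr).sum()", setup=setup, number=100_000)
print(f"pure python: {t_py:.3f}s  numpy: {t_np:.3f}s")
```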
Python itself includes all the options: PyPy, Cython, Psyco, Numpy and other C extensions, Jython and more. It's all Python. That is one of the strengths of Python.
Would you exclude use of stdlib parts that are written in C? Would you say Javascript can't run serverside cause Node.js is not part of your imagined "core language"? Or it doesn't say much about the speed of C when you use a better/more optimized compiler or compiler flags?
It isn't strange, it's standard practice. What would be strange is to force Python to hold one hand behind its back and be used in a totally unrealistic way that doesn't reflect normal practice. And by strange I mean basically dishonest about the performance available when using Python.
Anecdotal data point: For a great many years I used to consider python a very slow language. Then I switched to Ruby and realized how slow a language can really be and yet still be practical for many use-cases.
Obviously, this is a silly benchmark and we should stop giving it any credit.
However, even "real world" anecdotes in this area can be a minefield.
Take, for example, an existing Python application that's slow and requires a rewrite to fix fundamental architectural problems.
Because you feel you don't necessarily need the flexibility of Python the second time around (as you've moved out of the experimental or exploratory phase of development), you decide to rewrite it in, say, Go, or D or $whatever.
The finished result turns out to be 100X faster—which is great!—but the danger is always there that you internalise or condense that as "lamby rewrote Python system X in Go and it was 100X faster!"
I spend a lot of time debating program speed (mostly C vs MATLAB), but the problem is that the programming and compile time usually makes more of a difference than people consider.
If my C is 1000x faster and saves me 60 seconds every time I run the program, but takes an extra 2 days to write initially, and the program is seeing lots of edits, meaning that on average I have to wait 2 minutes for it to compile, then I am MUCH better off with the slower MATLAB until I am running the same thing a few thousand times.
Plus there is the fact that I can look at HN while a slightly slower program is running, so I win both ways.
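The break-even point in that trade-off is simple arithmetic (assuming 8-hour workdays, which the comment doesn't specify):

```python
# Back-of-envelope: how many runs before the faster C version repays
# its extra development time? Compile waits during the edit-heavy
# phase only push the break-even point further out.
dev_overhead_s = 2 * 8 * 3600   # extra 2 workdays to write it in C
saving_per_run_s = 60           # C saves ~60 s on every run
break_even_runs = dev_overhead_s / saving_per_run_s
print(break_even_runs)  # 960.0
```

So even before counting the 2-minute compiles, it's on the order of a thousand runs before the rewrite pays for itself, consistent with the "few thousand times" above.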
I think a lot of that delta is going to prove to have been an accident of history, though. In the past 10-15 years, we've had a lot of "dynamic" languages, which have hit a major speed limit (see another comment I made in this discussion about how languages really do seem to have implementation speed limits). Using a "dynamic" language from the 1990s has been easier than using gussied-up static 1970s tech for a quick prototype, but what if the real difference has more to do with the fact that the 1990s tech simply has more experience behind the design, rather than an inherent ease-of-use advantage?
It's not hard to imagine a world where you instead use Haskell, prototyping your code in GHCi or even just writing it in Haskell directly, pay a minimal speed penalty for development since you're not being forced to use a clunky type system, and get compiled speeds or even GPGPU execution straight out of the box. (And before anyone freaks out about Haskell, using it for numeric computations requires pretty much zero knowledge about anything exotic... it's pretty straightforward.) It's not out of the question that using Haskell in this way would prototype even faster than a dynamic language, because when it gives you a type error at compile time, rather than at runtime or, worse, after running a nonsense computation that you only discover afterwards was nonsense, you could save a lot of time.
I don't think there has to be an inherent penalty to develop with native-speed tech... I think it's just how history went.
Makes sense if you are the only person running your programs (and you are allowed to ignore things like hardware and power costs).
Also, 2 minutes per change to compile the object files affected and link the executable seems a bit excessive considering the entire Linux kernel can generally be built from scratch in less time than that (assuming a modern system).
You didn't read the thread. The OP's code used very small arrays, and using numpy was slowing the code down by an order of magnitude. The pure python solution is 17x faster.
Why would you use Numpy for arrays that small? Oh, looks like someone actually just wrote it in CPython, no Numpy, and it clocked in at 0.283s. Which is fine. It's Python.
This thread reminds me of the scene in RoboCop where Peter Weller gets shot to pieces. Peter Weller is Python and the criminals are the other languages.
Judging by the top submission being also written in python, I think this just shows how unoptimized OP's original code was rather than how slow the language is.
Not that python is fast, it isn't. And using numpy seems a bit disingenuous anyways "Oh my python program is faster because I use a library that's 95% C"
The same author previously posted this code as a question on Stack Overflow: http://stackoverflow.com/questions/23295642/ (but we didn't speed it up nearly as much as the Code Golf champions).
This sort of thing comes up a lot: people write mathematical code which is gratuitously inefficient, very often simply because they use a lot of loops, repeated computations, and improper data structures. So pretty much the same as any other language, plus the extra subtlety of knowing how and why to use NumPy (as it turned out, this was not a good time for it, though that was not obvious).
You can make this far faster by changing the data representation. You can represent S as a bit string so that if the i'th bit is 0 then S[i] = 1 and if the i'th bit is 1 then S[i] = -1. Let's call that bit string A. You can represent F as two bit strings B,C. If the i'th bit in B is 0 then F[i] = 0. If the i'th bit of B is 1 then if the i'th bit of C is 0 then F[i] = 1 else F[i] = -1. Now the whole thing can be expressed as parity((A & B) ^ C). The parity of a bit string can be computed efficiently with bit twiddling as well. Now the entire computation is in registers, no arrays required. The random generation is also much simpler, since we only need to generate random bit strings B,C and this is already directly what random generators give us. I wouldn't be surprised if this is 1000x faster than his Python.
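A sketch of that encoding, checked against the direct computation. One convention is assumed here that the comment leaves implicit: C only sets bits where B does (a zero entry of F carries a zero sign bit); without that, the parity picks up stray sign bits from positions where F is 0.

```python
# Verify parity((A & B) ^ C) against a direct count of -1 factors
# among the nonzero S[i] * F[i] terms.
import random

def parity(x):
    # parity of the population count of x
    return bin(x).count("1") & 1

n = 16
random.seed(0)
A = random.getrandbits(n)      # bit i set  =>  S[i] = -1, else S[i] = 1
B = random.getrandbits(n)      # bit i set  =>  F[i] != 0
C = random.getrandbits(n) & B  # sign bits of F, masked by B (assumption)

S = [-1 if (A >> i) & 1 else 1 for i in range(n)]
F = [0 if not (B >> i) & 1 else (-1 if (C >> i) & 1 else 1)
     for i in range(n)]

# parity of the number of -1 factors among the nonzero products
direct = sum(1 for i in range(n) if S[i] * F[i] == -1) & 1
fast = parity((A & B) ^ C)
assert direct == fast
```

In a compiled language the parity itself reduces to a few shift/xor instructions or a single POPCNT, which is where the claimed orders of magnitude would come from.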
It's really fast to develop in, and with NumPy/Pandas/Scipy it runs numerical models fairly fast too. You do have to spend time getting to know `cProfile` and `pstats`; saved over 80% on runtime of something the other day.
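A minimal version of that cProfile/pstats workflow (the profiled function is a toy stand-in):

```python
# Profile a hot function and print the top entries by cumulative time.
import cProfile
import io
import pstats

def slow_sum(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

pr = cProfile.Profile()
pr.enable()
slow_sum(100_000)
pr.disable()

out = io.StringIO()
pstats.Stats(pr, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())  # shows where the time actually went
```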
I no longer accept the idea that languages don't have speeds. Languages place an upper bound on realistic speed. If this isn't true in theory, it certainly is true in practice. Python will forever be slower than C. If nothing else, any hypothetical Python implementation that blows the socks off of PyPy must still be executing code to verify that the fast paths are still valid and that nobody has added an unexpected method override to a particular object or something, which is an example of something in Python that makes it fundamentally slower than a language that does not permit that sort of thing.
The "misconception" may be the casual assumption that the runtimes we have today are necessarily the optimal runtimes, which is not generally true. But after the past 5-10 years, in which enormous amounts of effort have been poured into salvaging our "dynamic" languages' (Python, JS, etc.) run speeds, which has pretty much resulted in them flatlining around ~5 times slower than C with what strikes me as little realistic prospect of getting much lower than that, it's really getting time to admit that language design decisions do in fact impact the ultimate speed a language will be capable of running at. (For an example in the opposite direction, see LuaJIT, a "dynamic" language that due to careful design can often run at near-C.)
(BTW, before someone jumps in, no, current Javascript VMs do NOT run at speeds comparable to C. This is a common misconception. On trivial code that manipulates numbers only you can get a particular benchmark to run at C speeds, but no current JS VM runs at C speeds in general, nor really comes even close. That's why we need asm.js... if JS VMs were already at C speeds you wouldn't be able to get such speed improvements from asm.js.)
Go has an extremely fast compiler. If they ever add modules to C/C++, those languages should get a big bump in compile speed too. A lot can be done to fix the slow compile cycle of some languages.
Summary: Question asker wrote a program in Python using numpy (A Python library that calls C code) which could've been more performant if written in pure Python (something to do with array sizes being used) and Python in general is slower than C/C++/Fortran/Rust. Anything else new?
Yet another attempt at a comparison scuttled by using randomness.
Different things use different types of randomness. Some are fast. Some are slow. If your comparison is not using the same type of randomness, that comparison is comparatively useless.
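For instance (illustrative timings only), Python's default Mersenne Twister and OS-level entropy have very different per-call costs, so two implementations being compared had better draw from comparable sources:

```python
# Compare the per-call cost of two very different randomness sources.
import timeit

t_mt = timeit.timeit("random.random()", setup="import random", number=100_000)
t_os = timeit.timeit("os.urandom(8)", setup="import os", number=100_000)
print(f"random.random: {t_mt:.4f}s  os.urandom: {t_os:.4f}s")
```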
[1] http://julialang.org/benchmarks/
vkhuc | 12 years ago:
The learning curve for Scala may be steep though.
omaranto | 12 years ago:
http://codegolf.stackexchange.com/a/26337/3074
ahoge | 12 years ago:
Better: http://benchmarksgame.alioth.debian.org/
jzwinck | 12 years ago:
If you enjoyed this Python optimization, you may also enjoy: http://stackoverflow.com/questions/17529342/
pjmlp | 12 years ago:
"How fast is the code produced by your compiler."
I keep seeing this misconception about languages vs implementations.
EDIT: Clarified what my original remark meant.
Roboprog | 12 years ago:
http://www.catb.org/jargon/html/L/languages-of-choice.html
and
http://www.catb.org/jargon/html/T/TMTOWTDI.html