top | item 16910446

Pythran: Crossing the Python Frontier [pdf]

122 points | serge-ss-paille | 8 years ago | computer.org | reply

68 comments

[+] eljost|8 years ago|reply
Interesting article, but just skimming through it, some things stand out immediately: 1.) The first snippet isn't even valid Python code, as floats don't have a shape attribute.

  s = 0. 
  n = s.shape
2.) The inline latex math isn't rendered properly.
[+] kristofferc|8 years ago|reply
The first snippet also doesn't balance its parentheses:

    s += 100. * x[i + 1] - x[i] ** 2.) ** 2. + (1 - x[i]) ** 2
[+] chestervonwinch|8 years ago|reply
It should be the shape of x (actually, the zeroth element of the shape), but this is also a tad odd because it assumes that x is a NumPy array, which isn't introduced until after this 'naive pure python' code block.
[+] rlayton2|8 years ago|reply
I believe the input should be a NumPy array of floats, which has a shape attribute.
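For reference, here's a plausible, runnable reconstruction of what the article's snippet presumably intended (the Rosenbrock function over a NumPy array; the exact placement of the missing parenthesis is an assumption):

```python
import numpy as np

def rosenbrock(x):
    # x is assumed to be a 1-D NumPy array, so x.shape exists
    n = x.shape[0]
    s = 0.
    for i in range(n - 1):
        # balanced version of the article's unbalanced line (assumed intent)
        s += 100. * (x[i + 1] - x[i] ** 2.) ** 2. + (1 - x[i]) ** 2
    return s
```

With this reading, `rosenbrock(np.array([1.0, 1.0]))` gives 0.0, the function's known minimum.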
[+] targafarian|8 years ago|reply
The author's claims about Numba seem a little unkind, if not wrong, to me.

Numba can handle vectorized code (NumPy-style operations) directly, in addition to explicit loops. The former is accelerated less compared to plain Python calling NumPy (if you can use NumPy operations directly, it's already really fast), but the NumPy bits can also be automatically parallelized by Numba. Explicit loops in Numba are accelerated hundreds of times over Python loops (and you can use e.g. prange to write parallel loops, too). The point is, the two paradigms can be mixed and matched at will within Numba.
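A minimal sketch of mixing the two paradigms (an explicit prange loop with vectorized NumPy inside the body); the import guard is mine, so the example still runs where numba isn't installed:

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fall back to plain Python if numba isn't available
    prange = range
    def njit(*args, **kwargs):
        # accept both bare @njit and parameterized @njit(parallel=True)
        if args and callable(args[0]):
            return args[0]
        return lambda f: f

@njit(parallel=True)
def row_norms(a):
    # explicit parallel loop over rows, vectorized NumPy in the loop body
    out = np.empty(a.shape[0])
    for i in prange(a.shape[0]):
        out[i] = np.sqrt(np.sum(a[i] * a[i]))
    return out
```

Under numba, the prange loop is compiled and parallelized across cores; without it, the same code runs (slowly) as ordinary Python.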

It seems like every example is of Cython, but then the author generalizes the conclusions to Numba as well. It would be much more "honest" to show a side-by-side comparison of Numba, Cython, and Pythran, since these all have different syntaxes and are fairly different tools.

Another example is that you don't have to rewrite functions for different argument types in Numba, but you do in Cython (see the "convolve_laplacian" example, which works with a simple decorator as a Numba function). There again, the impression is given that Numba suffers from the same issue as Cython (and, as mentioned elsewhere in the comments here, it's possible that Cython has a way around this, but I don't know the details).

[+] AnimalMuppet|8 years ago|reply
Off topic, but this seems like as good a place as any to ask: It's my impression that numpy is really good. Is it as good as Fortran? That is, if I have a large, sparse, complex matrix, Fortran will have an efficient solver for it that will also be numerically stable, and will have four decades of use to find any weaknesses. Is numpy equivalent (except for the four decades part)? Is it close? Or does it just cover the basic cases well, and for the specializations you're on your own?
[+] zb|8 years ago|reply
NumPy is designed to work with SciPy, which is a wrapper for literally the same four-decade-old Fortran libraries (LAPACK &c.) that you're referring to here.
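Concretely, even plain NumPy linear algebra goes through LAPACK: numpy.linalg.solve dispatches to LAPACK's gesv driver (LU factorization with partial pivoting). A trivial illustration:

```python
import numpy as np

# numpy.linalg.solve calls LAPACK's gesv routine under the hood,
# i.e. the same Fortran lineage discussed above
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)
# verify the solution by substituting back into A @ x
print(np.allclose(A @ x, b))  # True
```

For the sparse case specifically, the SciPy side (scipy.sparse.linalg) is where the dedicated solvers live.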
[+] geoalchimista|8 years ago|reply
> "As a matter of comparison, Cython does not support principles 1, 2, or 3 and has optional support for 4."

For principle 2 (type agnosticism), this can be emulated with a "fused type" in Cython. See this example: http://cython.readthedocs.io/en/latest/src/userguide/numpy_t...

But I think the major inconvenience with Cython vectorization really isn't about `float32` vs. `float64`; you get a `float64` NumPy array by default from floating-point calculations. The actual inconvenience is that the vectorized function cannot take a scalar input the way the NumPy ones can. To remain polymorphic, I usually have to perform an `is_scalar` check on the input in a Python wrapper before sending the input data to the Cython function.
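A sketch of the kind of wrapper described (the compiled Cython function is stubbed here with a plain array-only Python function, and the helper names are my own):

```python
import numpy as np

def _array_only_impl(x):
    # stand-in for the compiled Cython function, which accepts only arrays
    return np.sqrt(x) + 1.0

def polymorphic(x):
    # wrap scalar inputs so the array-only implementation can handle them,
    # then unwrap the result to mimic NumPy ufunc behavior on scalars
    if np.isscalar(x):
        return _array_only_impl(np.asarray([x], dtype=np.float64))[0]
    return _array_only_impl(np.asarray(x, dtype=np.float64))
```

So `polymorphic(4.0)` returns a scalar while `polymorphic([0.0, 4.0])` returns an array, like a ufunc would.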

[+] targafarian|8 years ago|reply
Note that the alternative in Numba is incredibly convenient: you can trivially write a function whose body operates on a scalar, then apply the @vectorize decorator, which makes it into a NumPy ufunc automatically. This generalizes the function to operate equally well on scalars or NumPy arrays of any dimensionality (just like "built-in" NumPy functions do).

Oh, and if you set target='gpu', your function works on GPUs, too. Or you can use target='parallel' to make it parallelize automatically across CPU cores.
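A minimal sketch of the @vectorize pattern; the import guard (falling back to the much slower np.vectorize, which has a similar ufunc-like interface) is my addition so the example runs without numba:

```python
import numpy as np

try:
    from numba import vectorize
    # on supported setups, target='parallel' or a GPU target can be
    # passed here as well, per the comment above
    deco = vectorize(['float64(float64, float64)'])
except ImportError:
    deco = np.vectorize  # slow pure-Python fallback, same calling style

@deco
def rel_diff(a, b):
    # the body is written for scalars; the decorator broadcasts it
    return 2.0 * (a - b) / (a + b)
```

The decorated function then accepts scalars and arrays of any dimensionality alike, e.g. `rel_diff(3.0, 1.0)` and `rel_diff(np.array([3.0]), np.array([1.0]))`.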

[+] gnufx|8 years ago|reply
When Python was announced, the three main things I remember striking dynamic languages people (apart from using a bastardized offside rule) were weird scope rules, lack of proper GC, and that it appeared to be designed particularly to preclude efficient implementation. We've subsequently seen the huge amount of effort that's been devoted to different ways of working around the implementation issue.
[+] dilawar|8 years ago|reply
I wonder why the author did not compare the performance of PyPy. PyPy is a JIT compiler, after all.
[+] dec0dedab0de|8 years ago|reply
I was going to say that most scientific Python libraries use numpy, but a quick Google shows that PyPy supports numpy now.
[+] joshsyn|8 years ago|reply
Isn't this solved by Julia? I think the scientific community should use a more functional language rather than a language like Python, tbh.
[+] poster123|8 years ago|reply
Or by modern Fortran, which has had array operations since the 1990 standard? There are functional elements such as PURE functions and the FORALL construct.
[+] knlji|8 years ago|reply
Maybe. It's currently just a safer bet to learn and use Python. Easier to get a job after you fail getting your next grant. I have so far seen zero Julia job ads. Hell, I see more e.g. Haskell and Fortran job ads than Julia.
[+] montalbano|8 years ago|reply
Though I'll still use Python for non-scientific programming, I've switched to Julia for all my scientific programming needs in my day job.
[+] evrydayhustling|8 years ago|reply
In research programming, you often spend as much or more time on data acquisition and munging as on implementing core algorithms. Plus, more than in production code, the requirements change as you explore different applications and approaches. And, because it's not production code, you have more opportunity to inspect outputs at different stages to check that things work. It's effectively continual prototyping.

All of these things play to python's main strengths: huge community with connectors to every API and format, plus ability to conveniently integrate code at several levels of complexity & maturity as you prototype.

[+] hprotagonist|8 years ago|reply
just as soon as someone who knows C/C++ ports numpy and scipy and pandas. and gensim and nltk and sounddevice. and tensorflow and scikit-learn and keras....
[+] Derbasti|8 years ago|reply
I keep trying out Julia every year or so. And it has come a long way. Gone are the days of crashes, missing documentation, and terrible error messages.

And yet, for my particular area (audio signal processing), Julia is just objectively worse than Python in expressivity, library support, and even speed.

But I'll keep trying. Maybe next year.

[+] ocschwar|8 years ago|reply
Julia is Fortran-indexed, and thus anathema to my religion.
[+] rfeather|8 years ago|reply
Could you elaborate more on the advantages of a functional language?
[+] moolcool|8 years ago|reply
Python has good support for a lot of functional concepts
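For instance, higher-order functions, closures, and comprehensions are all built in (function names here are just illustrative):

```python
from functools import reduce
from operator import add

# comprehensions and reduce cover the map/filter/fold trio
squares = [x * x for x in range(5)]
total = reduce(add, squares)  # 0 + 1 + 4 + 9 + 16

# function composition via closures
compose = lambda f, g: lambda x: f(g(x))
inc_then_double = compose(lambda x: 2 * x, lambda x: x + 1)
```

What Python lacks relative to a "real" functional language is mostly ergonomic: no multi-line lambdas, no tail-call optimization, no macros.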
[+] xapata|8 years ago|reply
You want multi-line lambdas and macros? Anything else?
[+] KronenR|8 years ago|reply
Even given an ecosystem like Python's, I wouldn't switch, because by the time that happens I'll have even more experience with Python, which is much more important.
[+] HerrMonnezza|8 years ago|reply
Does anyone know how this compares to existing Python-to-C++ transpilers like Cython or Shedskin?
[+] Arkanosis|8 years ago|reply
Cython is a bit different from CPython / Pythran / Shedskin in that you need to learn the Cython language, which is a Python-ish programming language, but not Python.

Shedskin and Pythran look somewhat similar to me (disclaimer: I've contributed quite a bit to Shedskin but have never used Pythran so far), except with Shedskin you don't even need annotations like you do with Pythran (the downside being that the only finer control you have over the native types used is through the transpiler options). Also, Shedskin development is not very active these days (to say the least), and there's zero support for Python 3, while Pythran is under fairly active development and has beta support for Python 3.

If you're interested in Python / native implementations, you might be interested in Nuitka as well: http://nuitka.net/