From Python to Numpy

348 points | haraball | 9 years ago | labri.fr

48 comments

grej | 9 years ago
As someone who has used numpy for many years and written a great deal of production code using it, I was surprised when I read through this and saw some numpy tricks that I didn't know regarding the speeds of various operations! This is really a fantastic reference that provides a deeper level of understanding of what numpy does under the hood.

One thing I'd highlight, which the author touched on only briefly, is that numpy and numba together are a phenomenal combination for very computationally intensive problems.

The folks at Continuum Analytics have really done a fantastic job building numba (numba.pydata.org), which JIT-compiles a subset of Python using LLVM and is designed to work seamlessly with numpy arrays. Numba makes it much easier to speed up performance bottlenecks and lets you easily create numpy ufuncs that take advantage of array broadcasting.
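A minimal sketch of both uses, assuming numba is installed. The function names and kernels are made up for illustration, and the `try`/`except` fallback (an identity decorator plus `np.vectorize`) is our own addition so the snippet still runs, uncompiled and slow, where numba is missing.

```python
import numpy as np

try:
    from numba import njit, vectorize
except ImportError:
    # Fallback (our assumption, not numba behaviour): run uncompiled.
    def njit(func):
        return func

    def vectorize(signatures):
        def wrap(func):
            return np.vectorize(func, otypes=[np.float64])
        return wrap


@njit
def min_pair_distance(points):
    """Smallest Euclidean distance between two rows of an (n, 2) array.

    Plain nested loops: slow in CPython, close to C speed once numba compiles them.
    """
    n = points.shape[0]
    best = np.inf
    for i in range(n):
        for j in range(i + 1, n):
            dx = points[i, 0] - points[j, 0]
            dy = points[i, 1] - points[j, 1]
            d = np.sqrt(dx * dx + dy * dy)
            if d < best:
                best = d
    return best


@vectorize(["float64(float64, float64)"])
def soft_clip(x, limit):
    """Scalar kernel; the decorator turns it into a broadcasting ufunc."""
    if x > limit:
        return limit
    if x < -limit:
        return -limit
    return x


points = np.array([[0.0, 0.0], [3.0, 4.0], [3.0, 5.0]])
print(min_pair_distance(points))  # 1.0

# A (2, 1) column broadcast against a (2,) row gives a (2, 2) result.
print(soft_clip(np.array([[-3.0], [0.5]]), np.array([1.0, 2.0])).tolist())
# [[-1.0, -2.0], [0.5, 0.5]]
```

With numba present, `soft_clip` is compiled into a genuine NumPy ufunc, so it broadcasts like any built-in ufunc without you writing the outer loops.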

darkseas | 9 years ago
Can I ask how intensively you have used Numba and over what period? I'm interested in how Numba has progressed over the last few years, with a view to using it over Cython.

My team and I looked at Numba a year ago or so for optimisation of a fairly large calculation, and found that the speed-ups were impressive where they worked, but were not consistent or predictable.

We used Cython for large parts, and while there was boilerplate and incantations, the gains were achievable, incremental and certain. The annotation tools were also quite helpful for identifying bottlenecks where Cython code could be effective.

Incidentally, once we decided that Cython was our go-to tool, we often wrote simple looping code rather than vectorised code because it was simpler to port to Cython, à la Julia.
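To make the trade-off concrete, here is a hypothetical kernel (a sliding-window sum, not something from the thread) written both ways. The loop version is the style that ports almost line for line to Cython by adding type declarations; the vectorized version is fast in plain NumPy but relies on a cumulative-sum trick that has to be rethought rather than translated.

```python
import numpy as np


def window_sums_loop(x, w):
    """Sum of every length-w window, written as explicit loops.

    This shape of code translates almost directly to typed Cython
    (or to a numba-compiled function).
    """
    n = x.shape[0] - w + 1
    out = np.zeros(n)
    for i in range(n):
        s = 0.0
        for k in range(w):
            s += x[i + k]
        out[i] = s
    return out


def window_sums_vectorized(x, w):
    """Same result via a cumulative sum: window i is c[i + w] - c[i]."""
    c = np.cumsum(np.concatenate(([0.0], x)))
    return c[w:] - c[:-w]


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(window_sums_loop(x, 2).tolist())        # [3.0, 5.0, 7.0, 9.0]
print(window_sums_vectorized(x, 2).tolist())  # [3.0, 5.0, 7.0, 9.0]
```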

marmaduke | 9 years ago
Just to jump on the numba train: I've generally found it to reliably deliver C-like performance from C-like Python code. This also holds when you use Python as a preprocessor language for generating computational kernels, which provides a lot of flexibility that isn't evident from the documentation.
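One way to read "Python as a preprocessor": generate kernel source text in Python, `exec` it, and hand the resulting function to numba. The sketch below is our own illustration (the `make_horner` name is hypothetical): it unrolls Horner's rule for a fixed set of coefficients so each generated kernel is specialized. The fallback decorator only keeps it runnable without numba.

```python
try:
    from numba import njit
except ImportError:
    # Fallback so the sketch runs without numba (no compilation, no speed-up).
    def njit(func):
        return func


def make_horner(coeffs):
    """Generate and compile a polynomial kernel with the loop fully unrolled.

    coeffs go from the highest power down, e.g. [2, -3, 1] -> 2x^2 - 3x + 1.
    Python writes the source text; numba compiles the resulting function.
    """
    expr = "0.0"
    for c in coeffs:
        # float() + repr keeps each coefficient a valid numeric literal.
        expr = f"({expr}) * x + {float(c)!r}"
    source = f"def poly(x):\n    return {expr}\n"
    namespace = {}
    exec(source, namespace)
    return njit(namespace["poly"])


poly = make_horner([2, -3, 1])
print(poly(2.0))  # 3.0
print(poly(0.5))  # 0.0
```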

It also has simple-to-use, OpenMP-like multicore parallelization, limited class support, AOT compilation, and CUDA and AMD HSAIL support.
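A minimal sketch of the OpenMP-like parallel loop, assuming numba is installed: `@njit(parallel=True)` together with `numba.prange` lets numba split the outer loop across threads. The kernel is a made-up example, and the fallback (plain `range`, no compilation) is ours, so the snippet runs serially without numba.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback so the sketch runs without numba: serial loop, no compilation.
    prange = range

    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda func: func


@njit(parallel=True)
def row_sums_of_squares(a):
    out = np.empty(a.shape[0])
    # prange marks the outer loop as safe to split across threads,
    # much like an OpenMP "parallel for".
    for i in prange(a.shape[0]):
        s = 0.0
        for j in range(a.shape[1]):
            s += a[i, j] * a[i, j]
        out[i] = s
    return out


a = np.arange(6.0).reshape(2, 3)
print(row_sums_of_squares(a).tolist())  # [5.0, 50.0]
```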

vegabook | 9 years ago
Travis Oliphant, the creator of NumPy, is CEO of Continuum Analytics.
danso | 9 years ago
Immediately recognized the domain name. Months ago I was doing yet another search on how to do geospatial plotting with Matplotlib, the kind that mostly works out of the box in R/ggplot2 but, because of some latent fragmentation from the Py2/Py3 split, was not well documented anywhere for Python/matplotlib. And while I've come to really like and respect Matplotlib, its documented examples stray far from what they should be for the purpose of illustrating the API, so learning it has been a test of patience.

Anyway, Mr. Rougier's Matplotlib tutorial was informative, concise, and beautiful. Actually, I think my appreciation for matplotlib came from reading his guide: https://www.labri.fr/perso/nrougier/teaching/matplotlib/

mmmBacon | 9 years ago
I'd be very curious to know whether choosing NumPy's C-ordered or Fortran-ordered arrays has any impact. As a long-time Matlab user (since 1993) who moved to Python three years ago, I have always defaulted to Fortran order because it was what I was used to and seemed more intuitive. I did play with C-ordered arrays but didn't find an advantage in my limited investigation.
travisoliphant | 9 years ago
There may still be a few routines that expect C-ordered arrays and so require a copy to be made when given a Fortran-ordered array, especially as you extend to one of the libraries that use NumPy. For the most part, however, Fortran-ordered arrays should work well. It all comes down to the expectations of the routine's author.
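A small sketch of what that copy looks like in practice. `routine_expecting_c_order` is a hypothetical stand-in for a library function that assumes C layout; `np.asfortranarray`, `np.ascontiguousarray`, and `np.shares_memory` are standard NumPy calls.

```python
import numpy as np

c_arr = np.arange(12, dtype=np.float64).reshape(3, 4)  # row-major (C order) by default
f_arr = np.asfortranarray(c_arr)                       # same values, column-major layout

print(c_arr.strides, f_arr.strides)   # (32, 8) (8, 24): bytes to step along each axis
print(c_arr.T.flags["F_CONTIGUOUS"])  # True: transposing a C array gives an F-ordered view


def routine_expecting_c_order(a):
    """Hypothetical stand-in for a library function that assumes C layout."""
    # Returns the input unchanged if it is already C-contiguous, otherwise copies.
    return np.ascontiguousarray(a)


print(np.shares_memory(routine_expecting_c_order(c_arr), c_arr))  # True: no copy
print(np.shares_memory(routine_expecting_c_order(f_arr), f_arr))  # False: a copy was made
print(np.array_equal(c_arr, f_arr))  # True: the layout differs, the values do not
```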
zwieback | 9 years ago
I think it depends on whether an algorithm is forced to traverse an array in the cache-efficient direction or not. Oftentimes you can't choose whether to make your outer loop rows or columns so the performance could go either way.
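A rough way to see this on your own machine (a sketch, not a benchmark; the helper names are hypothetical): sum the same data row by row and column by column, for a C-ordered and an F-ordered copy. Slices along the contiguous direction usually come out faster, but the size of the gap depends on the array shape, the cache, and per-slice Python overhead, so no timings are claimed here.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
a_c = rng.random((1000, 1000))  # C order: each row is contiguous in memory
a_f = np.asfortranarray(a_c)    # F order: each column is contiguous in memory


def sum_by_rows(a):
    """Outer loop over rows; each a[i, :] is contiguous only for C order."""
    return sum(float(a[i, :].sum()) for i in range(a.shape[0]))


def sum_by_columns(a):
    """Outer loop over columns; each a[:, j] is contiguous only for F order."""
    return sum(float(a[:, j].sum()) for j in range(a.shape[1]))


for name, arr in [("C order", a_c), ("F order", a_f)]:
    t_rows = timeit.timeit(lambda: sum_by_rows(arr), number=5)
    t_cols = timeit.timeit(lambda: sum_by_columns(arr), number=5)
    print(f"{name}: rows {t_rows:.3f}s, columns {t_cols:.3f}s")
```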
syntaxing | 9 years ago
Does anyone have a recommendation for something similar to this but for Python itself? I have been trying to find something that is not necessarily an intro or crash course book but a book with tips, great explanations, and neat examples (which this e-book(?)/site has).

I see that the author has responded to a couple of comments here. Thank you for your great work! It's always great to have good reference material with concise examples. I think this will be helpful to everyone (beginners and advanced Python users alike)!

haldora | 9 years ago
I would recommend Julien Danjou's "The Hacker's Guide to Python". He charges $29 for the PDF, but provides updates every year or so. I think it's a great book for getting more depth out of Python. :)

Topics include: modules/libraries, documentation, distribution, virtual environments, unit testing, methods/decorators, functional programming, optimization, scaling, RDBMS, and more.

https://thehackerguidetopython.com/

bcbrown | 9 years ago
I'm skimming through Effective Python, and so far I think it's a pretty good "best practices" guide for intermediate Python developers.
skadamat | 9 years ago
Hey, I work at Dataquest (dataquest.io), and we have a lot of intermediate and advanced Python content. It's all done through an in-browser coding environment, which lets us check answers and so on.
jajool | 9 years ago
This book is amazing! The author's sense of humor especially makes it fun to read.

> For example, can you tell what the two functions below are doing? Probably you can tell for the first one, but unlikely for the second (or your name is Jaime Fernández del Río and you don't need to read this book).
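To make the quoted point concrete, here is a hypothetical pair of functions in that spirit (written for this note, not copied from the book): both return the start indices where a short integer pattern occurs in a longer sequence. The first is a plain comprehension; the second is vectorized, and it is much harder to see at a glance what it does.

```python
import numpy as np


def find_readable(seq, sub):
    """Start indices where sub occurs in seq (plain Python lists)."""
    m = len(sub)
    return [i for i in range(len(seq) - m + 1) if seq[i:i + m] == sub]


def find_vectorized(seq, sub):
    """Same result for integer sequences, without a Python-level loop."""
    seq = np.asarray(seq)
    sub = np.asarray(sub)
    # At every true match the sliding dot product equals sub . sub
    # (it can also equal it elsewhere, so these are only candidates).
    target = np.dot(sub, sub)
    candidates = np.where(np.correlate(seq, sub, mode="valid") == target)[0]
    # Gather each candidate window with fancy indexing and keep exact matches.
    windows = candidates[:, np.newaxis] + np.arange(len(sub))
    mask = np.all(seq[windows] == sub, axis=-1)
    return candidates[mask]


seq = [1, 2, 3, 1, 2, 3, 1, 2]
sub = [1, 2, 3]
print(find_readable(seq, sub))             # [0, 3]
print(find_vectorized(seq, sub).tolist())  # [0, 3]
```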

zellyn | 9 years ago
Does anyone (the author?) know what was used to generate the cover image of cubes and shadows?

Edit: it's SketchUp; there's a .skp file in the data/ subdirectory of the book's GitHub repo.

hayd | 9 years ago
Is there an epub/mobi version?
BanzaiTokyo | 9 years ago
Does the book exist in PDF?
Nicolas-Rougier | 9 years ago
Not yet, but I'm working on it. Meanwhile, you can try running rst2latex.py on the sources for a very rough draft.
d0mine | 9 years ago
I failed to build a PDF from the book.tex produced by rst2latex.py (the issues are likely fixable if you work with LaTeX).

I converted the rst files to an e-book using rst2html.py + calibre instead.

guitarbill | 9 years ago
> be warned that I'm a bit picky about typography & design: Edward Tufte is my hero

And it shows: the theme is beautiful. Also some of the best ASCII diagrams I've seen. It's worth a look at the source, even if you don't care about Python.

keldaris | 9 years ago
Wouldn't normally criticize website design, but since this came up... yes, the fonts are pretty and all, but on my humble 24" monitor the site uses barely half of my horizontal space and looks awkward (the table of contents especially). "Mobile-first"?
wott | 9 years ago
Fonts are barely readable: too thin, too white on my browser/screen.
gerfficiency | 9 years ago
I really appreciated the problem vectorization chapter. New approaches require new thinking and this is often forgotten when teaching new concepts.
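A classic small illustration of that shift in thinking (our own example, in the spirit of the chapter rather than quoted from it): summing X[i] * Y[j] over all pairs. Code vectorization removes the Python loops but keeps the O(n·m) algorithm; rethinking the problem shows that the double sum factorizes into sum(X) * sum(Y), which is O(n + m).

```python
import numpy as np


def pair_sum_loops(X, Y):
    """Sum of X[i] * Y[j] over all pairs, with two Python loops: O(n * m)."""
    total = 0.0
    for x in X:
        for y in Y:
            total += x * y
    return total


def pair_sum_outer(X, Y):
    """Code vectorization: same algorithm, loops replaced by an outer product.
    Still O(n * m) work plus an (n, m) temporary array."""
    return float(np.outer(X, Y).sum())


def pair_sum_factored(X, Y):
    """Problem vectorization: sum_i sum_j X[i] * Y[j] = (sum_i X[i]) * (sum_j Y[j])."""
    return float(np.sum(X) * np.sum(Y))


X = [1.0, 2.0, 3.0]
Y = [4.0, 5.0]
print(pair_sum_loops(X, Y), pair_sum_outer(X, Y), pair_sum_factored(X, Y))  # 54.0 54.0 54.0
```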