
Parallel tasks in Python: concurrent.futures

177 points | gibuloto | 8 years ago | vinta.ws | reply

46 comments

[+] pletnes|8 years ago|reply
One common misconception (or should I say, overgeneralization) is repeated in the article: that threads are always unsuited to CPU-intensive work.

For instance, most numpy operations release the GIL, meaning that you can perform heavy computation on multiple threads simultaneously. Certain other C extensions do the same, including some bits of the standard library. The usual caveats apply about threading bugs, of course.

Another detail is that numpy linked against e.g. Intel MKL will multithread some operations by default. Running your own threads on top of that nested threading is likely to cause a slowdown.
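Not from the thread, but a minimal sketch of the point above, assuming NumPy is installed (`heavy` is a made-up workload for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def heavy(seed):
    # the matrix product runs in C with the GIL released,
    # so several of these calls can execute truly in parallel
    rng = np.random.default_rng(seed)
    a = rng.random((500, 500))
    return float((a @ a).sum())

# four threads, each doing "CPU-bound" work, despite the GIL
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(heavy, range(4)))

print(len(results))
```

Pure-Python loops in `heavy` would serialize on the GIL; it is only because the bulk of the time is spent inside the released-GIL NumPy call that threads help here.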

[+] uranusjr|8 years ago|reply
I only see the article mentioning CPU (and the GIL) once, but in any case, the generalisation is correct for any pure Python code. You can only release the GIL in extension code (for CPython), and at that point you're not really dealing with Python threads (although you may use Python's wrapper API for threading), but with its underlying implementation (e.g. pthreads) instead. The framing of the statement is very important, and I wouldn't call it overgeneralising in the context of this particular article.
[+] jzwinck|8 years ago|reply
Only np.dot() has intrinsic multithreading; no other functions do. Bizarrely, np.dot() is the fastest way to do things other than dot products (like copy or multiply) in some cases.
[+] orf|8 years ago|reply
> posted in Jan. 2017

Now that we have asyncio and awesome libraries like aiohttp [1], you can get much, much higher throughput than you'd ever achieve with threads, with less code.

1. http://aiohttp.readthedocs.io/
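A stdlib-only sketch of the fan-out pattern aiohttp enables; `asyncio.sleep` stands in for a real HTTP request so this runs without aiohttp installed, and the URLs are made up:

```python
import asyncio
import time

async def fetch(url):
    # stand-in for an aiohttp request, e.g.
    # `async with session.get(url) as resp: return await resp.text()`
    await asyncio.sleep(0.1)
    return url

async def main():
    urls = [f"https://example.com/{i}" for i in range(50)]
    # all 50 "requests" wait concurrently on a single thread
    return await asyncio.gather(*(fetch(u) for u in urls))

start = time.monotonic()
results = asyncio.run(main())
elapsed = time.monotonic() - start
print(f"{len(results)} fetches in {elapsed:.2f}s")
```

Sequentially these 50 waits would take ~5 seconds; interleaved on the event loop they complete in roughly the time of one.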

[+] miracle2k|8 years ago|reply
I found this article extremely persuasive, and it matches my own experiences with asyncio: The performance gain might be there if most of what you do is waiting for a network response, but even a small amount of data processing will make your program CPU bound pretty quickly.

http://techspot.zzzeek.org/2015/02/15/asynchronous-python-an...

[+] sametmax|8 years ago|reply
Plus asyncio has futures too, and with run_in_executor(), you can await something in a thread/multiprocessing pool from inside the event loop transparently.
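A minimal sketch of that bridge, assuming a hypothetical blocking function `blocking_work`:

```python
import asyncio
import hashlib
from concurrent.futures import ThreadPoolExecutor

def blocking_work(data):
    # a blocking/CPU-ish call we don't want running on the event loop
    return hashlib.sha256(data).hexdigest()

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor() as pool:
        # run_in_executor wraps the pool's future so it can be awaited
        return await loop.run_in_executor(pool, blocking_work, b"hello")

digest = asyncio.run(main())
print(digest)
```

Passing `None` instead of `pool` uses the loop's default executor; a ProcessPoolExecutor drops in the same way for GIL-bound work.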
[+] trentnelson|8 years ago|reply
You can get the best of both worlds with PyParallel! Async I/O and multiple threads. (Experimental project, not intended for production use, so I say this somewhat facetiously.)

http://pyparallel.org/

[+] rcthompson|8 years ago|reply
Will this finally let me write a parallel Python script that doesn't explode when I press control+C?
[+] jzwinck|8 years ago|reply
That's always been easy enough, just a little hidden:

    import signal
    # restore the default handler so Ctrl-C kills the process immediately
    signal.signal(signal.SIGINT, signal.SIG_DFL)
Now it will just die on Ctrl-C. For text filter programs it's a good idea to do the same for SIGPIPE too.
[+] metalliqaz|8 years ago|reply
Depends. What do you want to happen when you press ctrl+C?
[+] Rotareti|8 years ago|reply
Isn't this what the `with` keyword was made for:

    from concurrent.futures import ThreadPoolExecutor

    with ThreadPoolExecutor() as pool:
        fut = pool.submit(foo)
        print(fut.result())
My idea is that Ctrl-C would then cancel child operations cleanly?
[+] simonw|8 years ago|reply
I used the Python 2 backport of concurrent.futures for a project recently (parallelizing calls to an external API) and it worked fantastically well. It's a really nice model for doing concurrent outbound I/O in a bunch of threads.
[+] ggm|8 years ago|reply
I tried using threads to read multiple pipes, to centralise a logfile-sorting problem (each discrete logfile is a gzip which is itself only partially in order, and a merge-sort then has to be performed between the files). It was enjoyable to work on, but ultimately I found the solution only marginally better than explicit processes feeding a single reader doing round-robin. I think the lesson I learned is that if the problem integrates back into a single context, there isn't much you can do to avoid that bottleneck once all the other parallelism opportunities have been exhausted.
[+] rflrob|8 years ago|reply
What's the advantage of using a ProcessPoolExecutor over just using multiprocessing? Is it that there's a single interface that you can use for both threads and processes?
[+] icegreentea2|8 years ago|reply
I don't really think there's an advantage. Just as multiprocessing tries to mirror the threading interface, ProcessPoolExecutor simply mirrors the threaded implementation of the new futures-based concurrency interface.

I think futures are nicer for certain types of interactions. For example, futures 'return' actual values, so it's nice for dispatching a task that you'll get a result back from. Futures also raise exceptions (when you try to inspect their results, if an exception occurred in the task). This can make for cleaner error-handling code.
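A small sketch of both behaviours, using a made-up `divide` task:

```python
from concurrent.futures import ThreadPoolExecutor

def divide(a, b):
    return a / b

with ThreadPoolExecutor() as pool:
    ok = pool.submit(divide, 10, 2)   # returns a Future immediately
    bad = pool.submit(divide, 10, 0)

# .result() hands back the task's actual return value
print(ok.result())

# an exception raised in the worker is re-raised at the call site
try:
    bad.result()
except ZeroDivisionError as exc:
    print("caught:", exc)
```

With raw multiprocessing you'd typically shuttle results and errors through a queue yourself; here the Future carries both.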

[+] rcthompson|8 years ago|reply
multiprocessing also has a single interface for both threads and processes, so it's not that.
[+] tyu100|8 years ago|reply
I'm using concurrent.futures in production, and its use of the multiprocessing module caused the Python grpc library to break in a really strange and hard-to-debug way:

https://github.com/grpc/grpc/issues/13873

I suspect it's not the only Python library that will see issues when run in a Future context.

[+] amelius|8 years ago|reply
Sad to see that Python still suffers from the Global Interpreter Lock (GIL), and that the only way out is still to use multiple processes (which causes problems of its own, e.g. sharing of large data structures becomes expensive).
[+] dullgiulio|8 years ago|reply
Only for computationally expensive operations done in interpreted Python.

C extensions, IO operations, etc. always release the lock. In practice, the GIL is only a problem when profiling shows it to be one.

Python is used a lot in the data-analysis world and nobody cares about the lock, because only a fraction of the CPU time is spent holding it.

[+] Dowwie|8 years ago|reply
I found concurrent.futures.ThreadPoolExecutor useful for database seeding, where I invoke a whole lot of SQLAlchemy Core inserts.
[+] Jsharm|8 years ago|reply
Does anyone have a recommendation for what to use for a cache shared between processes? Would HDF5 work?
[+] cranklin|8 years ago|reply
If you want a file-based cache, yes.
[+] solotronics|8 years ago|reply
So how does this compare to something like Deco? https://github.com/alex-sherman/deco. I guess since this uses a single GIL it's good for IO-limited things?
[+] icegreentea2|8 years ago|reply
As written, the code in the blog post is good for IO-limited things. But as it notes, if you replace 'ThreadPoolExecutor' with 'ProcessPoolExecutor' then you get actual multiprocessing, and you may be able to get a speedup on compute-bound tasks.

The linked repo looks like some nice wrappers/decorators around the 'old' multiprocessing library to make it really easy to parallelize a bunch of function calls within a blocking function.
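The swap can be sketched like this (not the blog post's code; `fib` is a stand-in for GIL-bound work, and the `__main__` guard matters for process pools on spawn-based platforms):

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def fib(n):
    # deliberately slow, pure-Python work that holds the GIL
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def run(executor_cls):
    # the call sites are identical; only the executor class differs
    with executor_cls(max_workers=4) as pool:
        return list(pool.map(fib, [18, 19, 20, 21]))

if __name__ == "__main__":
    threaded = run(ThreadPoolExecutor)
    multiproc = run(ProcessPoolExecutor)
    # same results either way; only the latter uses multiple cores here
    print(threaded == multiproc)
```

For the thread pool these four calls serialize on the GIL; the process pool sidesteps it at the cost of pickling arguments and results across process boundaries.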

[+] delaaxe|8 years ago|reply
The last for loop of the last code example doesn't need to be under the with statement.