
Mpire: A Python package for easier and faster multiprocessing

162 points | lnyan | 2 years ago | github.com

46 comments

[+] singhrac|2 years ago|reply
I've spent a lot of time writing and debugging multiprocessing code, so a few thoughts, besides the general idea that this looks good and I'm excited to try it:

- automatic restarting of workers after N tasks is very nice; I have had to hack that into places before because of (unresolvable) memory leaks in application code

- is there a way to attach a debugger to one of the workers? That would be really useful, though I appreciate the automatic reporting of the failing args (I also hack that in all the time)

- often, the reason a whole set of jobs is not making any progress is because of thundering herd on reading files (god forbid over NFS). It would be lovely to detect that using lsof or something similar

- it would also be extremely convenient to have an option that handles a Python MemoryError and scales down the parallelism in that case; this is quite difficult but would help a lot since I often have to run a "test job" to see how much parallelism I can actually use

- I didn't see the library use threadpoolctl anywhere; would it be possible to make that part of the interface so we can limit thread parallelism from OpenMP/BLAS/MKL when multiprocessing? This also often causes core thrashing

Sorry for all the asks, and feel free to push back to keep the interface clean. I will give the library a try regardless.

[+] milliams|2 years ago|reply
Why does everyone compare against `multiprocessing` when `concurrent.futures` (https://docs.python.org/3/library/concurrent.futures.html) has been part of the standard library for 11 years? It's a much improved API and there are _almost_ no reasons to use `multiprocessing` any more.
[+] craigching|2 years ago|reply
Someone downvoted you; I upvoted because I think you have a good point, but it would be nice to back it up. I think I agree with you, but I have only used concurrent.futures with threads.
[+] whalesalad|2 years ago|reply
I can think of a lot of reasons to use multiprocessing. I do it quite often. You can't always architect things to fit inside of a `with` context manager. Sometimes you need fine-grained control over when the process starts and stops, how you handle various signals, etc.
[+] wheelerof4te|2 years ago|reply
Agreed, concurrent.futures is a great stdlib module.

Too bad it has a somewhat odd name, which doesn't help newbies guess what it really does.

But in almost all cases, it can replace multiprocessing.

[+] stainablesteel|2 years ago|reply
I was initially using concurrent.futures for a lot of things and just assumed some of my code wasn't very parallelizable when I saw it wasn't utilizing all my cores, though it was a speedup nonetheless. Then when I experimented with multiprocessing, it gave me considerable speedups with full core usage. I was more happy than frustrated, switched everything I could, and got a benefit everywhere.

I usually test both when I write code nowadays, and concurrent.futures is useful in maybe 10% of my cases.

[+] xapata|2 years ago|reply
It's somewhat like metaclasses. As the joke goes: If you have to ask, then you don't need metaclasses.
[+] trostaft|2 years ago|reply
The particular pain point of multiprocessing in Python for me has been the limitations of the serializer. To that end, multiprocess, the drop-in replacement by the dill team, has been useful, but I'm still looking for better alternatives. This seems to support dill as an optional serializer, so I'll take a look!
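[The limitation being described: the stdlib pickler rejects closures and lambdas, which is exactly what dill-based replacements work around. A small demonstration of the failure mode:

```python
# The stdlib pickler can only serialize functions importable by
# qualified name, so a function defined inside another function
# (a closure) cannot be sent to a worker process.
import pickle

def make_adder(n):
    def add(x):  # local function, not reachable by module-level name
        return x + n
    return add

try:
    pickle.dumps(make_adder(3))
    ok = True
except (pickle.PicklingError, AttributeError, TypeError):
    ok = False
print(ok)  # False: nested functions can't be pickled
```

dill serializes such objects by value instead of by reference, which is why multiprocess (and MPIRE's optional dill support) can handle them.]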
[+] jw887c|2 years ago|reply
Multiprocessing is great as a first-pass parallelization, but I've found debugging it to be very hard, especially for junior employees.

It seems much easier to follow when you can push everything to horizontally scaled single processes for languages like Python.

[+] uniqueuid|2 years ago|reply
I agree. The main problems aren't syntax, they are architectural: catching and retrying individual failures in a pool.map, anticipating OOM with heavy tasks, understanding the process lifecycle and the underlying pickle/IPC.

All these are much more reliably solved with horizontal scaling.

[edit] by the way, a very useful minimal sugar on top of multiprocessing for one-off tasks is tqdm's process_map, which automatically shows a progress bar https://tqdm.github.io/docs/contrib.concurrent/

[+] flakes|2 years ago|reply
Depends on the workflow. For one off jobs or client tooling, parallelism makes sense to have rapid user feedback.

For batch pipelines that work on many requests, having a serial workflow has a lot of the advantages you mention. Serial execution makes the load more predictable and scaling easier to rationalize.

[+] wheelerof4te|2 years ago|reply
Or just use numpy's arrays, which have their own integrated parallelism.
[+] miohtama|2 years ago|reply
Another good library for concurrency and parallel tasks is futureproof:

https://github.com/yeraydiazdiaz/futureproof

> concurrent.futures is amazing, but it's got some sharp edges that have bit me many times in the past.

> Futureproof is a thin wrapper around it addressing some of these problems and adding some usability features.

[+] liendolucas|2 years ago|reply
I've written a very tiny multiprocessing pipeline in Python. It's documented.

I've actually never made use of it but at the time I got a bit obsessed and wanted to write it. It does seem to work as expected.

It's highly hackable, as it's only a single file and a couple of classes.

Maybe it's useful to someone; here's the link: https://github.com/lliendo/SimplePipeline

[+] amelius|2 years ago|reply
Very cool.

Except I'm a bit concerned that it might have too many features. E.g. rendering of progress bars and such. This should really be in a separate package and not referenced from this package.

The multiprocessing module might not be great, but at least the maintainers have always been careful about feature creep.

[+] jmakov|2 years ago|reply
How is this different from ray.io?
[+] uniqueuid|2 years ago|reply
Ray is parallelism across machines, this is only across cores.
[+] IshKebab|2 years ago|reply
Why has Python never added something like Web Workers/isolates? That seems like the obvious thing to do, but they only have multiprocess hacks.
[+] misnome|2 years ago|reply
There has been lots of movement towards running multiple copies of the interpreter in the same process space, over the last several releases. I’m sure it’ll come at some point.
[+] stainablesteel|2 years ago|reply
I see that all the benchmarks have ProcessPoolExecutor either equal to or outperforming multiprocessing, and I do not find this to be the case in about 90% of my cases.

Also, a niche question: is this able to overcome the inability to pickle a function defined within another function in order to multiprocess it?

I'm still excited to try this as I hadn't heard of it, and good multiprocessing is hard to come by.

[+] captaintobs|2 years ago|reply
Why is this faster than the stdlib? What does it do to achieve better performance?
[+] uniqueuid|2 years ago|reply
It's in the README of the GitHub project.

> In short, the main reasons why MPIRE is faster are:

> - When fork is available we can make use of copy-on-write shared objects, which reduces the need to copy objects that need to be shared over child processes

> - Workers can hold state over multiple tasks. Therefore you can choose to load a big file or send resources over only once per worker

> - Automatic task chunking
[+] darkteflon|2 years ago|reply
Would anyone be in a position to comment on how this compares to Dask?
[+] MitPitt|2 years ago|reply
Always dreamed of multiprocessing with tqdm, this is great
[+] bee_rider|2 years ago|reply
Ah darn, was hoping for some MPI Python interface.