
Neat Parallel Output in Python

131 points | surprisetalk | 2 years ago | bernsteinbear.com | reply

46 comments

[+] cb321|2 years ago|reply
You could also just send output to separate files by re-opening stdout/stderr and use https://www.vanheusden.com/multitail/index.html (or GNU screen or tmux or whatever) to multiplex a terminal in a more organized way. This also solves an "open problem" in the article of stray prints.

If you really want to share one terminal/stdout but also prevent timing-based record splitting, you could instead send outputs to FIFOs/named pipes and then have a simple record-boundary-honoring merge program like https://github.com/c-blake/bu/blob/main/funnel.nim (documentation at https://github.com/c-blake/bu/blob/main/doc/funnel.md). As long as the "record format" is shared (e.g. newline-terminated), this also solves the stray-print problem.

[+] ryan-duve|2 years ago|reply
When you say "multiplex a terminal" with tmux, do you mean splitting screens so the same terminal window has multiple shell prompts in it? I'm trying to understand how that would be used to address the problem in the demo at the bottom of the post.
[+] tgsovlerkhgsel|2 years ago|reply
I've got the impression that in practice, what keeps developers from letting trivially parallelizable tasks run in parallel is a) the overhead of dealing with poor parallelization primitives and b) the difficulty in properly showing the status of parallel invocations.

Having good features to support this in standard libraries would go a long way toward incentivizing devs to actually parallelize.

[+] btown|2 years ago|reply
In a way, it's a subset of the distributed systems tracing problem - you have multiple tasks running in parallel on the same node, but they will have been initiated as different (sub) tasks, and should be tracked by the specific task via which they were initiated. So systems like OpenTelemetry and Honeycomb can be great for this, allowing you to see events in aggregate as well as in the context of a trace that propagates between different threads and systems.

https://opentelemetry.io/docs/languages/python/getting-start... https://docs.honeycomb.io/getting-data-in/opentelemetry/pyth...

But there's so much complexity there that IMO it's best left outside of standard libraries - and it's indeed a daunting amount of new vocabulary for newcomers. I'm not aware of simpler abstractions on top of the broader telemetry ecosystem for monitoring simple parallelization, but arguably there should be one that keeps things quite simple.

[+] dr_kiszonka|2 years ago|reply
Agreed. I have been using joblib for a good few months. It is fine, but I still haven't figured out basic things like printing the status of process-based jobs.

[Parsl is much better, e.g., logging is built-in, but it can be a little overwhelming.]

[+] RandomWorker|2 years ago|reply
Yes, totally agree. I’ve written some code and I’d rather convert it to C using Cython before I parallelize it. Python is horrible for both of these things, and you may not even get a speed increase because of the overhead. It’s a choice between using Cython to get a 10-100x speedup with a few lines of code, or spending my whole day in a horrible mess of data structures, getting my functions to work properly with map, with maybe nothing to show for it.
[+] agumonkey|2 years ago|reply
Every year I write a similar threaded CLI monitor. Maybe rich can now solve this for everyone, but I'm surprised it took so long to emerge.
[+] appplication|2 years ago|reply
Yes, this is 100% the type of thing that should be in the standard library, but it's also the type of thing that, I have no doubt, the Python steering council would feel belongs in a third-party library.

We do see some cool stuff under the hood from core Python devs, but interest in further quality-of-life features seems to be lacking.

[+] acover|2 years ago|reply
What's a good way to show the status in the command line?

For a one-off project it seems simpler to just write an HTML UI.

[+] rciorba|2 years ago|reply
Having the workers acquire the lock and update the terminal themselves seems like it would cause lock contention.

An alternative would be to have only the main process do the updating and have the workers message it about progress, using a queue.
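A minimal sketch of that queue-based design (the worker's progress values are simulated; in real use it would report actual work done). Only the main process touches the terminal, so no lock is needed:

```python
import multiprocessing


def worker(idx, q):
    # Workers never print; they only push progress messages.
    for pct in (25, 50, 75, 100):
        q.put((idx, pct))
    q.put((idx, None))  # sentinel: this worker is finished


def main():
    q = multiprocessing.Queue()
    n = 3
    procs = [
        multiprocessing.Process(target=worker, args=(i, q))
        for i in range(n)
    ]
    for p in procs:
        p.start()
    done = 0
    while done < n:
        idx, pct = q.get()
        if pct is None:
            done += 1
        else:
            # Single writer: lines can never interleave mid-record.
            print(f"worker {idx}: {pct}%")
    for p in procs:
        p.join()


if __name__ == "__main__":
    main()
```

The sentinel-per-worker convention lets the main loop know when to stop without polling process liveness.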

[+] convivialdingo|2 years ago|reply
Funnily enough, glibc takes a lock internally for threaded printing. You can disable the lock with the __fsetlocking function and the FSETLOCKING_BYCALLER parameter.

I had a threaded server that we were debugging which would only dump state correctly if we deleted a printf right before it, in a different thread. That really confused me until I figured this out.

[+] ametrau|2 years ago|reply
This is what I would do. And every process always has at least some state output (so there are no blank lines).
[+] tekknolagi|2 years ago|reply
The real-life processes take 30+ seconds, so contention isn't a big problem.
[+] wolfskaempf|2 years ago|reply
I like the self-built approach especially for the learning value.

If you’re using this in a CLI tool you’re writing in Python you might be using the library rich anyway, which provides this functionality as well including some extra features.

https://rich.readthedocs.io/en/stable/progress.html
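For reference, a small sketch of rich's Progress API driving several bars at once (the task names, totals, and pacing are made up; rich is a third-party dependency):

```python
import time

from rich.progress import Progress

# Progress owns the terminal and redraws each bar on its own line,
# so concurrent task updates never interleave.
with Progress() as progress:
    tasks = [progress.add_task(f"repo {i}", total=100) for i in range(3)]
    while not progress.finished:
        for task_id in tasks:
            progress.advance(task_id, 25)
        time.sleep(0.05)
```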

[+] morningsam|2 years ago|reply
What I don't like about rich is that, dependencies and all, its installed size comes out to around 20 MB. 9 MB of that is due to its dependency on pygments for syntax highlighting, which a lot of people probably don't even want/need.

If anyone knows of a smaller, more focused library providing something similar to rich's Live Display functionality, I'd appreciate it.

[+] pynchia|2 years ago|reply
Nice idea. However, please note: `map(func, repos)` does not do anything. Your code does not do any processing. Try it.

You need to consume the iterator that map returns.

Please use Python as it is supposed to be used, not as Fortran.
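A quick demonstration of the laziness being pointed out (the function and data here are illustrative, not the article's):

```python
def fetch(repo):
    # Stand-in for real per-repo work.
    return repo.upper()


repos = ["alpha", "beta"]

lazy = map(fetch, repos)   # nothing has run yet: map is lazy in Python 3
results = list(lazy)       # consuming the iterator actually calls fetch
assert results == ["ALPHA", "BETA"]

# When you only care about side effects, a plain loop is clearer:
for repo in repos:
    fetch(repo)
```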

[+] isoprophlex|2 years ago|reply
The provided code doesn't seem self-contained; it doesn't look like it would run just like that without modification. For example, where does 'num_lines' come from?

Edit: never mind. Ignore that. There is a link on the page which I overlooked, with a more complete example. https://gist.githubusercontent.com/tekknolagi/4bee494a6e4483...

Cool stuff. Now I'm eager to find a way to make this work for multiple tqdm progress bars, running in parallel.

[+] ibejoeb|2 years ago|reply
You probably don't want to run this gist directly. Looks like a risk of a runaway process creation, depending on the platform. The process creation code should be guarded by `if __name__ == "__main__"`.
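A minimal illustration of the guard in question (the worker function is made up). On platforms whose default start method is "spawn" (Windows, and macOS since Python 3.8), each child re-imports the main module, so unguarded process-creation code would run again in every child:

```python
import multiprocessing


def work(i):
    return i * i


# Without this guard, "spawn" platforms re-execute the module body in
# every child, re-creating the pool and spawning processes endlessly.
if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        print(pool.map(work, range(4)))
```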
[+] globular-toast|2 years ago|reply
I prefer using a purpose-built tool like GNU parallel. Parallel's only purpose is to run things in parallel and collect the results together. The advantage is you only have to learn to use it once, rather than learn to do this again and again in all the different languages/tools you might use.
[+] masklinn|2 years ago|reply
I can’t help but notice you have not explained how to handle progress reporting in GNU parallel.

Also, do you really use parallel from software trying to parallelise its internal workload? Note that in TFA the workers are an implementation detail of a wider program.

[+] prosaole|2 years ago|reply
Try:

    parallel --latestline seq ::: {1..10}0000000
(Requires version 20220522)
[+] atoav|2 years ago|reply
So do you have an example of how to spawn GNU parallel from Python and display progress?
[+] hackan|2 years ago|reply
I don't like the wild use of globals, even if they are "guarded" by locks. And then, oh boy, there are locks! But it surely works, so that's nice. It would be cool to have a small lib that solves this nicely :thinking:...
[+] echoangle|2 years ago|reply
I've used a separate printing thread that prints everything from a queue, and had the other threads push everything they want to print onto that queue. Is there some advantage to doing it like in the post over the queue method?
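A sketch of that printer-thread pattern using only the standard library (the message format and worker body are illustrative):

```python
import queue
import threading


def printer(q):
    # Single consumer owns the terminal, so messages never interleave.
    while True:
        msg = q.get()
        if msg is None:  # sentinel: shut down
            break
        print(msg)


def worker(i, q):
    q.put(f"worker {i}: step 1")
    q.put(f"worker {i}: done")


q = queue.Queue()
t = threading.Thread(target=printer, args=(q,))
t.start()
workers = [threading.Thread(target=worker, args=(i, q)) for i in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()
q.put(None)  # all workers joined; tell the printer to stop
t.join()
```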
[+] huac|2 years ago|reply
clever! will have to see if this works with tqdm progress bars, has anyone tried that?
[+] masklinn|2 years ago|reply
tqdm has a position parameter which allows offsetting concurrent progress bars. It should work automatically for intra-process concurrency anyway. I don’t know if it works correctly with multiple processes though.
[+] cl3misch|2 years ago|reply
This is indeed neat, but I would be surprised if there wasn't already a library for such multi-line functionality.
[+] rnmmrnm|2 years ago|reply
cool call me when they got one for go lol
[+] tekknolagi|2 years ago|reply
This should work just fine as-is, ported to Go.