JakobProgsch | 6 years ago
1. There is (was?) the async/future glitch in the standard: futures were generally specified not to wait for the task to finish in their destructor, but the specification for std::async said they did. So if you just handed around futures of unknown origin, you'd never know what the actual behavior was.
2. std::async doesn't really specify how it works. Will there be a thread per task? Is there a maximum? Is it a fixed amount, as in this implementation? Will it launch the items in the order they were queued?
I wrote this not because I think it is somehow "correct" for the general case, but because it is the simplest mental model for me to reason about. My intention was to provide the most basic common denominator of what passes as a fixed-size thread pool. That is also why it "isn't maintained": it does exactly what I meant it to do, and the only reason to change it would be to fix bugs. Admittedly, I could be more active in answering issues.
What I consider fundamental functionality is obviously also determined by the use cases I care about, most of which at the time were expensive, data-parallel bulk operations to be distributed among threads in large chunks. Each item would usually take at least milliseconds, own all its data, and publish the result via the future at the end. For that, this model works well.
If your problem requires handling fancy dependency graphs or fine-grained synchronization between threads, or the work items are so tiny that cache misses and queue-synchronization overhead are comparatively expensive, then this is not the right approach.
brandmeyer | 6 years ago
Intel published a bunch of helpful papers to WG21 that taught me much of the trade space, or at least served as entry points to the research. I think their conclusions about Cilk were spot-on, in that:
- It's hard to model fork/join parallelism well without language changes.
- Fork/join may not be one-size-fits-all, but it does fit a lot of problems. A well-fleshed-out work-stealing system can solve a lot of problems from a relatively simple API.
At $BigCo, we had a threadpool library with custom futures that would steal some work from the pool when the result wasn't ready. It still suffered from the performance failure modes I mentioned, but at least it wouldn't deadlock on you.
IMO, the most important action item for your library is to clearly document the deadlock risk.