Znafon | 2 years ago

> If you need concurrency at the moment, you have already switched to using multiprocessing, so having a no-GIL multithreading is useless.

> The only issue with Python/multiprocessing, is that sometimes you don't want queues, but shared mutable state. And as you said, placing Python objects in shared memory at the moment is convoluted, restrictive, and suboptimal.

The PEP goes into the motivation behind this work, and using multiple processes does not magically solve all the issues:

> Multiprocessing, with communication via shared memory or UNIX sockets, adds much complexity and in effect rules out interacting with CUDA from different workers, severely restricting the design space.

> I reimplemented parts of HMMER, a standard method for multiple-sequence alignment. I chose this method because it stresses both single-thread performance (scoring) and multi-threaded performance (searching a database of sequences). The GIL became the bottleneck when using only eight threads. This is a method where the current popular implementations rely on 64 or even 128 threads per process. I tried moving to subprocesses but was blocked by the prohibitive IPC costs.

> NumPy does release the GIL in its inner loops (which do the heavy lifting), but that is not nearly enough. NumPy doesn’t offer a solution to utilize all CPU cores of a single machine well, and instead leaves that to Dask and other multiprocessing solutions. Those aren’t very efficient and are also more clumsy to use. That clumsiness comes mainly in the extra abstractions and layers the users need to concern themselves with when using, e.g., dask.array which wraps numpy.ndarray. It also shows up in oversubscription issues that the user must explicitly be aware of and manage via either environment variables or a third package, threadpoolctl. The main reason is that NumPy calls into BLAS for linear algebra - and those calls it has no control over, they do use all cores by default via either pthreads or OpenMP.

and it discusses the alternatives at https://peps.python.org/pep-0703/#alternatives.
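
To make the IPC point concrete, here is a minimal sketch, not taken from the PEP. The function names `count_words_threaded` and `ipc_payload_size` are hypothetical. It assumes a toy word-count workload: threads merge results into one shared dict under a lock, while the second helper only estimates how many bytes would have to be pickled to move the same data between processes.

```python
import pickle
import threading


def count_words_threaded(chunks, n_threads=4):
    """Count words with threads that merge into one shared dict (hypothetical helper)."""
    counts = {}                # shared mutable state, visible to every thread
    lock = threading.Lock()    # guards the merge step

    def worker(my_chunks):
        # Count locally first so the lock is held only for the merge.
        local = {}
        for chunk in my_chunks:
            for word in chunk.split():
                local[word] = local.get(word, 0) + 1
        with lock:
            for word, n in local.items():
                counts[word] = counts.get(word, 0) + n

    threads = [
        threading.Thread(target=worker, args=(chunks[i::n_threads],))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts


def ipc_payload_size(obj):
    """Rough estimate of the bytes serialized when obj is sent to another process.

    multiprocessing queues and pipes pickle their payloads, so this is only
    an approximation of the per-message cost, not a benchmark.
    """
    return len(pickle.dumps(obj))


if __name__ == "__main__":
    chunks = ["a b a", "b c", "a"]
    print(count_words_threaded(chunks))
    print(ipc_payload_size(chunks), "bytes would be pickled to ship the input")
```

With the GIL, the threads above share state cheaply but do not run Python bytecode in parallel; with processes, they run in parallel but every input and result pays the serialization cost. That is the trade-off the quotes describe.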


zarzavat | 2 years ago

You don’t need OS processes for multiprocessing. You can use threads in the same OS process. See: Erlang.
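
A rough sketch of that idea in Python, using only the standard library. The `CounterActor` class is hypothetical, and this is only an analogy: Erlang processes are lightweight and scheduled by its VM, whereas this uses one OS thread per actor. The state is owned by a single thread and touched only through messages in its mailbox.

```python
import queue
import threading


class CounterActor:
    """Hypothetical actor: private state, reachable only through a mailbox."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._total = 0  # only the actor's own thread ever touches this
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            msg, reply = self._mailbox.get()
            if msg == "stop":
                reply.put(self._total)  # answer with the final state, then exit
                return
            self._total += msg

    def send(self, n):
        """Fire-and-forget message: add n to the actor's total."""
        self._mailbox.put((n, None))

    def stop(self):
        """Ask the actor to finish and return its final total."""
        reply = queue.Queue(maxsize=1)
        self._mailbox.put(("stop", reply))
        self._thread.join()
        return reply.get()


if __name__ == "__main__":
    actor = CounterActor()
    for i in range(5):
        actor.send(i)
    print(actor.stop())
```

Under the current GIL, several such actors would still take turns executing Python bytecode, so this gives the isolation of message passing but not the parallelism.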

Znafon | 2 years ago

Would the work on sub-interpreters be interesting for that, then (https://lwn.net/SubscriberLink/941090/8bcb029dbf548f26/)?