Hm, this solution seems very cumbersome, inelegant, and not like Python's "batteries included" approach at all. This means that Python will have native threads that behave as expected minus true parallel execution, so you shouldn't use those, even though the interface is fairly simple. Instead, you should learn to use this weird contraption that is neither multiprocessing nor intuitive multithreading and comes with a cumbersome interface.
I get that the GIL is a very hard problem to solve, but this solution is so inelegant in my eyes that Python would be better off without it. I'd feel better if this was a hidden implementation detail that could be improved transparently. Just my two cents.
>This means that Python will have native threads that behave as expected minus true parallel execution, so you shouldn't use those, even though the interface is fairly simple.
Python already has exactly that, and has had that for ages.
>Instead, you should learn to use this weird contraption that is neither multiprocessing nor intuitive multithreading and comes with a cumbersome interface.
It also comes with performance improvements over multiprocessing, so there's that.
Besides, the "cumbersome interface" is irrelevant, as it would be easy to wrap and forget about, the same way nobody really uses urllib directly.
I completely disagree - Python threads are basically "green threads", so they have their place but aren't related to parallelisation. But true multiprocessing is ugly when you have hundreds of cores, which is where CPUs are going. There is no standard UI convention on most OSes to group those processes per app, in terms of signals or stats or whatever.
So besides the unproven possibility of removing the GIL, subinterpreters are the best way forward, better than threads or the multiprocessing package.
It's somewhat similar to the GIL removal effort in Ruby [1]
They are isolating the GIL into Guilds there, which are containers for language threads sharing the same GIL. They provide two primitives for communication between threads in different guilds: send, for immutable data (zero copy), and move, for mutable data (copy). They remove the need for boilerplate marshalling and unmarshalling code. However, I bet that there will be some library to hide that code in Python too.
I proposed something similar for Python 9 years ago.[1] Guido didn't like it.
Objects would be either thread-local, shared and locked, or immutable.
Thread-local objects must be totally inaccessible from other threads, and not leakable across thread boundaries, for memory safety. (Python has "thread local" objects now, but it's just naming, and not airtight against leaks. You can assign a thread-local object to a global variable.) Shared and locked objects lock when you enter, unlock when you leave. Objects are thread-local by default, so single-thread programs work as before.
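To illustrate the leak mentioned above, here is a minimal sketch (names are just for illustration) showing that CPython's `threading.local` only scopes attribute lookup per thread; the object itself can still escape through a global:

```python
import threading

local = threading.local()
leaked = None  # a global the "thread-local" object escapes into

def worker():
    global leaked
    local.data = [1, 2, 3]   # attribute is per-thread...
    leaked = local.data      # ...but the object itself can escape

t = threading.Thread(target=worker)
t.start()
t.join()

# The main thread can now see and mutate the worker's "thread-local" list:
leaked.append(4)
print(leaked)  # [1, 2, 3, 4]
```

Under the proposed model, assigning `local.data` to a global would be an error, making the thread-local guarantee airtight.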
Minimize shared and locked, while using thread-local or immutable objects as much as possible. Locking is needed only for shared and locked objects.
This is almost conventional wisdom today, but 9 years ago, it was too radical.
Retrofitting concurrency is never pretty. But we have to. Individual CPUs are about the same speed per thread that they were a decade ago.
> This, in turn, means that Python developers can utilize async code, multi-threaded code and never have to worry about acquiring locks on any variables or having processes crash from deadlocks.
Dangerous advice. Whether this is true or not depends on lots of things such as how many and which operations you're doing on those variables.
Sure, CPython might do lots of simple operations atomically, but this is not enough to avoid the need for all locks. Threads can still interleave their execution in many ways.
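A minimal sketch of the point (hypothetical names): `counter += 1` compiles to separate load/add/store bytecodes, so even under the GIL, threads can interleave between them and lose updates; an explicit `Lock` is still needed:

```python
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    # read-modify-write without a lock: three bytecodes, not atomic,
    # so concurrent runs of this can lose updates
    global counter
    for _ in range(n):
        counter += 1

def safe_increment(n):
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 -- guaranteed; swap in unsafe_increment and it can be less
```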
The current state of threading and parallel processing in Python is a joke. While they are still clinging to the GIL and single core performance, the rest of the world is moving to 32 core (consumer) CPUs.
Python's performance, in general, is crappy[1] and is beaten even by PHP these days. All the people who suggest relying on multiprocessing probably haven't done anything CPU- and memory-intensive, because if you have code that operates on a "world state", each new process will have to copy that from the parent. If the state takes ~10GB, each process will multiply that.
Others keep suggesting Cython. Well, guess what? If I am required to use another programming language to use threads, I might as well go with Go/Rust/Java instead and save the trouble of dabbling with two languages.
So where does that leave (pure-)Python? It can only be used in I/O bound applications where the performance of the VM itself doesn't matter. So it's basically only used by web/desktop applications that CRUD the databases.
It's really amazing that the machine learning community has managed to hack around that with C-based libraries like SciPy and NumPy. However, my suggestion would be to drop the GIL and copy whatever model has been working for Go/Java/C#. If you can't drop the GIL because some esoteric features depend on it, then drop them as well.
Cython is nice, but debugging it requires gdb. For the PyCharm-loving end-users it may be quite cumbersome.
Those recommending multiprocessing have probably never been in that bitter spot where serializing the data and computing on it take exactly the same time.
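That overhead is easy to measure. A rough sketch (the payload shape and sizes here are arbitrary) timing the pickle round-trip that multiprocessing performs for every task payload:

```python
import pickle
import time

# a biggish "world state" payload
state = {i: list(range(50)) for i in range(20_000)}

t0 = time.perf_counter()
blob = pickle.dumps(state, protocol=pickle.HIGHEST_PROTOCOL)
t1 = time.perf_counter()
restored = pickle.loads(blob)
t2 = time.perf_counter()

print(f"dumps: {t1 - t0:.3f}s  loads: {t2 - t1:.3f}s  "
      f"size: {len(blob) / 1e6:.1f} MB")
# When this round-trip time rivals the per-task compute time,
# shipping work to another process stops paying for itself.
```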
The consistent requirement has been that Python will drop the GIL for anything that doesn't make single-threaded performance suffer. There has been substantial work to this end but no solution to date has achieved this goal.
> If the state takes ~10GB each process will multiply that.
In POSIX there is such a thing as copy-on-write memory during forks. So if that state is mostly read-only, the additional memory required by each child process should be minimal.
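A POSIX-only sketch of that: after `os.fork()`, the child reads the parent's state without a bulk copy. One caveat worth hedging: CPython's reference counting writes into object headers, so pages the child actually touches do get copied over time, which eats into the savings for that ~10GB scenario.

```python
import os

# A large, mostly read-only "world state" built before forking.
state = list(range(1_000_000))

pid = os.fork()
if pid == 0:
    # Child: the pages backing `state` are shared copy-on-write with
    # the parent -- nothing is bulk-copied at fork time.
    os._exit(0 if state[123] == 123 else 1)
else:
    _, status = os.waitpid(pid, 0)
    child_ok = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
    print("child saw shared state:", child_ok)
```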
This is essentially the same concurrency model as Workers in JS engines - on the one hand it’s a fairly limiting crutch[1], on the other hand it is harder to create a bunch of different classes of concurrency bugs.
[1] vs fully shared state of C-like, .NET, JVM, etc, etc. Rust’s no-shared-mutable state model allows it to do some fun stuff but python (and JS) don’t really have a strong concept of mutable vs immutable, let alone ownership so I don’t think it would be applicable?
This is just a way to do the same thing as "multiprocessing", but with less memory usage. You still have multiple Python instances that send messages back and forth.
I wonder if they ever fixed the CPickle bug which broke it if you were using CPickle from multiple threads.
Yeah, it's got some of the same weaknesses as multiprocessing (and several new ones). Conceivably you could provide an API for handing off objects to the other interpreter without copying. I'm imagining an API like:
my_foo = interpreterX.pass_object(my_foo)
(The assignment being required to delete the originating reference from the source interpreter.) The interface would be obligated to check that there are no references that escape to the current interpreter and then my_foo and all referenced objects could be handed off to the other interpreter in whole.
I don't have any intuitions for if that would be cheaper than copying or not, and getting it right is certainly more difficult than serialization. (Because of the complexity, it's not worth having if it isn't cheaper.)
Less memory usage, and - hopefully - without all the quirks that crop up with multiprocessing. Off the top of my head: subprocesses don't always want to die along with the main process; error conditions can cause the underlying IPC layer to end up in a permanently stalled state.
No, Mr. Click-baity Title, it's not. They're still there; you can just use many interpreters now, like one would when using the multiprocessing module. I do like the idea of Go-like queues for message passing.
From my limited understanding, I think Eric Snow's push to use subinterpreters is to move an orchestration layer for multiple Python processes from the service layer to the language layer. It may also modularize Python's C API scope. It may also be one of the cheapest ways to provide true CPU-bound concurrency in Python, which is important given Python's limited resources.
Wow, just like Perl threads since Perl 5.8 (1)
When in doubt, look at the granddaddy of scripting languages; all your trials and tribulations in scripting land have been considered in the past.
Let's all sing 'Living in the Past' by Jethro Tull (2). This one is also good (3)
Tcl has had threads that were subinterpreters for a decade or more. I find it quite ironic that Python, it would seem, is reinventing it, only in a less elegant way.
I'm personally glad that Python is (poorly) copying this feature from Tcl. This means it's closer to the time when JavaScript (poorly) copies it from Python! ;-)
This sounds like an application (or variation) of the apartment threading model[0]. Given the problem and its description/characteristics (Global Interpreter Lock), this sounds like an elegant approach.
There's nothing wrong with the GIL as long as you know it's there. It makes writing concurrent code in Python semi-magical, and that's a huge benefit. Concurrent != parallel though, so if there's really a need to scale up to multiple cores, there's always the option of forking with multiprocessing or "sub interpreters."
I can think of maybe having network code run in its own process and the UI in another. That way there's no risk of bottlenecks slowing down the UI, and transfers are likewise protected. If you look at bottle.py, it seems that this approach could add A LOT of performance for managing downloads/uploads if it's done right.
> Another issue is that file handles belong to the process, so if you have a file open for writing in one interpreter, the sub interpreter won’t be able to access the file (without further changes to CPython).
Wouldn't just using CLONE_FILES when forking off interpreters solve this problem?
It could be better phrased: "whilst CPython can be multi-threaded, only 1 thread can be executing Python code at any given time." Other threads can be doing other things at the same time -- just not actively interpreting Python bytecode.
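A small sketch of that distinction: `time.sleep` is a blocking C call during which the GIL is released, so four threads waiting 0.2s each finish in about 0.2s total, not 0.8s:

```python
import threading
import time

def wait_a_bit():
    time.sleep(0.2)   # blocking C call: the GIL is released while sleeping

t0 = time.perf_counter()
threads = [threading.Thread(target=wait_a_bit) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - t0
print(f"{elapsed:.2f}s")  # roughly 0.2s: the waits overlapped
```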
It is because only one thread at a time holds the lock in order to avoid race conditions. The keynote[1] by Raymond Hettinger from PyBay '17 will be a great place to start if you are new to this.
Not all operations are CPU bound. For anything that is IO bound, such as reading a file, db access, network calls, etc, CPython threads work just fine.
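For example, a thread pool reading files works fine under the GIL, since file I/O releases it and the reads can overlap (the file names and sizes here are arbitrary):

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Write a few files, then read them back on a thread pool.
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(4):
    path = os.path.join(tmpdir, f"f{i}.txt")
    with open(path, "w") as f:
        f.write("x" * 10_000)
    paths.append(path)

def read(path):
    with open(path) as f:
        return len(f.read())

with ThreadPoolExecutor(max_workers=4) as pool:
    sizes = list(pool.map(read, paths))
print(sizes)  # [10000, 10000, 10000, 10000]
```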
Concurrency allows multiple threads to interleave with each other. It does not guarantee parallelism (two or more threads executing at the same time). It's similar to multiple threads operating on a uniprocessor system, with the difference that I/O can happen in parallel.
Are there any overall benchmarks for Python 3.8 yet? I know there are a bunch of performance improvements for calling functions and creating objects, but I have no idea how that translates to real software.
Huh. This sounds a lot like Ruby Guilds. This looks like it will land sooner, though likely in less complete form, as even the prototype Guild implementation has inter-guild communication.
[1] http://www.atdot.net/%7Eko1/activities/2018_RubyElixirConfTa...
[1] http://animats.com/papers/languages/pythonconcurrency.html
riffraff | 6 years ago:
That might not be a bad idea, because I am worried `move` will end up being problematic in Ruby, but time will tell.
See also: https://blog.qqrs.us/blog/2016/05/01/which-python-operations...
[1] https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
AlexTWithBeard | 6 years ago:
Also, forking didn't really work until Python 3.6.
juststeve | 6 years ago:
And there's also Kotlin.
sbierwagen | 6 years ago:
https://en.wikipedia.org/wiki/Betteridge%27s_law_of_headline...
(1) https://perldoc.perl.org/threads.html
(2) https://m.youtube.com/watch?v=EsCyC1dZiN8
(3) https://m.youtube.com/watch?v=mXeoNX7DSc8
[0] https://docs.microsoft.com/en-us/windows/desktop/com/process...
qwerty456127 | 6 years ago:
How does this make sense? What's the point of having multiple threads then?
[1] https://youtu.be/9zinZmE3Ogk
boulos | 6 years ago:
In practice, this doesn't work particularly well, as you rarely have massively I/O-bound things in Python.
riskneutral | 6 years ago:
So ... No.