I don't want to cast doubt on the professional qualities of Google's engineers, but could someone explain to me why it's so complicated to remove the GIL in Unladen Swallow? As far as I remember, Unladen Swallow targets LLVM, which is an advanced VM with JIT compilation (i.e. not fully interpreted). Jython does nearly the same thing (I'm not a Jython developer, so I may not know the details) but targets the JVM, and as a result Jython has no GIL. What's the key difference between LLVM and the JVM in this regard?
One other potential difficulty is that Jython doesn't aim to be 100% compatible with existing C extensions; Unladen Swallow does. Plus, I believe Unladen Swallow is essentially a branch of the CPython source code, while Jython was written from scratch.
Removing the GIL is easy if you remove the compatibility requirements with C extensions. A lot of complex C extensions rely on the exact Python reference-counting semantics. That's why Unladen Swallow is interesting in the first place.
How about teaching programmers to program without using threads?
edit: sure downmod me. It's crazy talk! How could programmers do without threads and concurrency issues and all of the other blocking problems. Hardware should handle multiple cores. Not programmers.
There is probably more traction in what you are saying than some of the commenters suggest. For compute-bound tasks, it is often productive to split computations into long-running processes.
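A minimal sketch of that process-splitting approach using the standard-library multiprocessing module, farming a compute-bound task (prime counting here, purely as a stand-in) out to worker processes, each with its own interpreter and GIL:

```python
from multiprocessing import Pool

def count_primes(bounds):
    """Count primes in [lo, hi) -- a stand-in for any compute-bound chunk."""
    lo, hi = bounds
    def is_prime(n):
        return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))
    return sum(1 for n in range(lo, hi) if is_prime(n))

if __name__ == "__main__":
    # split the range into chunks and hand them to 4 worker processes
    chunks = [(lo, lo + 2500) for lo in range(2, 10002, 2500)]
    with Pool(4) as pool:
        total = sum(pool.map(count_primes, chunks))
    print(total)  # 1229 primes below 10000
```

Each chunk runs in parallel on a separate core, with results merged at the end; no shared state, so no locks.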
There are two reasons, I think, that the discussion goes beyond what you suggest. One is detailed in http://www.tbray.org/ongoing/When/200x/2009/09/27/Concur-dot... where the Wide Finder project is a way to explore relatively easy ways to split a log-searching task into effective threads on a multi-core machine.

This is really a hard problem, as evidenced by Tim's long series of articles detailing various forays into Clojure and other languages.
Your comment "Hardware should handle multiple cores" reflects the opposite of what I think chip manufacturers are thinking. They run into the performance barrier, so they build a chip with more CPUs on it and hand the problem off to the compiler team and the rest of the software world.
I would take it another step further in challenging hardware manufacturers to look at the broader problem. There was an article recently that noticed that, over a decade or two, the effective performance gain for Lisp went up by a factor of about 50, whereas for C-family programs it went up by several orders of magnitude. To me this implies that hardware isn't going in the direction that supports higher-level computing.
Remember when the 360 instruction set came out? The 7094 people looked at it with some sense of disappointment. And where are the nice instruction sets exemplified by the PDP-10 and its family?
Perhaps this implies smarter cores, so that we don't need so many of them.
But in today's world, it seems that the languages that work well with multiple threads have a language construct that is required to make it work; libraries don't do the trick. The clean channels of Go and the constructs in Clojure point the way. Maybe the GIL-fix approach is truly doomed.

So I agree with your closing sentiment.
I think you're being downvoted because people disagree with what you're saying. In my experience you need multiple threads when you want multiple things to happen at the same time. For example, if I have a client/server architecture and one client instructs the server to perform a long-running task, then I don't want the server to appear frozen to all my other clients, which it would if the server ran in a single thread. I don't really see how you can get around this. Do you have a solution?
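A minimal sketch of that server pattern with Python's threading module; the GIL is released during blocking calls (here sleep stands in for the long-running task), so slow requests overlap instead of freezing one another:

```python
import queue
import threading
import time

results = queue.Queue()

def handle_request(client_id):
    time.sleep(0.2)            # stands in for a long-running task
    results.put(client_id)     # reply to the "client"

# dispatch each incoming request to its own thread
start = time.monotonic()
workers = [threading.Thread(target=handle_request, args=(i,)) for i in range(3)]
for t in workers:
    t.start()
for t in workers:
    t.join()
elapsed = time.monotonic() - start

replies = sorted(results.get() for _ in range(3))
# the three requests overlapped: roughly 0.2s total rather than 0.6s serial
print(replies, round(elapsed, 1))
```

The catch, of course, is that under the GIL this only helps when threads spend their time blocked (I/O, sleep), not when they are CPU-bound.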
Instead of dealing with all this complexity, I don't understand why a simpler approach is not used, like having a single interpreter per thread and a very good message passing strategy between interpreters.
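That is roughly the model the standard multiprocessing module offers: each process runs its own interpreter (each with its own GIL) and communicates by message passing through queues. A minimal sketch:

```python
from multiprocessing import Process, Queue

def worker(inbox, outbox):
    # runs in its own process, i.e. its own interpreter with its own GIL
    while True:
        msg = inbox.get()
        if msg is None:              # sentinel: no more work
            break
        outbox.put(msg * msg)

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    p = Process(target=worker, args=(inbox, outbox))
    p.start()
    for n in (1, 2, 3):
        inbox.put(n)                 # messages are pickled and copied, not shared
    inbox.put(None)
    results = [outbox.get() for _ in range(3)]
    p.join()
    print(results)  # [1, 4, 9]
```

The copying on every message is exactly the cost the next comment raises: nothing is shared, so any state the other side needs has to travel through the queue.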
You could already achieve that with OS processes and IPC. The whole point of having multi-threading is to be able to write compact, shared-memory code with minimal use of synchronization operators, and sharing as much code and data as possible.
One interpreter per-thread means all side-effects have to be migrated to the other threads to keep a consistent view of memory: guess what you will need to do that? yep, a global lock (except this time it's across all interpreters, instead of just one.)
I like Perl's ithreads setup a lot. You explicitly need to mark data as shared between threads; otherwise all variables/objects are local to the thread. Things like queues, for example, are implemented as shared @arrays with a locking primitive around the manipulations, mostly hidden behind the Queue API (Queue->enqueue and Queue->dequeue).

The interpreter code is still shared among all the threads, but each has thread-local storage separate from the shared arena. I've found explicitly needing to mark data as shared to be a big boon to development; I think it has helped reduce the number of bugs related to shared state.
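CPython has no direct equivalent of ithreads, but the closest analogue of that explicit-sharing model is multiprocessing, where everything is process-local unless you explicitly create a shared object (so this is an analogy, not the same mechanism):

```python
from multiprocessing import Process, Value

def bump(counter, times):
    for _ in range(times):
        with counter.get_lock():     # explicit lock around the shared state
            counter.value += 1

if __name__ == "__main__":
    counter = Value("i", 0)          # explicitly created as shared; all other
                                     # variables stay local to each process
    procs = [Process(target=bump, args=(counter, 1000)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)  # 4000: no lost updates
```

As with ithreads, the sharing and the locking are both visible at the point of use, which makes the shared-state surface easy to audit.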
I'm no expert in the GIL, but pretty much every widely adopted Garbage Collection algorithm requires a "stop-the-world" phase where object references can't be changed. Every VM has some concept of "stop points" where all user code is suspended, but Python's GIL is much more wide-ranging than that found in say the JVM or .NET.
There are many ways to exploit multiple cores; multi-threading is just one of them, and many other techniques exist. Also, one thing to realize is that if speed really matters (as in scientific apps), you will get a much bigger speed increase by rewriting some parts in C than by letting Python use all the cores (at least when only a couple of cores are available).
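A crude illustration of that point: the same reduction done in a pure-Python loop versus the C-implemented built-in sum. The C version typically wins by more than the factor a couple of extra cores could ever give:

```python
import time

def py_sum(xs):
    total = 0
    for x in xs:             # every iteration goes through the bytecode interpreter
        total += x
    return total

xs = list(range(1_000_000))

t0 = time.perf_counter()
a = py_sum(xs)
t1 = time.perf_counter()

t2 = time.perf_counter()
b = sum(xs)                  # the loop runs in C
t3 = time.perf_counter()

assert a == b == 499_999_500_000
print(f"pure Python: {t1 - t0:.4f}s  C builtin: {t3 - t2:.4f}s")
```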
Finally, a point which is not often brought up but is crucial in my opinion concerns C extensions: the GIL makes C extensions much easier to write. That's one big reason for Python's success in the first place.
Complex in-memory data structures are the main use case that is not well supported by multi-process architectures. There's a trend towards in-memory databases, with disk used for archival only, so this issue is becoming critical.
Processes don't share pointers, they only share BLOBs. Threads do share pointers and that's why they are more suitable to in memory data analysis/manipulation.
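Python's multiprocessing.shared_memory module (added much later, in 3.8) makes the distinction concrete: processes can map the same raw buffer of bytes by name, but Python objects still have to be serialized into and out of it rather than shared by pointer. A sketch:

```python
from multiprocessing import shared_memory

# create a 16-byte shared buffer (a BLOB that another process could map by name)
shm = shared_memory.SharedMemory(create=True, size=16)
try:
    shm.buf[:5] = b"hello"               # write raw bytes into the buffer

    # a second process would attach using the same name
    other = shared_memory.SharedMemory(name=shm.name)
    data = bytes(other.buf[:5])          # read raw bytes back out
    other.close()
finally:
    shm.close()
    shm.unlink()

print(data)  # b'hello'
```

Anything richer than flat bytes (a dict, a graph of objects) has to be flattened into the buffer and rebuilt on the other side, which is precisely why threads are preferred for in-memory data manipulation.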
With native threads, all your threads have their own identity and they're known to the OS task scheduler. So they can all block or run independently. But with green threads, the OS doesn't know about your "threads"; when the parent process is blocked, so are all the threads.
No, I remember reading those comments before this revision.
I was under the impression that these Google code wiki pages were kept under source control so you should be able to view histories etc. but I can't see any obvious link.
Found it: the change to this section was made 33 hours ago, and a diff can be found here: http://code.google.com/p/unladen-swallow/source/detail?spec=...
At least they want to get rid of the horrible abomination known as reference counting. It's the primary reason why I've moved on to other languages for the sake of creating games.
I'd like it if someone could show me how the garbage collector is related to removing the GIL.