item 998385

'We are no longer as optimistic about removing the GIL completely. '

93 points | nailer | 16 years ago | code.google.com | reply

124 comments

[+] smikhanov|16 years ago|reply
I don't mean to question the professional abilities of Google's engineers, but could someone explain to me why it's so complicated to remove the GIL in Unladen Swallow? As far as I remember, UnSw targets LLVM, which is an advanced VM with JIT compilation (i.e. not fully interpreted). Jython does nearly the same thing (I'm not a Jython developer, so I may not know the details) but targets the JVM, and as a result Jython has no GIL. What's the key difference between LLVM and the JVM in this regard?
[+] j_baker|16 years ago|reply
One other potential difficulty is that Jython doesn't aim to be 100% compatible with existing C extensions; Unladen Swallow does. Plus, I believe Unladen Swallow is essentially a branch of the CPython source code, while Jython was written from scratch.
[+] cconstantine|16 years ago|reply
The simple answer is that LLVM does not come with a garbage collection system, and the JVM does.
[+] cdavid|16 years ago|reply
Removing the GIL is easy if you drop the compatibility requirements with C extensions. A lot of complex C extensions rely on CPython's exact reference-counting semantics. That's why Unladen Swallow is interesting in the first place.
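That dependence is visible even from pure Python: under CPython's refcounting, a finalizer runs the instant the last reference disappears, and many C extensions bake that determinism in. A minimal sketch (the `Resource` class is made up for illustration):

```python
import sys

class Resource:
    """Toy object whose finalizer runs the moment its refcount hits zero."""
    freed = False
    def __del__(self):
        # With CPython's refcounting this runs deterministically,
        # not at some later GC pass as in the JVM.
        Resource.freed = True

r = Resource()
print(sys.getrefcount(r))  # at least 2: 'r' plus the getrefcount argument
del r                      # refcount drops to zero -> __del__ fires immediately
print(Resource.freed)      # True
```

A tracing GC (as in Jython or the JVM) makes no such promise about when finalizers run, which is one reason "just replace refcounting" breaks existing extensions.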
[+] axod|16 years ago|reply
How about teaching programmers to program without using threads?

edit: sure downmod me. It's crazy talk! How could programmers do without threads and concurrency issues and all of the other blocking problems. Hardware should handle multiple cores. Not programmers.

[+] wglb|16 years ago|reply
There is probably more traction in what you are saying than some of the commenters suggest. For compute-bound tasks, it is often productive to split computations into long-running processes.

There are two reasons that the discussion goes beyond what you suggest, I think. One is detailed in http://www.tbray.org/ongoing/When/200x/2009/09/27/Concur-dot... where the wide finder project is a way to explore relatively easy ways to split a log-searching task into effective threads on a multi-core machine.

This is really a hard problem, as evidenced by Tim's long series of articles detailing various forays into Clojure and other languages.

Your comment "Hardware should handle multiple cores" reflects the opposite of what I think chip manufacturers are thinking. They run into the performance barrier, so they build a chip with more CPUs on it and hand the problem off to the compiler team and the rest of the software world.

I would take it another step further in challenging hardware manufacturers to look at the broader problem. There was an article recently noting that, for Lisp, effective performance over a decade or two went up by a factor of about 50, whereas for C-family programs it went up by several orders of magnitude. To me this implies that hardware isn't going in a direction that supports higher-level computing.

Remember when the 360 instruction set came out? The 7094 people looked at it with some sense of disappointment. And where are the nice instruction sets exemplified by the PDP-10 and family?

Perhaps this implies smarter cores, so that we don't need so many of them.

But in today's world, it seems that the languages that work well with multiple threads have a language construct that is required to make it work--libraries don't do the trick. The clean channels of Go and the constructs in Clojure point the way. Maybe the GIL-fix approach is truly doomed.

So I agree with your closing sentiment.

[+] andrew1|16 years ago|reply
I think you're being downvoted because people disagree with what you're saying. In my experience you need multiple threads when you want multiple things to happen at the same time. E.g. if I have a client/server architecture and one client instructs the server to perform a long-running task, I don't want the server to appear frozen to all my other clients, which it would if the server ran in a single thread. I don't really see how you can get around this. Do you have a solution?
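For what it's worth, CPython threads do interleave around blocking calls even with the GIL, so the pattern described here works today. A toy sketch (all names are invented for illustration):

```python
import threading, time, queue

def long_task(result_q):
    time.sleep(0.2)                # stand-in for the slow server-side job
    result_q.put("done")

results = queue.Queue()
# Hand the slow job to a worker thread so the "server" stays responsive.
threading.Thread(target=long_task, args=(results,), daemon=True).start()

served = 0
while results.empty():             # meanwhile, keep answering other clients
    served += 1
    time.sleep(0.01)
msg = results.get()
print(msg, "after serving", served, "other requests")
```

The GIL only becomes the bottleneck when the long-running task is CPU-bound pure Python; I/O-bound work releases the lock while blocked.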
[+] bioweek|16 years ago|reply
Sounds like a good idea. I'd upvote you but I can't seem to vote on comments for some reason. Too young, or new?
[+] dkersten|16 years ago|reply
And hardware can handle concurrency - you just need to switch to a dataflow language.
[+] antirez|16 years ago|reply
Instead of dealing with all this complexity, I don't understand why a simpler approach is not used, like having a single interpreter per thread and a very good message passing strategy between interpreters.
[+] mahmud|16 years ago|reply
You could already achieve that with OS processes and IPC. The whole point of having multi-threading is to be able to write compact, shared-memory code with minimal use of synchronization operators, and sharing as much code and data as possible.

One interpreter per thread means all side effects have to be migrated to the other threads to keep a consistent view of memory: guess what you'll need to do that? Yep, a global lock (except this time it's across all interpreters, instead of just one.)
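What antirez describes is roughly what the `multiprocessing` module (added in Python 2.6) offers: one interpreter per OS process, with state crossing only as messages rather than shared memory. A hedged sketch:

```python
import multiprocessing as mp

def worker(inbox, outbox):
    # Each process has its own interpreter (and its own GIL);
    # data crosses the boundary only as pickled messages.
    for item in iter(inbox.get, None):
        outbox.put(item * item)

if __name__ == "__main__":
    inbox, outbox = mp.Queue(), mp.Queue()
    p = mp.Process(target=worker, args=(inbox, outbox))
    p.start()
    for n in (2, 3, 4):
        inbox.put(n)
    inbox.put(None)                          # sentinel: tell the worker to exit
    results = [outbox.get() for _ in range(3)]
    print(results)                           # [4, 9, 16]
    p.join()
```

The cost mahmud points out is real, though: nothing is shared implicitly, so any mutable state both sides care about has to be copied through the queues.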

[+] thwarted|16 years ago|reply
Like Perl's ithread implementation?

I like Perl's ithread setup a lot. You explicitly need to mark data as shared between threads; otherwise all variables/objects are local to the thread. Things like queues, for example, are implemented as shared @arrays with a locking primitive around the manipulations, mostly hidden behind the Queue API (Queue->enqueue and Queue->dequeue).

The interpreter code is still shared among all the threads, but each has thread-local storage separate from the shared arena. I've found explicitly needing to mark data as shared to be a big boon to development; I think it has helped reduce the number of bugs related to shared state.
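Python's stdlib offers a similar lock-behind-the-API convenience in `queue.Queue`, though unlike Perl's ithreads, Python threads share everything by default. A small sketch:

```python
import threading, queue

q = queue.Queue()        # locking is hidden behind put()/get(),
consumed = []            # much like Perl's Thread::Queue API

def consumer():
    while True:
        item = q.get()   # blocks until an item is available
        if item is None: # sentinel: the producer is finished
            break
        consumed.append(item)

t = threading.Thread(target=consumer)
t.start()
for i in range(5):
    q.put(i)
q.put(None)
t.join()
print(consumed)          # [0, 1, 2, 3, 4]
```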

[+] gaius|16 years ago|reply
This is how Tcl works.
[+] j_baker|16 years ago|reply
This prompted me to ask this question on stackoverflow: http://stackoverflow.com/questions/1914605/what-does-pythons...

I'd like it if someone could show me how the garbage collector is related to removing the GIL.

[+] silentbicycle|16 years ago|reply
Python's garbage collector itself isn't "thread safe", so removing the GIL without replacing the GC would cause major problems.
[+] ig1|16 years ago|reply
I'm no expert on the GIL, but pretty much every widely adopted garbage collection algorithm requires a "stop-the-world" phase where object references can't be changed. Every VM has some concept of "stop points" where all user code is suspended, but Python's GIL is much more wide-ranging than what's found in, say, the JVM or .NET.
[+] euroclydon|16 years ago|reply
If Python can't get this threading thing worked out, isn't the language going to get left behind as parallel architecture marches onward?
[+] cdavid|16 years ago|reply
There are many ways to exploit multiple cores; multi-threading is just one of them. Also, one thing to realize is that if speed really matters (as in scientific apps), you will get a much higher speed increase by rewriting some parts in C than by using all the cores from Python (at least with only a couple of cores).

Finally, a point which is not often brought up but is crucial in my opinion is about C extensions: the GIL makes C extensions much easier to write. That's one big reason for Python's success in the first place.
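One of the other techniques alluded to here is process-level parallelism, which sidesteps the GIL entirely for CPU-bound work. A sketch using `multiprocessing.Pool` (the `cpu_bound` function is a stand-in for real work):

```python
import multiprocessing as mp

def cpu_bound(n):
    # A compute-heavy function: sum of squares below n.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with mp.Pool() as pool:          # one worker process per core by default
        totals = pool.map(cpu_bound, [10_000] * 4)
    print(totals)
```

Each worker is a separate interpreter, so the four calls genuinely run in parallel, at the cost of pickling inputs and outputs across process boundaries.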

[+] zepolen|16 years ago|reply
Why are real threads so important? Does anyone have an example where threads would be much better than using the multiprocessing module?
[+] fauigerzigerk|16 years ago|reply
Complex in-memory data structures are the main use case that is not well supported by multi-process architectures. There's a trend towards in-memory databases, with disk used for archival only, so this issue is becoming critical.

Processes don't share pointers, they only share BLOBs. Threads do share pointers, and that's why they are more suitable for in-memory data analysis/manipulation.
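The pointer-vs-BLOB distinction is easy to see in Python itself: a thread can mutate a nested structure in place, while handing the same object to a subprocess would pickle a copy. A minimal sketch:

```python
import threading

tree = {"left": {"value": 1}, "right": {"value": 2}}

def bump(node):
    # The worker mutates the same in-memory structure the main
    # thread sees: no copying, no serialization.
    node["value"] += 10

t = threading.Thread(target=bump, args=(tree["left"],))
t.start()
t.join()
print(tree["left"]["value"])  # 11 -- the mutation is visible directly

# Passing tree["left"] to a multiprocessing worker would instead
# pickle a copy; the parent's dict would be left unchanged.
```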

[+] mahmud|16 years ago|reply
With native threads, all your threads have their own identity and they're known to the OS task scheduler. So they can all block or run independently. But with green threads, the OS doesn't know about your "threads"; when the parent process is blocked, so are all the threads.
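This is observable in CPython, whose threads are native OS threads: blocking calls overlap instead of serializing (here `time.sleep` stands in for a blocking OS call):

```python
import threading, time

def blocking_call(results, i):
    time.sleep(0.2)            # stand-in for a blocking OS call
    results.append(i)

results = []
threads = [threading.Thread(target=blocking_call, args=(results, i))
           for i in range(4)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
# Four blocking calls overlap: total time is ~0.2s, not ~0.8s.
print(sorted(results), round(elapsed, 1))
```

With green threads multiplexed onto one OS thread, a single blocking syscall would stall all four.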
[+] heresy|16 years ago|reply
Blocking operating system calls? This has always been the problem I've run into with green threads.
[+] ghshephard|16 years ago|reply
March 2009, presuming the comments followed the creation of that article.
[+] ZeroGravitas|16 years ago|reply
No, I remember reading those comments before this revision.

I was under the impression that these Google Code wiki pages were kept under source control, so you should be able to view histories etc., but I can't see any obvious link.

Found it: the change to this section was made 33 hours ago; a diff can be found here:

http://code.google.com/p/unladen-swallow/source/detail?spec=...

[+] nihilocrat|16 years ago|reply
At least they want to get rid of the horrible abomination known as reference counting. It's the primary reason why I've moved on to other languages for the sake of creating games.