Clojure's STM has several advantages going for it:
* Clojure's data structures are purely functional, and built with the STM in mind. Updates and rollbacks to structures are cheap.
* Clojure user-level code has always had access to the STM. Because that's the accepted paradigm, There are an order of magnitude fewer assignments in typical clojure code, greatly simplifying the problem. In typical clojure, one function that uses the STM will call 10 pure functions. In typical python, most of those 10 calls will all mutate local state.
* Clojure requires all STM activity to happen within the dosync block. In python, that would be like requiring the magic to happen within a "with transaction(): ...". Again, this limits the amount of work that needs to happen. The compiler doesn't need to decide whether or not a random function needs to be a transaction; the user will tell the compiler.
* Clojure's STM explicitly only works with functional code. This will likely cause fewer leaky abstractions going forward. PyPy will have to modify all existing data structure classes to work with the new STM, and they won't be able to fix all the library and user code out in the world.
Good luck to them, but I suspect bolting STM into an existing language is much harder than designing it from scratch, similar to building in GC.
How does Clojure enforce that there are no side-effects in the transactional code? Is there a macro that "compiles" the transaction somehow?
STM in Haskell is very natural, since Haskell already enforces pure code and side-effecting code. STM and STVar work pretty similarly to IO and IORef or ST and TVar.
I think you've misunderstood what they're proposing here. Especially this:
> Clojure requires all STM activity to happen within the dosync block. In python, that would be like requiring the magic to happen within a "with transaction(): ...".
Under the proposal, all Python code would run under STM, only C-API and OS-level codes would run outside STM.
Edit: to clarify, the most common issue with implementing a viable STM (in just about every language apart from haskell) is the interaction between STM-managed objects and non-stm-managed objects. Under Pypy's proposal, there would be no such thing as all objects would be STM-managed.
This may work, this may not, but some nod to the fact that they've read things like http://queue.acm.org/detail.cfm?id=1454466 (as linked by codevine before me) and have some plan to address the problems would be nice. So far, attempts to bodge STM on to an imperative language that wasn't built with it in mind has not merely been failure, but radical failure. Immense amount of effort by very smart people, followed by the total unusability and unsalvagability of the resulting code.
The fact that only place it has been shown to work is basically Clojure and Haskell needs to be addressed; how are you going to address the apparent fundamental incompatibility between imperative languages and STM? And while you may be able to answer that at the VM level (very tiny transactions implemented between opcodes, basically), I would expect it is absolutely hopeless to allow users access to this. How are you going to address the endless-side-effect replay problem? (Hint: You aren't.)
This is a kill-PyPy level of dangerous decision, and I think everyone on that team ought to demand a very good answer to that question before buying into this.
Until we have a new conceptual breakthrough, multiversion concurrency control (MVCC) [1], which STM represents in this context, is the only solution they have.
Yes it will impose a tax on all mutable data structures, and push people towards a more functional style of programming. This is generally good. It may be a challenge for people who think solely in terms of OOP, but then OOP will evolve in response, and that is also good.
The way I read it, it's a PyPy-the-VM feature, used in RPython, but not a PyPy-the-interpreter feature callable in the Python code interpreted, hence it's not available to users (i.e Python devs)
From a PyPy as a practical tool point of view, I agree.
From a PyPy as a research project, I'm excited. I'm skeptical of STM as a synchronization solution as well, but I still want us to explore it. If their approach truly is novel, then maybe there's something there.
We've been mashing out Python for the last few months at my job, and we are consistently struggling with the Python GIL performance penalty for threads.
If PyPy can break the GIL, I would anticipate that we'd seriously consider pushing our users toward it.
Microsoft has been doing extensive research with STM, so their work might be worth looking at along the way. A working STM package (STM.NET) was actually shipped with VS2010, where you can type this:
Atomic.Do(() => { /* atomic code */ })
Here's some channel 9 videos about their progress, which are in of themselves very interesting:
>if [a JVM thread] reads a field of an object, and another thread writes to the same field of the same object, then the program will see either the old value, or the new value, but not something else entirely
Nitpick: unless the field in questions is a "long" or "double". In those cases the operation becomes two bytecodes and you can in fact see something else entirely if you are unlucky. At least it used to be like that.
There's the same issue with normal locks: any synchronization primitive is going to incur overhead. That is, in fact, one of the defense heralded by some in the Python community against the "GIL should Go" mobthink.
Python will always fail to do any true shared memory concurrency in my mind, so why not go the other route, have concurrency with enforcement of no shared state and instead only allow message passing and selective receive.
Thats all the rage these days anyways as its easier to reason about. Better still it matches distributed computing models much better.
Would it be possible to only run the STM code when the program is multi-threaded? This would require the program detecting that it is single or multi threaded (would that be difficult?) and might provide a good trade off: either be single threaded and be fast, or use a lot of threads and be 2-10 times slower than Cython, but without the GIL.
I'm a little disapointed. PyPy has just recently been becoming a fast attractive alternative to CPython (or Cython compilation).
If they go the STM route and spend years hashing out the details only to be 2x as slow as CPython, then that cuts out a huge number of people who would like to practically use PyPy.
I think he might have meant 2x slower than without STM and not 2x slower than CPython. Also from everything I have read you would still have the option to not use STM at all and just use the GIL which would not come with the extra overhead.
STM and locks have some interesting clean-up and (as is cited in the article) you really need for it not to crash and not to deadlock or go all priority-inverted on you.
Message-passing, on the other hand, doesn't need particular locking and crashes are (somewhat) simpler to deal with.
(And most any scheme I've encountered can deadlock, and will usually choose the most inopportune moment to demonstrate this capability.)
With this in mind, it is possible to imagine a whole-
program transformation that would add locking on every
object manipulated in RPython by the interpreter. This
would end up in a situation similar to Jython. However,
it would not automatically solve the issue of deadlocks,
which is avoided in the case of Jython by careful manual
placement of the locks. (In fact, being deadlock-free is
a global program property that cannot be automatically
ensured or verified; any change to Jython can in theory
break this property, and thus introduce subtle deadlocks.
The same applies to non-atomicity.)
This is just not true. It is true for the general case, but not for this specific case. Remember the pithy solution to the dining philosophers (number all the chopsticks and pick up the lowest numbered one first)?
I don't know the details of the pypy interpreter, but it does seem that given a byte-code that affects several objects, you could just acquire the locks for the objects as ordered by the objects address.
Now the pypy guys are very smart and this is a simple solution, so I'm sure it yields horrible performance, but the fact of the matter is it's not true that "being deadlock-free is a global program property that cannot be automatically ensured" (it is true that it can't be verified, but if you generate all the locking code it can be ensured by strict ordering) and to say otherwise is wrong.
You missed this point: and moreover if you need to have accessed the first object before you can decide that you will need to access the second object
That is, he's talking about situations where they don't have full knowledge when entering a critical section which objects will be accessed. Hence, they can't order lock acquisition in such a way to avoid deadlock.
We are using Haskell + STM for scaling a major multicore system at Bump, and it's been exceedingly awesome thus far. However, STM happens to suit Haskell's type system-enforced purity particularly well; it's hard to see this being anything but a monumental undertaking in a language like Python.
> Does this proposal entail writing Python code without mutation of non-local state?
Pypy using STM should not change the semantics of the Python language or require alterations to code which does not rely on implementation details (code which relies on the GIL for its thread-safety may have to be fixed, it already has to be fixed for Jython compatibility).
> Isn't this harder in Python than in Haskell or Ocaml?
I'd expect STM to be much harder in imperative languages than in functional languages indeed. In fact, I don't think I've heard of a successful STM implementation in imperative languages yet.
On the other hand, I think most tentatives so far have been with STM as a library applied explicitly, and you run into issues of (mutable) objects "crossing" the transaction barrier in imperative languages, issue which "does not" happen in Haskell (there are ways to subvert the type system, but then you're on your own. In normal uses, I think objects under STM live in a different monad, and therefore can't be freely mixed with objects living outside of it) (I believe this issue would happen in OCaml, so on that front OCaml would be little better than imperative languages). This issue should not exist if all of the code runs under STM (ignoring explicit hooks to do so built into the system), and this should allow for "easy" implementation of explicit memory transactions as well (one of STM's promises is composability, so you should be able to compose multiple bytecode-transactions in a single explicit transaction).
Afaik, all builtin functions are considered to be atomic.
Wouldn't it be enough to have a mutex for each object and when such a builtin function is called, all objects which are passed as parameters (including and esp. `self`) are locked?
No, that is not enough. If you want to implement atomicity for operations that touch multiple objects, a simple mutex-per-object approach just won't cut it. If you start holding multiple mutexes at once, there will be deadlocking if they are not locked in a consistent order.
Not to mention that if every object has a mutex associated, the performance will be horrible. Even if the mutexes are not contended, there will be so many atomic compare and swap operations that the interprocessor memory bus will simply die.
If your solution would work, every method in Java would be synchronized by default.
[+] [-] arohner|14 years ago|reply
* Clojure's data structures are purely functional, and built with the STM in mind. Updates and rollbacks to structures are cheap.
* Clojure user-level code has always had access to the STM. Because that's the accepted paradigm, There are an order of magnitude fewer assignments in typical clojure code, greatly simplifying the problem. In typical clojure, one function that uses the STM will call 10 pure functions. In typical python, most of those 10 calls will all mutate local state.
* Clojure requires all STM activity to happen within the dosync block. In python, that would be like requiring the magic to happen within a "with transaction(): ...". Again, this limits the amount of work that needs to happen. The compiler doesn't need to decide whether or not a random function needs to be a transaction; the user will tell the compiler.
* Clojure's STM explicitly only works with functional code. This will likely cause fewer leaky abstractions going forward. PyPy will have to modify all existing data structure classes to work with the new STM, and they won't be able to fix all the library and user code out in the world.
Good luck to them, but I suspect bolting STM into an existing language is much harder than designing it from scratch, similar to building in GC.
[+] [-] exDM69|14 years ago|reply
STM in Haskell is very natural, since Haskell already enforces pure code and side-effecting code. STM and STVar work pretty similarly to IO and IORef or ST and TVar.
[+] [-] masklinn|14 years ago|reply
> Clojure requires all STM activity to happen within the dosync block. In python, that would be like requiring the magic to happen within a "with transaction(): ...".
Under the proposal, all Python code would run under STM, only C-API and OS-level codes would run outside STM.
Edit: to clarify, the most common issue with implementing a viable STM (in just about every language apart from haskell) is the interaction between STM-managed objects and non-stm-managed objects. Under Pypy's proposal, there would be no such thing as all objects would be STM-managed.
[+] [-] jerf|14 years ago|reply
The fact that only place it has been shown to work is basically Clojure and Haskell needs to be addressed; how are you going to address the apparent fundamental incompatibility between imperative languages and STM? And while you may be able to answer that at the VM level (very tiny transactions implemented between opcodes, basically), I would expect it is absolutely hopeless to allow users access to this. How are you going to address the endless-side-effect replay problem? (Hint: You aren't.)
This is a kill-PyPy level of dangerous decision, and I think everyone on that team ought to demand a very good answer to that question before buying into this.
[+] [-] pradocchia|14 years ago|reply
Yes it will impose a tax on all mutable data structures, and push people towards a more functional style of programming. This is generally good. It may be a challenge for people who think solely in terms of OOP, but then OOP will evolve in response, and that is also good.
[1] http://en.wikipedia.org/wiki/Multiversion_concurrency_contro...
[+] [-] lloeki|14 years ago|reply
[+] [-] scott_s|14 years ago|reply
From a PyPy as a research project, I'm excited. I'm skeptical of STM as a synchronization solution as well, but I still want us to explore it. If their approach truly is novel, then maybe there's something there.
[+] [-] pnathan|14 years ago|reply
If PyPy can break the GIL, I would anticipate that we'd seriously consider pushing our users toward it.
[+] [-] DasIch|14 years ago|reply
It would not be the first time they tried implementing something and failed either.
[+] [-] joeyespo|14 years ago|reply
http://channel9.msdn.com/Shows/Going+Deep/Software-Transacti...
http://channel9.msdn.com/Blogs/Charles/STMNET-Who-What-Why
And here are some research papers published by Tim Harris and the old STM research blog:
http://research.microsoft.com/en-us/um/people/tharris/
http://blogs.msdn.com/b/stmteam/archive/2010/05/12/stm-net-d...
Fascinating stuff. I'm wary of having it be at the core of Python, but very interested to see where PyPy ends up.
[+] [-] arethuza|14 years ago|reply
http://en.wikipedia.org/wiki/Transactional_NTFS
[+] [-] wgren|14 years ago|reply
Nitpick: unless the field in questions is a "long" or "double". In those cases the operation becomes two bytecodes and you can in fact see something else entirely if you are unlucky. At least it used to be like that.
[+] [-] palish|14 years ago|reply
On 64-bit, you won't. =)
[+] [-] codedivine|14 years ago|reply
[+] [-] cabacon|14 years ago|reply
[+] [-] exDM69|14 years ago|reply
[+] [-] masklinn|14 years ago|reply
[+] [-] bfrog|14 years ago|reply
Thats all the rage these days anyways as its easier to reason about. Better still it matches distributed computing models much better.
[+] [-] alphaBetaGamma|14 years ago|reply
[+] [-] true_religion|14 years ago|reply
If they go the STM route and spend years hashing out the details only to be 2x as slow as CPython, then that cuts out a huge number of people who would like to practically use PyPy.
[+] [-] megaman821|14 years ago|reply
[+] [-] Niedar|14 years ago|reply
[+] [-] DasIch|14 years ago|reply
[+] [-] Hoff|14 years ago|reply
Message-passing, on the other hand, doesn't need particular locking and crashes are (somewhat) simpler to deal with.
(And most any scheme I've encountered can deadlock, and will usually choose the most inopportune moment to demonstrate this capability.)
There's an older blog post here http://enfranchisedmind.com/blog/posts/thoughts-on-paralleli... that's worth a read.
[+] [-] aidenn0|14 years ago|reply
I don't know the details of the pypy interpreter, but it does seem that given a byte-code that affects several objects, you could just acquire the locks for the objects as ordered by the objects address.
Now the pypy guys are very smart and this is a simple solution, so I'm sure it yields horrible performance, but the fact of the matter is it's not true that "being deadlock-free is a global program property that cannot be automatically ensured" (it is true that it can't be verified, but if you generate all the locking code it can be ensured by strict ordering) and to say otherwise is wrong.
[+] [-] scott_s|14 years ago|reply
That is, he's talking about situations where they don't have full knowledge when entering a critical section which objects will be accessed. Hence, they can't order lock acquisition in such a way to avoid deadlock.
[+] [-] jamwt|14 years ago|reply
[+] [-] chalst|14 years ago|reply
[+] [-] masklinn|14 years ago|reply
Pypy using STM should not change the semantics of the Python language or require alterations to code which does not rely on implementation details (code which relies on the GIL for its thread-safety may have to be fixed, it already has to be fixed for Jython compatibility).
> Isn't this harder in Python than in Haskell or Ocaml?
I'd expect STM to be much harder in imperative languages than in functional languages indeed. In fact, I don't think I've heard of a successful STM implementation in imperative languages yet.
On the other hand, I think most tentatives so far have been with STM as a library applied explicitly, and you run into issues of (mutable) objects "crossing" the transaction barrier in imperative languages, issue which "does not" happen in Haskell (there are ways to subvert the type system, but then you're on your own. In normal uses, I think objects under STM live in a different monad, and therefore can't be freely mixed with objects living outside of it) (I believe this issue would happen in OCaml, so on that front OCaml would be little better than imperative languages). This issue should not exist if all of the code runs under STM (ignoring explicit hooks to do so built into the system), and this should allow for "easy" implementation of explicit memory transactions as well (one of STM's promises is composability, so you should be able to compose multiple bytecode-transactions in a single explicit transaction).
[+] [-] surrealize|14 years ago|reply
http://www.bluebytesoftware.com/blog/2010/01/03/ABriefRetros...
[+] [-] albertzeyer|14 years ago|reply
[+] [-] albertzeyer|14 years ago|reply
Wouldn't it be enough to have a mutex for each object and when such a builtin function is called, all objects which are passed as parameters (including and esp. `self`) are locked?
[+] [-] exDM69|14 years ago|reply
Not to mention that if every object has a mutex associated, the performance will be horrible. Even if the mutexes are not contended, there will be so many atomic compare and swap operations that the interprocessor memory bus will simply die.
If your solution would work, every method in Java would be synchronized by default.
[+] [-] scott_s|14 years ago|reply
[+] [-] lobo_tuerto|14 years ago|reply
[+] [-] unknown|14 years ago|reply
[deleted]
[+] [-] masklinn|14 years ago|reply