
Project Loom and Structured Concurrency

171 points | ingve | 5 years ago | javaadvent.com

108 comments

[+] ljackman|5 years ago|reply
`Async`/`await` or something like Kotlin's `suspend` are great language features for certain domains in which a developer needs to manage blocking system calls: in lower-level languages such as Rust or C, you probably don't want to pay for a lightweight "task runtime" like Go's or Erlang's. They bring not only a scheduling overhead but also FFI complications.

However, for application languages that can afford a few extra niceties like garbage collection, I fail to understand why the stackless coroutine model (`suspend` in Kotlin) or `async`/`await` continue to be the developer's choice. Why do languages like Kotlin adopt these features, specifically?

Manually deciding where to yield in order to avoid blocking a kernel thread seems outside of the domain of problems that those using a _higher level_ language want to solve, surely?

The caller should decide whether to do something "in the background". And this applies to non-IO capabilities too, as sometimes pure computations are also expensive enough to warrant not blocking the current task.

Go and Erlang seem to have nailed this, so I'm glad Java is following in their footsteps rather than the more questionable strategy of C# and Kotlin. (Lua's coroutines and Scheme's `call-with-current-continuation` deserve an honourable mention too.)
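The "caller decides" idea can be sketched in plain Java without Loom: the blocking function stays ordinary, and only the caller chooses whether it runs in the background. `fetchPrice` is a made-up stand-in for any blocking call.

```java
import java.util.concurrent.CompletableFuture;

public class CallerDecides {
    // An ordinary blocking function: it neither knows nor cares
    // whether it runs in the foreground or in the background.
    static int fetchPrice() {
        try {
            Thread.sleep(100); // pretend this is a blocking system call
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return 42;
    }

    public static void main(String[] args) {
        // Foreground: just call it.
        int direct = fetchPrice();

        // Background: the *caller* chooses to offload the same function,
        // with no `async` marker required on fetchPrice itself.
        CompletableFuture<Integer> bg =
                CompletableFuture.supplyAsync(CallerDecides::fetchPrice);

        System.out.println(direct + bg.join()); // prints 84
    }
}
```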

[+] geodel|5 years ago|reply
Kotlin runs on the JVM, so if the JVM does not support something natively -- like a task runtime -- Kotlin can't have that feature.
[+] jnwatson|5 years ago|reply
I have a lot of experience using concurrency in Go, and for the last couple years have been at the bleeding edge of Python async. The tradeoffs between the two approaches are immense.

With the virtual thread model you have:

* No function coloring problem. This also means existing code is easier to port.

* Possibility of transparent M:N scheduling.

* Impedance mismatch with OS primitives.

* Much more sophisticated runtime.

* Problematic task cancellation.

* Lots of care still needed for non-trivial inter-task synchronization.

With the async API model you have:

* Viral asyncification (the function coloring problem).

* Simpler runtime.

* Obvious and safe task cancellation.

* Completely orthogonal to parallelism (actually doing more than one thing simultaneously) for good and for bad.

* Inter-task coordination is straightforward and low-overhead even for sophisticated use cases.

* Higher initial learning curve.

I'm leaning toward liking the async approach more, but that might be just because I'm deep in the middle of it. I think the biggest argument in favor of virtual threads is the automatic parallelism; that's also the biggest argument against: free running threads require more expensive synchronization and introduce nondeterminism.

[+] pron|5 years ago|reply
* Java offers both user-mode and kernel threads. You pick at creation time, and can even plug in your own scheduler.

* Loom's virtual threads are completely scheduled in library code, written in Java.

* FFI that bypasses the JDK and interacts with native code that does either IO or OS-thread synchronization is extremely rare in Java.

* Cancellation is the same for both.

Also, IMO, coordination is simpler for threads than for async. Where they differ is in their choice of defaults: threads allow scheduling points anywhere except where explicitly excluded; async/await allows scheduling points nowhere except where explicitly allowed. Putting aside that some languages have both, resulting in few if any guarantees, threads' defaults are better for correct concurrency. The reason is that correctness relies on atomicity, or lack of scheduling points in critical sections. When you explicitly exclude them, none of your callees can break your correctness. When you explicitly allow them, any callee can become async and break its caller's logic. True, the type system will show you where the relevant callsites are, but it will not show you whether there is a reliance on atomicity or not.
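The point about defaults can be illustrated with a minimal Java sketch (the `withdraw`/`balance` names are hypothetical): inside a `synchronized` block the check-then-act sequence stays atomic, and nothing a callee does can introduce a scheduling point into that critical section.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AtomicitySketch {
    static int balance = 100;
    static final Object lock = new Object();

    // With threads, atomicity is opted into explicitly: the synchronized
    // block excludes scheduling points, so the check and the update
    // cannot be interleaved with another withdrawal.
    static void withdraw(int amount) {
        synchronized (lock) {
            if (balance >= amount) {
                balance -= amount; // check-then-act stays atomic
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 200; i++) {
            pool.submit(() -> withdraw(1)); // 200 racing withdrawals of 1
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // 100 withdrawals succeed, 100 fail the check; never negative.
        System.out.println(balance); // prints 0
    }
}
```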

Async/await does, however, make sense for JavaScript, where all existing code already has an implicit assumption of atomicity, so breaking it would have broken the world. For languages that have both, async/await mostly adds a lot of complexity, although sometimes it is needed for implementation reasons.

[+] jerf|5 years ago|reply
I don't think "task cancellation" is quite the major difference you think. If you model it as thread A wanting to cancel thread B, then threading means that A runs and cancels B, though B may need some time to catch up; the async world has the problem of whether A gets to run at all to cancel B, if B is having a problem that requires cancellation. It's "obvious" and "safe" until it doesn't happen at all.

This is a pervasive problem with the async/await model. As a codebase scales up, the probability of something failing to yield when it should and blocking everything else continually goes up, and then the whole model, correctness, practicality, and all, just goes out the window. While the risk is small for small programs, and the scaling factor often isn't that large, it is still a thing that happens. Entire OSes used to work that way, with the OS and user processes cooperatively yielding, and what killed that model is this problem.

Also, I'm writing a lot of code lately where I can peg multiple cores at a time, with a relatively (if not maximally) efficient language like Go; having to write it instead as a whole bunch of OS processes running separately because my runtime can only use one core at a time is a non-starter, and "async/await" basically turns into a threading system if you try to run it on multiple cores in one system anyhow.

These two fatal-for-me flaws mean it's a non-starter for a lot of the work I'm doing anyhow, regardless of any other putative advantages.

(As I mentioned, I'm using Go, but if you want to see a runtime that really has the asynchronous exceptions thing figured out, go look at Erlang. Having a thread run off into never-never-land and eating a full CPU isn't fun, but being able to log in to your running system, figure out which it is using a REPL, kill just that thread, and reload its code before restarting it to fix the problem, all without taking down the rest of your system is not an experience most of you have had. But it can be done!)

[+] throwaway894345|5 years ago|reply
I've had similar experiences, but I don't much care for async Python. In particular, it's way too easy to block the event loop, either by accidentally calling some function that, perhaps transitively, does blocking I/O (this could be remedied if there were no sync IO) or simply by calling a function that is unexpectedly CPU-bound. And when this happens, other requests start failing, unrelated to the request that is causing the problem, so you go on a wild goose chase to debug. Sync I/O is also a much nicer, more ergonomic interface than async, IMO. And then there are the type error problems--it's way too easy to forget to `await` something. Mypy could help with this, but it's still very, very immature. Lastly, last I checked the debugger couldn't cope with async syntax--this is obviously not criticizing the async approach in general, but I wanted to round out my complaining about async Python.

I don't mind working with goroutines personally--I use them sparingly, only when I really need concurrency or parallelism. This takes some discipline (e.g., not to go crazy with goroutines and/or channels) and a bit of experience (in the presence of multiple goroutines, what needs to be locked, when to use channels, etc), so if you're relatively new and very impatient or undisciplined you probably won't have a good time (which isn't to say that if you dislike goroutines you must be a novice or undisciplined!). But for me it's nearly an ideal experience.

[+] breatheoften|5 years ago|reply
I'm not sure I really think of function coloring as a "problem" ...

Facebook is experimenting with auto differentiation for Kotlin, and it looks like it's adding a new "differentiable" function color -- https://ai.facebook.com/blog/paving-the-way-for-software-20-...

It looks very easy to reason about and use to me ... and I personally find async a similarly useful marker ... It's about being able to push constraints from the caller arbitrarily far down the callee stack -- which is really not something that types support at all, but it provides a very high-confidence variety of constraint -- and high-confidence constraints seem to me like they convey a ton of information.

I've been wondering actually whether "function colors" might actually just be a good way to create a whole variety of strong statically enforceable constraints for functions. It seems like they lead to very good and simple programmer mental models ...

Are there languages that offer "user definable" function colors? I can think of a lot of application domains that would be much better served by these kinds of constraints than OO or other type-centric approaches ... it would be ridiculously useful to be able to mark a function with the "MyDomainBusinessLogic" color and get assurances that such a method can only call other functions annotated with that color ... it would provide an easy way to iterate on app-specific abstractions and give compiler assistance for communicating layering intent -- rather than a bunch of poorly specified words in documents that try to communicate layering intent to other developers, in language that is either sufficiently precise as to be incomprehensible or sufficiently vague as to be subject to (mis)interpretation ...

[+] pron|5 years ago|reply
> It does nothing for you if you have computationally intensive tasks and want to keep all processor cores busy.

I would argue this isn't concurrency at all (the job of juggling mostly independent tasks, and scheduling them to a relatively small number of processing units), but parallelism (the job of performing a single computational task faster by employing multiple processing units), and exactly the job of parallel streams.
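The parallelism side of that distinction is exactly what parallel streams already cover: one computation split across cores, as in this minimal stdlib-only sketch.

```java
import java.util.stream.IntStream;

public class ParallelSum {
    // Parallelism, not concurrency: a single computational task
    // (summing 1..1,000,000) split across the available cores.
    static long sum() {
        return IntStream.rangeClosed(1, 1_000_000)
                .parallel()       // fork/join across processing units
                .asLongStream()   // widen before summing to avoid int overflow
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(sum()); // prints 500000500000
    }
}
```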

> It doesn’t help you with user interfaces that use a single event thread.

It might. Loom allows you to plug in your own scheduler, and it is a one-liner to schedule virtual threads on the UI thread:

    var uiVirtualThreadFactory = Thread.builder().virtual(java.awt.EventQueue::invokeLater).factory();

All threads created by this factory will be virtual threads that are always "carried" by the UI OS thread. This won't currently work because each of those threads will have its own identity, and various tests in the UI code check that the current thread is actually the UI thread and not any thread that is mapped to the same OS thread. Changing these tests is something the UI team is looking into.
[+] jayd16|5 years ago|reply
>All threads created by this factory will be virtual threads that are always "carried" by the UI OS thread

Does this actually solve the problem? I don't see it. We want to interweave foreground and background work. Sometimes that means blocking work will yield, sometimes that means it should not yield because conceptually several tasks should retain exclusive control of that thread. You might want some IO task on the background but you need a block of OpenGL tasks to retain control.

I just don't see how you can do this implicitly in a way that's cleaner than async/await. It seems like posting tasks to this thread factory or that will get the job done but is that an improvement?

It sounds like for now this stuff will still be using the current model of posting unyielding runnables to a thread. That's fine I guess. Loom still seems very cool, it just doesn't cover the cases I deal with a lot more often.

[+] mping|5 years ago|reply
For me the real advantage is not on performance but on the programming model. I have been tinkering with Loom (and clojure) and the idea of "just" calling some library without worrying about blocking is refreshing. That means that for the most of it, you can write your code without worrying too much about some kind of callbacks or async support from your library and it just works.

Of course, for those with extreme performance requirements, they will probably have their own custom scheduler and concurrency/parallelism mechanisms but for the vast majority of jvm users out there I think Loom will be a great thing. If Loom integrates with GraalVM/native-image it would be even nicer.

[+] lackbeard|5 years ago|reply
I think the vast majority of JVM users won't even need Loom. OS threads perform well enough for most use cases. You can go a very long way with just a ThreadPoolExecutor.
[+] kasperni|5 years ago|reply
For those that care about numbers: Loom targets ~ 200b memory overhead per virtual thread. And ~ 100ns per context switch between virtual threads.
[+] nfoz|5 years ago|reply
The article seems to assume you know what Project Loom is. (Not to be confused with Google's Project Loon, the balloon thing.)

From https://wiki.openjdk.java.net/display/loom/Main, it's an OpenJDK project:

> Project Loom is intended to explore, incubate and deliver Java VM features and APIs built on top of them for the purpose of supporting easy-to-use, high-throughput lightweight concurrency and new programming models on the Java platform.

A bit more history/explanation here:

http://cr.openjdk.java.net/~rpressler/loom/loom/sol1_part1.h...

[+] pbourke|5 years ago|reply
To summarize: up till now, Java Threads have been 1:1 with OS threads. They’re limited to a few thousand per JVM. This project moves to an M:N threading model but retains the Thread API. It allows for millions of threads per JVM and async/await style performance of the existing synchronous Java libraries without language changes and with minimal changes to the standard library.
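A minimal sketch of that M:N model, assuming a JDK 21+ runtime with the API Loom eventually shipped (the API in the article's era differed): thousands of blocking tasks, each on its own cheap virtual thread, multiplexed onto a small pool of carrier OS threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ManyThreads {
    static final AtomicInteger done = new AtomicInteger();

    public static void main(String[] args) throws Exception {
        // Same Thread API as always, but each task gets a virtual thread.
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                exec.submit(() -> {
                    Thread.sleep(10); // parks the virtual thread, freeing its carrier
                    done.incrementAndGet();
                    return null;
                });
            }
        } // close() waits for all submitted tasks to finish
        System.out.println(done.get()); // prints 10000
    }
}
```

Ten thousand OS threads sleeping concurrently would be prohibitive; ten thousand virtual threads are cheap because a parked virtual thread occupies no carrier thread.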
[+] technologia|5 years ago|reply
Thank you, I totally clicked on this thinking how could Google Loon have anything to do with this
[+] anonymousDan|5 years ago|reply
I don't quite get the point of the executor service with virtual threads. If they are really cheap to create then why not just create them as required? It's been a while since I programmed in Java though, am I missing something? Edit: Ah - I read the rest of the article. Using it as a synchronisation primitive makes sense I guess, if a bit clunky.
[+] kasperni|5 years ago|reply
In the examples with structured concurrency, the point of using an executor service is not to reuse the threads but to control their termination. If you read Nathaniel J. Smith's primer [1] on structured concurrency, the ExecutorService in the examples acts as the nursery. Loom is just being "lazy" and reusing ExecutorService for something it wasn't originally intended to do. Earlier versions had a specific class called FiberScope [2]. Whether or not we will see more specialised classes for this in the future I don't know.

[1] https://vorpus.org/blog/notes-on-structured-concurrency-or-g...

[2] https://www.javaadvent.com/2019/12/project-loom.html
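The nursery idea can be approximated on any modern JDK with plain `invokeAll`, which blocks until every child task has completed, so no task outlives the scope that spawned it. The task bodies here are hypothetical stand-ins.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class NurserySketch {
    static String runScope() throws Exception {
        ExecutorService scope = Executors.newFixedThreadPool(2); // the "nursery"
        try {
            // Hypothetical subtasks; real code would do I/O here.
            List<Callable<String>> tasks = List.of(
                    () -> "user",
                    () -> "orders");
            // invokeAll returns only when every child has finished --
            // that waiting is the structured part, not thread reuse.
            List<Future<String>> results = scope.invokeAll(tasks);
            return results.get(0).get() + "," + results.get(1).get();
        } finally {
            scope.shutdown(); // termination is controlled here, not leaked
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runScope()); // prints user,orders
    }
}
```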

[+] jnwatson|5 years ago|reply
It is an unfortunate abuse of an existing facility for something quite different.
[+] nsaje|5 years ago|reply
If I understand correctly this is to Java what Gevent/Eventlet are to Python. Asynchronous execution without having to write asynchronous code.

I've been doing this in Python for a long time and I love it, even though it can lead to some hard-to-debug issues (due to monkeypatching).

[+] Traubenfuchs|5 years ago|reply
I am still not comfortable enough with this concept to answer this question myself, but will this, by default, lead to speed-ups and/or reduced resource consumption in a) application servers like Tomcat and b) web frameworks like Spring? Assuming it's implemented...
[+] aardvark179|5 years ago|reply
It depends a lot on how the service handles requests. If it takes the one thread per request model, and those requests are mostly bound by blocking calls like IO, then replacing those OS level threads with virtual threads will almost certainly see a reduction in resources (as virtual threads are smaller) and potentially more consistent response times (because scheduling the correct thread is easier at the JVM level).

However if your service has been written in an async style, or you are mostly CPU bound, then you aren't likely to see a change.

Our hope is that by making simple blocking code perform better you won't have to spend your time converting code to an async style to scale your services.

[+] amitport|5 years ago|reply
Probably no, but that's not the point. These server frameworks have complex code that helps balance load across threads.

Loom may make such programs simpler to write, but will not automatically give a boost to already optimized code.

[+] latchkey|5 years ago|reply
Programming challenge in golang: Create a persistent tcp client that can connect to a server, read responses… and be disconnected via context.WithCancel().
[+] Cojen|5 years ago|reply
I'm not familiar enough with Go to understand why this is a challenge. What point are you trying to illustrate with the challenge? Is this easy or hard, and how does this compare to Loom and Structured Concurrency?
[+] anonymousDan|5 years ago|reply
Sounds like a useful building block for a better java-based actor library.
[+] jillesvangurp|5 years ago|reply
Since the article goes out of its way to not mention Kotlin, I'll do it for them, since this is both lame and more than a bit disingenuous. Arguably, Kotlin co-routines (and the Flow API) provide a very nice implementation of the exact same concepts on the JVM. As far as I know, the Loom integration is already planned and probably implemented to a large degree. Mostly, doing that should be straightforward as this pretty much maps 1 to 1 to things like suspend functions, co-routine scopes, etc.

That is a different way of saying that Oracle is doing the right things with Loom. Although bolting this onto the Thread API without cleaning that up is probably an open invitation for hordes of people to do the wrong things. That API already provides plenty of well documented ways to take shots at your feet. IMHO it's a mistake to pretend it's all the same.

The main difference with Kotlin co-routines is that the Kotlin implementation is multiplatform and also has implementations that work on iOS (native), in a browser, etc. Additionally, you get to depend on nice language features like internal DSL support, the suspend keyword, etc., that make writing code a lot less tedious and error prone. But it's the same kind of code with the same kind of concepts. Finally, it also has lots of higher level primitives. Flow is a recent addition that sits on top of this and allows for building properly reactive applications.

So, to answer the obvious question will this replace/deprecate co-routines: no, this will have little to no impact as it will be trivial to support the low level primitives Loom provides just like they already work seamlessly with other implementations like rxjava, spring reactor, javascript promises, etc. They'll support it because it probably provides some performance benefits to use Loom if it's available on the platform but it should not impact how you use co-routines. The same co-routine code you write today will just work on top of Loom once that is available and implemented.

[+] ragnese|5 years ago|reply
You're getting downvoted because of your snarky opening statement.

But I do think it's important/relevant to compare virtual threads to Kotlin coroutines.

I agree with your point that tacking all of this onto the existing (flawed) Thread API is a risky move. I understand the reasoning on both sides, but I'm not usually a huge "backwards compatible at all costs" or "don't make people learn new things" proponent on anything. So that's my bias.

I think you're painting the `suspend` keyword a bit rosy, though. The fact that Kotlin has colored functions is a huge pain in the ass. You have to design different APIs sometimes to account for a "suspend version" and a "non-suspend version".

The idea with Loom (like goroutines, which is the first green thread model I've used) is that async stuff is so cheap that you can almost pretend it doesn't even matter if something calls a coroutine. I'm not sure if that's the best solution, though. One advantage that colored functions do have is that you see it and "know" that the thing involves expensive and/or blocking work. With coroutines, how do you know if calling a function will slow your current thread down as it waits for the results? That's a question we could ask Go devs today, I suppose.

I agree with your prediction that Kotlin's coroutines might just sit on top of virtual threads on the JVM in the future.

[+] pron|5 years ago|reply
It's like saying the article goes out of its way not to mention Scala/Haskell's IO type. Syntactic coroutines, monadic IO, and threads are different constructs, although they are different ways to address a similar problem -- expressing sequential (and, in contrast, parallel) composition. Virtual threads are Java threads; there's nothing "bolted". Syntactic coroutines are a kind of syntactic code-unit similar to subroutines.

Which one you prefer as a coding style is a matter of taste, but threads have some objective advantages over syntactic coroutines that go beyond syntax. For one, they don't require a split API (C# and Kotlin have two copies of their synchronization and IO APIs that do the same thing but are intended for different kinds of units, subroutines or coroutines); for another, they seamlessly integrate with the platform and its tooling, allowing use of exceptions, debuggers and profilers with little or no change to those tools. The Java platform -- the standard library, the VM, and its built-in profiling and debugging mechanisms -- has been designed around threads.

BTW, Java's strategy for targeting platforms like iOS and, later, the browser, is through AOT compilation of Java bytecode using things like Native Image (e.g. https://gluonhq.com/java-on-ios-for-real/). This allows you to employ the standard library as well. Kotlin's approach is different, and requires different libraries when targeting different platforms.

[+] vbezhenar|5 years ago|reply
As I see it, the main difference is that existing code will magically work with Loom but would require a rewrite for coroutines. I can take the Oracle 9i JDBC driver from 1999 and use it under Loom and it will probably work just fine. Oracle will probably not rewrite its JDBC drivers with Kotlin coroutines any time soon.
[+] ceronman|5 years ago|reply
There is an important difference with Kotlin coroutines. In Kotlin you still have the problem of colored functions [1]: those marked with `suspend` and the regular functions. You can't call suspend functions from regular functions, and the world is divided into blocking and non-blocking APIs (e.g. Thread.sleep() vs delay()). And then you have to use things like `runBlocking` to bridge these two worlds.

If I understand correctly, Loom breaks that wall completely. You don't need to mark functions as `suspend`; the runtime is just smart enough to do the right thing. For example, if you call Thread.sleep() on a regular OS thread, that will block, but if you run it on a lightweight thread, it will suspend instead, allowing the runtime to use the OS thread for another task.

And there is one more thing: because Loom is implemented at the VM level, when using lightweight threads you get all the good things you typically get with regular threads: proper stack traces and native debugging and profiling support.

[1] https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...
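A minimal sketch of that "no colors" property, assuming a JDK 21+ runtime with the final Loom API: the very same blocking method runs unchanged on a platform or a virtual thread, with no `suspend`-style marker anywhere.

```java
public class NoColors {
    // One ordinary blocking method -- no suspend keyword, no async marker.
    static String work() throws InterruptedException {
        Thread.sleep(50); // blocks a platform thread; parks a virtual one
        return "done on " + (Thread.currentThread().isVirtual() ? "virtual" : "platform");
    }

    public static void main(String[] args) throws Exception {
        Runnable task = () -> {
            try {
                System.out.println(work()); // identical code on both thread kinds
            } catch (InterruptedException ignored) {
            }
        };
        Thread platform = Thread.ofPlatform().start(task);
        Thread virtual = Thread.ofVirtual().start(task);
        platform.join();
        virtual.join();
    }
}
```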

[+] kasperni|5 years ago|reply
> Since the article goes out of its way to not mention Kotlin,

"Since the article goes out of its way to not mention [Rust|Kotlin]" and "I'm surprised this article has no mention of [Rust|Kotlin]" must be one of the most frequently used templates on HN.

[+] valenterry|5 years ago|reply
What makes you think that Kotlin should be mentioned? Especially since concurrency in Kotlin is really not great, compared to concurrency in Erlang or Scala.