top | item 40959140

Java Virtual Threads: A Case Study

167 points | mighty_plant | 1 year ago | infoq.com

189 comments

[+] pron|1 year ago|reply
Virtual threads do one thing: they allow creating lots of threads. This helps throughput due to Little's law [1]. But because this server here saturates the CPU with only a few threads (it doesn't do the fanout modern servers tend to do), this means that no significant improvements can be provided by virtual threads (or asynchronous programming, which operates on the same principle) while keeping everything else in the system the same, especially since everything else in that server was optimised for over two decades under the constraints of expensive threads (such as the deployment strategy to many small instances with little CPU).

So it looks like their goal was: try adopting a new technology without changing any of the aspects designed for an old technology and optimised around it.

[1]: https://youtu.be/07V08SB1l8c
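To make the "lots of threads" point concrete, here is a minimal sketch (assuming JDK 21+). Spawning 100,000 virtual threads is routine, where the same number of OS threads would exhaust memory on most machines:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class LotsOfThreads {
    public static void main(String[] args) throws InterruptedException {
        var done = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        // 100,000 platform (OS) threads would exhaust memory on most machines;
        // 100,000 virtual threads are routine, which is what lifts the
        // concurrency term in Little's law.
        for (int i = 0; i < 100_000; i++) {
            threads.add(Thread.ofVirtual().start(() -> {
                try {
                    Thread.sleep(10); // parks the virtual thread, freeing its carrier
                } catch (InterruptedException ignored) {
                }
                done.incrementAndGet();
            }));
        }
        for (Thread t : threads) t.join();
        System.out.println(done.get()); // 100000
    }
}
```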

[+] stelfer|1 year ago|reply
It goes deeper than Little's Law. Every decent textbook on introductory queuing theory has the result that, on a normalized basis, fast server > multi-server > multi-queue. That result holds up under almost arbitrary depth of analysis.

Your observation that computing architectures have chased fast server for decades is apt. There's a truism in computing that those who build systems are doomed to relearn the lessons of the early ages of networks, whether they studied them in school or not. But kudos to whoever went through the exercise again.

[+] jayceedenton|1 year ago|reply
I guess at least their work has confirmed what we probably already knew intuitively: if you have CPU-intensive tasks, without waiting on anything, and you want to execute these concurrently, use traditional threads.

The advice "don't use virtual threads for that, it will be inefficient" really does need some evidence.

Mildly infuriating though that people may read this and think that somehow the JVM has problems in its virtual thread implementation. I admit their 'Unexpected findings' section is very useful work, but the moral of this story is: don't use virtual threads for things they were not intended for. Use them when you want a very large number of processes executing concurrently, those processes have idle stages, and you want a simpler programming model than other kinds of async.

[+] hitekker|1 year ago|reply
This take sounds reasonable to me. But I'm not an expert, and I'd be curious to hear an opposing view if there's one.
[+] cayhorstmann|1 year ago|reply
I looked at the replication instructions at https://github.com/blueperf/demo-vt-issues/tree/main, which reference this project: https://github.com/blueperf/acmeair-authservice-java/tree/ma...

What "CPU-intensive apps" did they test with? Surely not acmeair-authservice-java. A request does next to nothing. It authenticates a user and generates a token. I thought it at least connects to some auth provider, but if I understand it correctly, it just uses a test config with a single test user (https://openliberty.io/docs/latest/reference/config/quickSta...). Which would not be a blocking call.

If the request tasks don't block, this is not an interesting benchmark. Using virtual threads for non-blocking tasks is not useful.

So, let's hope that some of the tests were with tasks that block. The authors describe that a modest number of concurrent requests (< 10K) didn't show the increase in throughput that virtual threads promise. That's not a lot of concurrent requests, but one would expect an improvement in throughput once the number of concurrent requests exceeds the pool size. Except that may be hard to see because OpenLiberty's default is to keep spawning new threads (https://openliberty.io/blog/2019/04/03/liberty-threadpool-au...). I would imagine that in actual deployments with high concurrency, the pool size will be limited, to prevent the app from running out of memory.

If it never gets to the point where the number of concurrent requests significantly exceeds the pool size, this is not an interesting benchmark either.

[+] pansa2|1 year ago|reply
Are these Virtual Threads the feature that was previously known as “Project Loom”? Lightweight threads, more-or-less equivalent to Go’s goroutines?
[+] giamma|1 year ago|reply
Yes, at EclipseCon 2022 an Oracle manager working on the Helidon framework presented their results replacing the Helidon core, which was based on Netty (and reactive programming), with Virtual Threads (using imperative programming) [1].

Unfortunately the slides from that presentation were not uploaded to the conference site, but this article [2] summarizes the most significant metrics. The Oracle guy claimed that by using Virtual Threads Oracle was able to implement, in imperative Java, a new engine for Helidon (called Nima) that had identical performance to the old engine based on Netty, which is (at least in Oracle's opinion) the top-performing reactive HTTP engine.

The conclusion of the presentation was that based on Oracle's experience imperative code is much easier to write, read and maintain with respect to reactive code. Given the identical performance achieved with Virtual Threads, Oracle was going to abandon reactive programming in favor of imperative programming and virtual threads in all its products.

[1] https://www.eclipsecon.org/2022/sessions/helidon-nima-loom-b...

[2] https://medium.com/helidon/helidon-n%C3%ADma-helidon-on-virt...

[+] pgwhalen|1 year ago|reply
Yes. It's not that the feature was previously known under a different name - Project Loom is the OpenJDK project, and Virtual Threads are the main feature that has come out of that project.
[+] tomp|1 year ago|reply
They're not equivalent to Go's goroutines.

Go's goroutines are preemptive (and Go's development team went through a lot of pain to make them such).

Java's lightweight threads aren't.

Java's repeating the same mistakes that Go made (and learned from) 10 years ago.

[+] exabrial|1 year ago|reply
What is the virtual thread / event loop pattern seeking to optimize? Is it context switching?

A number of years ago I remember trying to have a sane discussion about “non blocking” and I remember saying “something” will block eventually no matter what… anything from the buffer being full on the NIC to your cpu being at anything less than 100%. Does it shake out to any real advantage?

[+] gregopet|1 year ago|reply
It's a brave attempt to release the programmer from worrying or even thinking about thread pools and blocking code. Java has gone all in - they even cancelled a non-blocking rewrite of their database driver architecture, because why have that if you won't have to worry about blocking code? And the JVM really is a marvel of engineering, it's really really good at what it does, so what better team to pull this off?

So far, they're not quite there yet: the issue of "thread pinning" is something developers still have to be aware of. I hear the newest JVM version has removed a few more cases where it happens, but will we ever truly 100% not have to care about all that anymore?

I have to say things are already pretty awesome however. If you avoid the few thread pinning causes (and can avoid libraries that use them - although most if not all modern libraries have already adapted), you can write really clean code. We had to rewrite an old app that made a huge mess tracking a process where multiple event sources can act independently, and virtual threads seemed the perfect thing for it. Now our business logic looks more like a game loop and not the complicated mix of pollers, request handlers, intermediate state persisters (with their endless thirst for various mappers) and whatnot that it was before (granted, all those things weren't there just because of threading.. the previous version was really really shittily written).

It's true that virtual threads sometimes hurt performance (since their main benefit is cleaner, simpler code). Not by much, usually, but a precisely written and carefully tuned piece of performance-critical code can often still do things better than automatic threading code. And as a fun aside, some very popular libraries assumed the developer is using thread pools (before virtual threads, which non-trivial Java app didn't? - ok nobody answer that, I'm sure there are cases :D) so these libraries had performance tricks (ab)using thread pool code specifics. So that's another possible performance issue with virtual threads - like always with performance of course: don't just assume, try it and measure! :P
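For illustration, a minimal sketch of the pinning issue mentioned above (an assumption here: JDK 21, where blocking inside a `synchronized` block still pins the carrier thread; newer JDKs remove most of these cases). The usual workaround is a `java.util.concurrent` lock:

```java
import java.util.concurrent.locks.ReentrantLock;

public class PinningDemo {
    // On JDK 21, blocking inside a synchronized block "pins" the virtual
    // thread to its carrier, so the carrier can't run other virtual threads.
    private final Object monitor = new Object();
    private final ReentrantLock lock = new ReentrantLock();

    void pinned() throws InterruptedException {
        synchronized (monitor) {
            Thread.sleep(10); // carrier stays pinned while we sleep
        }
    }

    void notPinned() throws InterruptedException {
        lock.lock();          // j.u.c locks don't pin: the virtual thread
        try {                 // unmounts and the carrier is freed
            Thread.sleep(10);
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        var demo = new PinningDemo();
        Thread.ofVirtual().start(() -> {
            try { demo.pinned(); demo.notPinned(); }
            catch (InterruptedException ignored) {}
        }).join();
        System.out.println("ok");
    }
}
```

(Running with `-Djdk.tracePinnedThreads=full` will report the pinned case on JDK 21.)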

[+] fzeindl|1 year ago|reply
Does it shake out to any real advantage?

To put it shortly: Writing single-threaded blocking code is far easier for most people and has many other benefits, like more understandable and readable programs: https://www.youtube.com/watch?v=449j7oKQVkc

The main reason why non-blocking IO, with its style of intertwining concurrency and algorithms, came along is that starting a thread for every request was too expensive. With virtual threads that problem is eliminated, so we can go back to writing blocking code.

[+] chipdart|1 year ago|reply
> What is the virtual thread / event loop pattern seeking to optimize? Is it context switching?

Throughput.

Some workloads are not CPU-bound or memory-bound, and spend the bulk of their time waiting for external processes to make data available.

If your workloads are expected to stay idle while waiting for external events, you can switch to other tasks while you wait for those external events to trigger.

This is particularly convenient if the other tasks you're hoping to run are also tasks that are bound to stay idle while waiting for external events.

One of the textbook scenarios that suits this pattern well is making HTTP requests. Another one is request handlers, such as the controller pattern used so often in HTTP servers.

Perhaps the poster child of this pattern is Node.js. It might not be the performance king and might be single-threaded, but it features in the top spots of performance benchmarks such as TechEmpower's. Node.js is also highly favoured in function-as-a-service applications, as its event-driven architecture is well suited for applications involving a hefty dose of network calls running on memory- and CPU-constrained systems.

[+] kevingadd|1 year ago|reply
One of the main reasons to do virtual threads is that it allows you to write naive "thread per request" code and still scale up significantly without hitting the kind of scaling limits you would with OS threads.
[+] pron|1 year ago|reply
No, it optimises hardware utilisation by simply allowing more tasks to concurrently make progress. This allows throughput to reach the maximum the hardware allows. See https://youtu.be/07V08SB1l8c.
[+] duped|1 year ago|reply
imo the biggest difference between "virtual" threads in a managed runtime and "os" threads is that the latter uses a fixed-size stack whereas the former is allowed to resize: it can grow on demand and shrink under pressure.

When you spawn an OS thread you are paying at worst the full cost of it, and at best the max depth seen so far in the program, and stack overflows can happen even if the program is written correctly. Whereas a virtual thread can grow the stack to be exactly the size it needs at any point, and when GC runs it can rewrite pointers to any data on the stack safely.

Virtual/green/user space threads aka stackful coroutines have proven to be an excellent tool for scaling concurrency in real programs, while threads and processes have always played catchup.

> “something” will block eventually no matter what…

The point is to allow everything else to make progress while that resource is busy.

---

At a broader scale, as a programming model it lets you architect programs that are designed to scale horizontally. With the commoditization of compute in the cloud, that means it's very easy to write a program that can be distributed as I/O demand increases. In principle, a "virtual" thread could be spawned on a different machine entirely.

[+] frevib|1 year ago|reply
They indeed optimize thread context switching. Taking a thread on and off the CPU becomes expensive when there are thousands of threads.

You are right that everything blocks - even a trip to L1 cache takes about a nanosecond. But blocking in this context means waiting for "real" IO like a network request or spinning disk access. Virtual threads take away the problem that the thread sits there doing nothing for a while, waiting for data, before it is context switched.

Virtual threads won’t improve CPU-bound blocking. There the thread is actually occupying the CPU, so there is no problem of the thread doing nothing as with IO-bound blocking.
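A small sketch of that I/O-bound case (assuming JDK 21+): a thousand 10 ms "waits" overlap on virtual threads instead of running back to back, so wall time is a tiny fraction of the ~10 seconds a serial run would take:

```java
import java.util.concurrent.Executors;

public class IoBoundDemo {
    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        // 1000 tasks that each "wait on I/O" for 10 ms. Serially this would
        // take ~10 seconds; with one virtual thread per task the sleeps
        // overlap and only the CPU-bound work competes for cores.
        try (var pool = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1000; i++) {
                pool.submit(() -> {
                    Thread.sleep(10); // stands in for a network/disk wait
                    return null;
                });
            }
        } // close() waits for all submitted tasks to finish
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("elapsed ms: " + elapsedMs);
    }
}
```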

[+] kbolino|1 year ago|reply
The hardware now is just as concurrent/parallel as the software. High-end NVMe SSDs and server-grade NICs can do hundreds to thousands of things simultaneously. Even if one lane does get blocked, there are other lanes which are open.
[+] lmm|1 year ago|reply
> I remember saying “something” will block eventually no matter what… anything from the buffer being full on the NIC to your cpu being at anything less than 100%.

Nope. You can go async all the way down, right to the electrical signals if you want. We usually impose some amount of synchronous clocking/polling for sanity, at various levels, but you don't have to; the world is not synchronised, the fastest way to respond to a stimulus will always be to respond when it happens.

> Does it shake out to any real advantage?

Of course it does - did you miss the whole C10K discussions 20+ years ago? Whether it matters for your business is another question, but you can absolutely get a lot more throughput by being nonblocking, and if you're doing request-response across the Internet you generally can't afford not to.

[+] bberrry|1 year ago|reply
I don't understand these benchmarks at all. How could it possibly take virtual threads 40-50 seconds to reach maximum throughput when getting a number of tasks submitted at once?
[+] LinXitoW|1 year ago|reply
From my very limited exposure to virtual threads and the older solution (thread pools), the biggest hurdle was the extensive use of ThreadLocals by most popular libraries.

In one project I had to basically turn a reactive framework into a one thread per request framework, because passing around the MDC (a kv map of extra logging information) was a horrible pain. Getting it to actually jump ship from thread to thread AND deleting it at the correct time was basically impossible.

Has that improved yet?

[+] joshlemer|1 year ago|reply
I faced this issue once. I solved it by creating a wrapping/delegating Executor, which would capture the MDC from the scheduling thread at schedule-time, and then at execute-time, set the MDC for the executing thread, and then clear the MDC after the execution completes. Something like...

    import java.util.concurrent.Executor;
    import org.slf4j.MDC;

    class MyExecutor implements Executor {
        private final Executor delegate;
        public MyExecutor(Executor delegate) {
            this.delegate = delegate;
        }
        @Override
        public void execute(Runnable command) {
            // Capture the MDC on the scheduling thread...
            var mdc = MDC.getCopyOfContextMap();
            delegate.execute(() -> {
                // ...restore it on the executing thread, clear it afterwards.
                MDC.setContextMap(mdc);
                try {
                    command.run();
                } finally {
                    MDC.clear();
                }
            });
        }
    }
[+] vbezhenar|1 year ago|reply
What do you mean by hurdle? ThreadLocals work just fine with virtual threads.
[+] bberrry|1 year ago|reply
If you are already in a reactive framework, why would you change to virtual threads? Those frameworks pool threads and have their own event loop so I would say they are not suitable for virtual thread migration.
[+] davidtos|1 year ago|reply
I did some similar testing a few days ago [1], comparing platform threads to virtual threads doing API calls. They mention the right conditions, like having high task delays, but it also depends on what the task is. Thread.sleep(1) performs better on virtual threads than platform threads, but a REST call taking a few ms performs worse.

[1] https://davidvlijmincx.com/posts/virtual-thread-performance-...

[+] taspeotis|1 year ago|reply
My rough understanding is that this is similar to async/await in .NET?

It’s a shame this article paints a neutral (or even negative) experience with virtual threads.

We rewrote a boring CRUD app that spent 99% of its time waiting for the database to respond to be async/await from top to bottom. CPU and memory usage went way down on the web server because so many requests could be handled by far fewer threads.

[+] jsiepkes|1 year ago|reply
> My rough understanding is that this is similar to async/await in .NET?

Well, somewhat, but also not really. They are green threads like async/await, but their use is more transparent, unlike async/await.

So there are no special "async methods". You just instantiate a "VirtualThread" where you normally instantiate a (kernel) "Thread" and then use it like any other (kernel) thread. This works because, for example, all blocking IO APIs are automatically converted to non-blocking IO under the hood.
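A minimal sketch of that transparency (assuming JDK 21+): the exact same blocking Runnable runs unchanged on a kernel thread or a virtual thread, with no async annotations anywhere:

```java
public class TransparentBlockingDemo {
    public static void main(String[] args) throws InterruptedException {
        // No "async" keyword: the same blocking call works on either kind
        // of thread. On a virtual thread the JVM parks the virtual thread
        // and frees its carrier instead of blocking a kernel thread.
        Runnable task = () -> {
            try {
                Thread.sleep(50); // plain old blocking API
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        Thread platform = Thread.ofPlatform().start(task); // kernel thread
        Thread virtual  = Thread.ofVirtual().start(task);  // virtual thread

        platform.join();
        virtual.join();
        System.out.println(virtual.isVirtual()); // true
    }
}
```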

[+] devjab|1 year ago|reply
> My rough understanding is that this is similar to async/await in .NET?

Not really. What C# does is sort of similar but it has the disadvantages of splitting your code ecosystem into non-blocking/blocking code. This means you can “accidentally” start your non-blocking code. Something which may cause your relatively simple API to consume a ridiculous amount of resources. It also makes it much more complicated to update and maintain your code as it grows over the years. What is perhaps worse is that C# lacks an interruption model.

Java’s approach is much more modern but then it kind of had to be because the JVM already supported structured concurrency from Kotlin. Which means that Java’s “async/await” had to work in a way which wouldn’t break what was already there. Because Java is like that.

I think you can sort of view it as another example of how Java has overtaken C# (for now), but I imagine C# will get an improved async/await model in the next couple of years. Neither approach is something you would actually choose if concurrency is important to what you build and you don't have a legacy reason to continue building on Java/C#. This is because Go or Erlang would be the obvious choice, but it's nice that you at least have the option if your organisation is married to a specific language.

[+] kimi|1 year ago|reply
It's more like Erlang threads - they appear to be blocking, so existing code will work with zero changes. But you can create a gazillion of them.
[+] he0001|1 year ago|reply
> My rough understanding is that this is similar to async/await in .NET?

The biggest difference is that C# async/await code is rewritten by the compiler to be able to be async. This means that you see artifacts in the stack that weren’t there when you wrote the code.

There are no rewrites with virtual threads and the code is presented on the stack just as you write it.

They solve the same problem but in very different ways.

[+] fulafel|1 year ago|reply
Can you expand on how the benefit in your rewrite came about? Threads don't consume CPU when they're waiting for the DB, after all. And threads share memory with each other.

(I guess scaling to ridiculous levels you could be approaching trouble if you have O(100k) outstanding DB queries per application server - hope you have a DB that can handle millions of outstanding DB queries then!)

[+] xxs|1 year ago|reply
>My rough understanding is that this is similar to async/await in .NET?

No, the I/O is still blocking with respect to the application code.

[+] tzahifadida|1 year ago|reply
Similarly, the power of golang concurrent programming is that you write non-blocking code the same way you write normal code. You don't have to wrap it in special functions and pollute the code; moreover, not every coder on the planet knows how to handle blocking code properly, and that is the main advantage. Most programming languages can do anything the other languages can do - the problem is that not all coders can make use of it. This is why I see languages like golang as an advantage.
[+] jillesvangurp|1 year ago|reply
Kotlin embraced the same thing via coroutines, which are conceptually similar to goroutines. It adds a few useful concepts around this though; mainly that of a coroutine context, which encapsulates that a tree of coroutine calls needs some notion of failure handling and cancellation. Additionally, coroutines are dispatched to a dispatcher. A dispatcher can run on the same thread or actually use a thread pool - or, as of recent Java versions, a virtual thread pool. There's actually very little point in using virtual threads in Kotlin. They are basically a slightly more heavyweight way of doing coroutines. The main benefit is dealing with legacy blocking Java libraries.

But the bottom line with virtual threads, goroutines, or Kotlin's coroutines is that it indeed allows for imperative-style code that is easy to read and understand. Of course you still need to understand all the pitfalls of concurrency bugs and all the weird and wonderful ways things can fail to work as you expect. And while Java's virtual threads are designed to work like magic pixie dust, they do have some nasty failure modes where a single virtual thread can end up blocking all your virtual threads. Having a lot of synchronized blocks in legacy code could cause that.

[+] juyjf_3|1 year ago|reply
Can we stop pretending Erlang does not exist?

Go is a next-gen trumpian language that rejects sum types, pattern matching, non-nil pointers, and for years, generics; it's unhinged.