ye-olde-sysrq|2 years ago
Less-joking: I'm so excited for this to start getting adoption. For a pre-release/preview feature, Loom already has SO much traction, with most major frameworks already having support for running app/user code on virtual threads and one notable project (Helidon Nima) replacing netty with their own solution based on blocking IO and virtual threads. Now I want to see the community run with it.
I've always thought async IO was just plain gross.
Python's implementation is so yucky that after using it for one project I decided that I'd rather DIY it with multiprocessing than use async again. (I don't have any more constructive feedback than that, my apologies; it was a while ago, so I don't remember the specifics. What has lasted is the sour taste of the sum of all the problems we had with it, perhaps most notably that only about 2 people on my dev team of 5 actually understood the async paradigm.)
netty did it fine. I've built multiple projects on top of netty and it's fine. I like event-based async more than async-await-based async. But it's still a headache, and notably I rather missed the kinds of guarantees you can get in blocking code by wrapping a block in try-catch-finally, e.g. to guarantee resources get freed, or that two counters, say a requests-in and a requests-out, are guaranteed to match up.
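The try-catch-finally guarantee described here can be sketched in a few lines; the counter names below are illustrative, not from the commenter's project:

```java
// Minimal sketch of the try/finally guarantee: in straight-line
// blocking code, a matched pair of counters cannot drift apart,
// because the finally block runs on success, exception, or
// interruption alike.
import java.util.concurrent.atomic.AtomicLong;

public class CounterGuard {
    static final AtomicLong requestsIn = new AtomicLong();
    static final AtomicLong requestsOut = new AtomicLong();

    static void handle(Runnable work) {
        requestsIn.incrementAndGet();
        try {
            work.run();   // blocking work; may throw
        } finally {
            // Guaranteed to run no matter how work.run() exits.
            requestsOut.incrementAndGet();
        }
    }

    public static void main(String[] args) {
        handle(() -> {});
        try {
            handle(() -> { throw new RuntimeException("boom"); });
        } catch (RuntimeException expected) { }
        System.out.println(requestsIn.get() + " " + requestsOut.get()); // 2 2
    }
}
```

In the async equivalent, the "out" increment has to be threaded through every completion callback and error path, which is exactly the bookkeeping the commenter is glad to drop.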
But dang am I excited to not do that anymore. I have one specific project that I'm planning to port from async to blocking + virtual threads, which I expect to greatly simplify the code. It makes a lot of requests back and forth (it has to manually resolve DNS queries, among other things), so there are good chunks of 50-200 ms where I have to either yield (with gross async code that yields and resumes all the heck over the place) or block the thread for human-noticeable chunks of time (also very gross, of course!).
PaulHoule|2 years ago
Whereas my Java projects live on long after I am gone from the project.
Personally I love aiohttp web servers, particularly when using web sockets and brokering events from message queues and stuff like that. Not to mention doing cool stuff with coroutines and hacking the event queue (like what do you do if your GUI framework also has an event queue?) If YOShInOn (my smart RSS reader + intelligent agent) were going to become open source though I might just need to switch to Flask which would be less fun.
bad_user|2 years ago
Python always had deployment issues, IMO. In Java, 99% of all library dependencies are pure JARs, and you rarely need to depend on native libraries. You can also assemble an executable fat JAR which will work everywhere, and the fact that the build tools are better (e.g., Maven, Gradle) helps.
Compare that with Python, for which even accessing an RDBMS was an exercise in frustration, requiring installing the right blobs and library headers via the OS's package manager, with Postgres being particularly painful. NOTE: I haven't deployed anything serious built with Python in a while; maybe things are better now, but I doubt it has gotten much better, IMO.
eastbound|2 years ago
geodel|2 years ago
davewritescode|2 years ago
smallerfish|2 years ago
atomicnumber3|2 years ago
If you write a program using blocking IO and platform (OS) threads, you're essentially limited to a few thousand concurrent tasks, or however many threads your particular Linux kernel + hardware setup can context-switch between before latency starts suffering. So it's slow not because Java is slow, but because kernel threads are heavyweight and you can't just make a trillion of them just to sit blocked waiting on IO.
If you use async approaches, your programming model suffers, but now you're multiplexing millions of tasks over a small number of platform threads, without even straining the kernel's scheduler. You've essentially moved from kernel scheduling to user-mode scheduling by writing async code.
Virtual threads are a response to this, asking "what if you could eat your cake and have it, too?" by "simply" providing a user-mode-scheduled thread implementation. They took the general strategy that async programming was employing and "hoisted" it up a couple levels of abstraction, to sit "behind" the threading model. Now you get all the benefits of just blocking the thread, without the problems that come from trying to run a ton of platform threads that would choke the Linux kernel.
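A minimal sketch of that point, using the JDK 21 `Thread.ofVirtual()` API (the task count and sleep duration below are arbitrary): tens of thousands of tasks block concurrently on a plain `Thread.sleep` without straining the OS.

```java
// Sketch of "eat your cake and have it too": ordinary blocking code,
// but on virtual threads, so 10,000 concurrently-blocked tasks cost
// very little. With 10,000 platform threads this would be heavyweight.
import java.util.concurrent.CountDownLatch;

public class ManyBlockedThreads {
    static int run(int tasks) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            Thread.ofVirtual().start(() -> {
                try {
                    // Ordinary blocking call; parks the virtual thread
                    // and frees its carrier (OS) thread for other work.
                    Thread.sleep(100);
                } catch (InterruptedException ignored) { }
                done.countDown();
            });
        }
        done.await();
        return tasks;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(10_000) + " tasks all blocked and finished");
    }
}
```

The whole run completes in roughly the sleep duration, because all the tasks park concurrently while only a handful of carrier threads do any scheduling work.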
mr_tristan|2 years ago
A few years ago, I wrote a load generation application using Kotlin’s coroutines - in this case, each coroutine would be a “device”. And I could add interesting modeling on each device; I easily ran 250k simulated devices within a single process, and it took me a couple of days. But coroutines are not totally simple; any method that might call IO needs to be made “coroutine aware”. So the abstraction kinda leaks all over the place.
Now you can do the same thing in Java. Simply model each device as its own Runnable and poof, you can spin up a million of them. And there isn't much existing code that has to be rewritten. Pretty slick.
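A minimal sketch of the device-per-Runnable approach, assuming JDK 21's virtual-thread-per-task executor; the "device" behavior here is a stand-in, not the commenter's actual model:

```java
// Each simulated device is a plain Runnable submitted to a
// virtual-thread-per-task executor; no coroutine-aware methods needed.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class DeviceSim {
    static int simulate(int devices) throws InterruptedException {
        AtomicInteger reports = new AtomicInteger();
        // ExecutorService is AutoCloseable (since JDK 19); close()
        // blocks until every submitted device Runnable has finished.
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < devices; i++) {
                exec.submit(() -> {
                    try {
                        Thread.sleep(50);            // "device" thinks...
                        reports.incrementAndGet();   // ...then reports in
                    } catch (InterruptedException ignored) { }
                });
            }
        }
        return reports.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(simulate(100_000) + " devices reported");
    }
}
```

Note that, unlike the coroutine version, `Thread.sleep` here is just `Thread.sleep`: nothing in the device code needed to be marked suspendable.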
So this isn’t really a “high performance computing” feature, but a “blue collar coder” thing.
kitd|2 years ago
JDK 21 virtual threads are more like Go's goroutines. They map m:n onto OS threads, and the JVM is responsible for scheduling them, making them much less of a burden on the underlying OS, with fewer context switches, etc. And the best thing is, there has been minimal change to the Java standard library API, making them very accessible to existing devs and their codebases.
jabradoodle|2 years ago
hn_throwaway_99|2 years ago
First, it's best to understand the benefit of virtual threads for a webserver. Usually, a webserver maps 1 request to 1 thread. However, most of the time the webserver doesn't actually run much code itself: it makes DB requests, pulls files from disk, makes remote API requests, etc. With blocking IO, when a thread makes one of these remote calls, it just sits there and waits for the call to return. In the meantime, it holds on to a bunch of resources (e.g. memory) while doing nothing. For something like HFT that's normally not much of a problem, because the goal isn't to serve tons of independent incoming requests (though obviously usage patterns can differ), but for a webserver it can put a hard ceiling on the number of concurrent requests that can be processed, hurting scalability.
Compare that to how NodeJS processes incoming web requests. With Node (and JS in general), there is just a single thread that processes incoming requests. However, with async IO in Node (which is really just syntactic sugar around promises and generators), when a request calls out to something like a DB, it doesn't block. Instead, the thread is free to handle another incoming web request. When the original DB request returns, the underlying engine in Node essentially resumes that request from where it left off (if you want more info, search for "Node event loop"). Folks found that in real-world scenarios Node can actually scale extremely well with the number of incoming requests, because lots of webserver code is essentially waiting around for remote IO requests to complete.
However, there are a couple of downsides to the async IO approach:
1. In Node, the main event loop is single-threaded. So if you do some work that is heavily CPU-intensive, then until you make an IO call (or otherwise yield), the Node server isn't free to handle another incoming request. You can test this with a busy-wait loop in a Node request handler: if that loop runs for, say, 10 seconds, then no other incoming requests can be dispatched for 10 seconds. In other words, Node doesn't allow for preemptive interruption.
2. While I generally like the async IO style of programming and I find it easy to reason about, some folks don't like it. In particular, it creates a "function coloring" problem: https://journal.stuffwithstuff.com/2015/02/01/what-color-is-... . Async functions can basically only be called from other async functions if you want to do something with the return value.
Virtual threads then basically can provide the best features from both of these approaches:
1. From a programming perspective, it "feels" pretty much like you're just writing normal, blocking IO code. However, under the covers, when you make a remote call, the Java runtime parks the virtual thread and frees its carrier (OS) thread to do other useful work while the remote call is executing. Thus, you get greatly increased scalability for this type of code.
2. You don't have to worry about the function coloring problem. A "synchronous" function can call out to a remote function, and it doesn't need to change anything about its own function signature.
3. Virtual threads are rescheduled by the underlying scheduler at blocking points, so one request waiting on IO can't starve the rest. Note, though, that as of JDK 21 they are not preemptively time-sliced, so a purely CPU-bound loop can still hog its carrier thread (I'm actually less sure of the details on this piece for Java).
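The first two points can be sketched as follows; `fetchFromDb` below is a hypothetical stand-in for a real remote call, not an actual API:

```java
// Sketch of "best of both": the handler is ordinary blocking code
// with an ordinary signature (no Future, no async keyword), yet
// 1,000 of them run concurrently on virtual threads.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BlockingStyleServer {
    // "Uncolored": callers just see a plain String-returning method.
    static String handleRequest(int id) {
        String row = fetchFromDb(id);  // blocks this virtual thread only
        return "response:" + row;
    }

    // Hypothetical stand-in for a DB round trip.
    static String fetchFromDb(int id) {
        try { Thread.sleep(30); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return "row-" + id;
    }

    public static void main(String[] args) throws Exception {
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < 1_000; i++) {
                int id = i;
                results.add(exec.submit(() -> handleRequest(id)));
            }
            System.out.println(results.get(0).get()); // response:row-0
        }
    }
}
```

While each handler is parked inside `fetchFromDb`, its carrier thread is free to run other handlers, which is the scalability win without any function coloring.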
Hope that helps!
zerr|2 years ago
pjmlp|2 years ago
Additionally, since virtual threads are exposed across the whole runtime and standard library, not only are they built on top of native threads, but developers also have control over their scheduling.
kaba0|2 years ago
Virtual threads make blocking IO calls automagically non-blocking, allowing for better utilization of the CPU.
karg_kult|2 years ago
bad_user|2 years ago
The TLDR is that it needs "function coloring", which isn't necessarily bad (types themselves are "colors"); the problem is what the coloring is trying to accomplish. In an FP language it's good to have functions marked with an IO context, because there the issue is the management of side effects. OTOH, the difference between blocking and non-blocking functions is: (1) irrelevant if you're going to `await` those non-blocking functions, or (2) error-prone if you use those non-blocking functions without `await`. Kotlin's syntax for coroutines, for example, doesn't require `await`, as all calls are (semantically) blocking by default. Extra effort should be needed to execute things asynchronously.
One issue with “function coloring” is that when a function changes its color, all downstream consumers have to change color too. This is actually useful when you're tracking side effects, but rather a burden when you're just tracking non-blocking code. To make matters worse, for side-effectful (void) functions, the compiler won't even warn you that the calls are now “fire and forget” instead of blocking, so refactorings are error-prone.
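The cascade described above shows up in Java itself once a function starts returning an async type; all names below are hypothetical:

```java
// Illustration of the color cascade: once loadUser becomes async,
// greetUser can no longer return a plain String without blocking,
// so it must change its own signature too, and so must its callers.
import java.util.concurrent.CompletableFuture;

public class ColorCascade {
    // Before: static String loadUser(int id)  -- plain, "blue".
    // After the library went async, the function changed color...
    static CompletableFuture<String> loadUser(int id) {
        return CompletableFuture.supplyAsync(() -> "user-" + id);
    }

    // ...so this caller had to change color as well.
    static CompletableFuture<String> greetUser(int id) {
        return loadUser(id).thenApply(name -> "hello, " + name);
    }

    public static void main(String[] args) {
        System.out.println(greetUser(42).join()); // hello, user-42
    }
}
```

Note also the failure mode mentioned above: if a caller invokes `loadUser(id)` and drops the returned future, the compiler says nothing, and the call has silently become fire-and-forget.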
In other words, .NET does function coloring for the wrong reasons and the `await` syntax is bad.
Furthermore, .NET doesn't have a usable interruption model. Java's interruption model is error-prone, but it's more usable than .NET's. This means the "structured concurrency" paradigm can be implemented in Java, much like it was implemented in Kotlin (it's currently in preview in Java).
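Java's interruption model, as referenced here, in a minimal sketch (using JDK 21 syntax for starting a virtual thread):

```java
// Interrupting a thread parked in a blocking call makes that call
// throw InterruptedException immediately, which cooperating code
// can observe; this is the hook structured concurrency builds on.
public class InterruptDemo {
    static volatile boolean wasInterrupted = false;

    public static void main(String[] args) throws InterruptedException {
        Thread t = Thread.ofVirtual().start(() -> {
            try {
                Thread.sleep(60_000);        // would block for a minute...
            } catch (InterruptedException e) {
                wasInterrupted = true;       // ...but unblocks on interrupt
            }
        });
        Thread.sleep(100);  // give the thread time to park
        t.interrupt();
        t.join();
        System.out.println("interrupted: " + wasInterrupted); // true
    }
}
```

The error-prone part the commenter alludes to is that interruption is cooperative: code that swallows the exception, or forgets to restore the interrupt flag, breaks cancellation for everything above it.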
PS: the .NET devs actually did an experiment with virtual threads. Here are their conclusions (TLDR virtual threads are nice, but they won't add virtual threads due to async/await being too established):
https://github.com/dotnet/runtimelab/issues/2398
5e92cb50239222b|2 years ago
sebazzz|2 years ago
jayd16|2 years ago