Shenandoah in OpenJDK 17: Sub-millisecond GC pauses

[+] ceronman|4 years ago|reply

It took me some time working with a non-GC language to fully appreciate how amazing Garbage Collection is. It really makes a lot of things easier, simpler and specially safer. Yes, of course it has some drawbacks. Yes, of course, not every single kind of program can afford to have a GC. But hell, life is so much easier with a GC! I'm very happy to see that there is active research and development happening to make the drawbacks of GC as minimal as possible!

[+] mikepurvis|4 years ago|reply

I feel like the key is knowing that your problem domain is fine with there always being garbage collection in the loop. Otherwise there's the risk that down the road you'll have to start tracing the code, rewriting things to reuse objects, etc etc.

[+] Koshkin|4 years ago|reply

Also, because GC is “lazy,” your program may even end up running faster than one that uses an “eager” memory management, such as the one used in C++, where destructors are invoked, and objects on the heap are deallocated, early on, i.e. as soon as something goes out of scope, rather than when the process is close to running out of its memory quota (which may never happen).

[+] mnd999|4 years ago|reply

Everyone else: pauses are bad, we can’t have garbage collection.

Java: pauses are bad, let’s make the garbage collector better.

Much credit to the Java community for ignoring the noise and building something great that lets most applications have the best of both worlds.

[+] tpxl|4 years ago|reply

I was testing ZGC the other day on my machine (24vcores, 64GB RAM) and it ate through about 4GB/s of garbage with about 10milisec of pauses over 220sec of runtime (24 threads doing nothing but allocations). G1 ate through about 2x as much garbage but with higher pause times. It's amazing how far we've come.

I hope this means we can get more games in Java since I'd like to code one or two in my favorite language :)

[+] lmilcin|4 years ago|reply

It all depends on your application.

I worked on one which was required to respond to a packet from the network within 5 microseconds of the packet arriving.

Please, don't assume all applications are mobile uis, webapps and corporate backends.

[+] pjmlp|4 years ago|reply

Everyone else: You can only have systems programming in languages with manual memory management.

Java(JikesRVM)/Oberon/Go/D/Nim: Let rewrite the whole toolchain in X, including the memory manager itself.

[+] Semaphor|4 years ago|reply

.NET is also regularly improving their GC, aren’t they?

[+] voidnullnil|4 years ago|reply

It's more like this:

C diehards: Pauses are bad, we need manual memory management.

Everyone else: Does not have manual memory management.

[+] sgt|4 years ago|reply

Who is everyone else... Rust and Swift?

[+] throwaway894345|4 years ago|reply

Everyone else? Most languages have GC and Go has had a low latency GC for almost a decade IIRC.

[+] blub|4 years ago|reply

Java's been making the garbage collector better for decades at this point and is still in the second performance tier compared to those other languages.

Of course, if one does GC tuning like a pro, uses a sufficiently smart GC and maybe a custom VM (and unicorns) one may reach the mythical native speeds (typically at the cost of using a lot more memory).

[+] ncmncm|4 years ago|reply

Thanks, I still prefer exactly-zero ms pauses. Maybe that's just me, but I don't think so. And, I don't think I am giving anything up by skipping them.

[+] thinkharderdev|4 years ago|reply

Great article. The JVM is one of the most astonishing pieces of software ever created.

I was looking at gRPC benchmarks the other day (https://github.com/LesnyRumcajs/grpc_bench/wiki/2021-08-30-b...) and 6 of the top 7 performing gRPC implementations on 3 core CPUS were on the JVM.

[+] wesnerm2|4 years ago|reply

You shouldn't read too much from the benchmarks.

The gRPC libraries are primarily developed by Google. Google pours most of its development resources on its core languages: Java, Javascript, Go, C++, and Python. Google then allocates a developer to port the implementation to other languages and platforms. C++, unlike Java, is not typically used for cloud development, so I don't think a lot of resources were allocated for the language.

You can see how the difference in implementation effort can impact performance by comparing dotnet_grpc (Microsoft's fork) against csharp_grpc (Google's original implementation). There's more than a sevenfold improvement in req/s in Microsoft's implementation in the 1 CPU server case (35070 vs 5337), outperforming nearly all the Java benchmarks.

Also, many of those top performing JVM implementations are the same code running under a different garbage collector. .NET has two GCs each with a parallel option, but we only see one benchmark using likely the slower GC (Workstation GC instead of ServerGC).

[+] srazzaque|4 years ago|reply

Completely agree.

Say what you will about Java (the language), the JVM is a seriously good piece of kit.

[+] pjmlp|4 years ago|reply

And .NET as well, given the value types and low level improvemetns done since 7.0.

[+] emmericp|4 years ago|reply

We wrote some experimental network drivers in high-level languages a few years ago and Java performed better than expected. Yeah, that's a very special and somewhat odd use case, but it was fun :)

We've got some graphs [0] comparing throughput and long-tail latency for the various Java GCs available in OpenJDK 12. Shenandoah's worst case pause time was ~45µs which is the same as disabling the GC (Epsilon GC) which is pretty impressive. Overall performance did suffer under Shenandoah a little bit back then, though. However, I've heard that this improved recently.

[0] https://github.com/ixy-languages/ixy-languages/blob/master/J...

[+] jacquesm|4 years ago|reply

Java can be amazingly performant. One particular example is a video compressor handling many concurrent streams, recompressing them on the fly and shipping them out again saturating a gigabit link.

[+] enduku|4 years ago|reply

One neat feature of Shenandoah is once the application shuts down, it prints into the GC log a table depicting how much time each phase of the GC cycle took. This data can be crucial to optimize the GC pause time further:

- Removal of weak references (finalizer, phantom, soft) can help reduce the GC pause time further as these warrant stop the world phase

- Shenandoah likes ReentrantLocks better than synchronised classes the latter bloats up its monitors and increases the size of the rootset

- The reflection problem may initial Shenandoah to run GC cycles even when there is no memory pressure and thus may reduce performance over time. It is better to limit it using reflection inflation flag

- On Amazon ECS, provide the CPU share count explicitly; Shenandoah likes more threads

[+] rkennke|4 years ago|reply

Notice that with Shenandoah GC, weak references are handled concurrently since JDK 16, native roots like locks, classes, etc are handled concurrently since JDK 14. Therfore the first 3 problems that you mentioned are solved. Follow the links the OP article to find explanations for each of these improvements.

BTW, you can specify the number of GC threads that you like to use -XX:ConcGCThreads=X -XX:ParallelGCThreads=Y (X for concurrent GC, Y for STW pauses - which should no longer be very relevant).

Cheers!

[+] tpxl|4 years ago|reply

> One neat feature of Shenandoah is once the application shuts down, it prints into the GC log a table depicting how much time each phase of the GC cycle took. This data can be crucial to optimize the GC pause time further

Other GCs will do the same for you (G1, ZGC) which you can then put into a tool like gceasy to pretty graphs.

[+] raducu|4 years ago|reply

> - Removal of weak references (finalizer, phantom, soft) can help reduce the GC pause time further as these warrant stop the world phase

I'm in way disputing this, but why would a weak reference be different than any other reference, beyond the added flexibility it gives the GC (you may or may not GC this reference if you please)?

I admit I don't give GC much thought beyond every coupe of years when I switch jobs :)

[+] mwcampbell|4 years ago|reply

> - On Amazon ECS, provide the CPU share count explicitly; Shenandoah likes more threads

Can you explain this further? I'm missing some connection here between ECS configuration and how the JVM sees the container.

[+] dig1|4 years ago|reply

Awesome work! In our shop, we managed to solve significant application bottlenecks simply by switching to Shenandoah GC. The application in question is very complex, so any kind of rewrite in another language is out of the question.

This is the power of JVM we do not get from other languages/VMs. Kudos to JVM (previous and current) architects, developers, and everyone involved with it.

[+] bjarneh|4 years ago|reply

> The application in question is very complex, so any kind of rewrite in another language is out of the question.

I think I know what you mean. If a couple of guys quit, the company is in major trouble? :-)

Forgot this: /s

[+] dis-sys|4 years ago|reply

> This is the power of JVM we do not get from other languages/VMs.

sub-millisecond gc pause has been available to Golang users for ages.

[+] grogers|4 years ago|reply

Shenandoah is keeping pace nicely with ZGC, especially important to make this into the jdk 17 release since that will be what most users target for the next few years.

Given the similar feature sets, does anyone with experience running both ZGC and shenandoah have any differentiators?

[+] rkennke|4 years ago|reply

The main difference from user perspective are 1. ZGC doesn't support compressed references (-XX:+UseCompressedOops) which could affect workloads <32GB heap and 2. Shenandoah GC is not available in Oracle builds

[+] uvesten|4 years ago|reply

I’m slightly confused as to what the relation is between OpenJDK and Oracle JDK. Are they based on the same codebase? I was under the (apparently mistaken) impression that OpenJDK and the ”official” (Oracle) JDK were totally different implementations, but seeing how Oracle’s JDK just hit 17, and apparently OpenJDK did at the same time I guess they are derived from the same source. Could anyone enlighten me?

[+] Reason077|4 years ago|reply

They are built from the same code base. Any differences are just down to the license, packaging, etc.

[+] juddgaddie|4 years ago|reply

It would have been valuable to see the percentile distribution of latency, 99th, 99.999... max etc

[+] yutijke|4 years ago|reply

I am a layman with regards to JVM internals.

Are there any differences in the targeted workloads or capabilities between ZGC and Shenandoah GC?

Both seem to be marketed for large heaps and low pause times.

[+] rkennke|4 years ago|reply

The main difference from user perspective are 1. ZGC doesn't support compressed references (-XX:+UseCompressedOops) which could affect workloads <32GB heap and 2. Shenandoah GC is not available in Oracle builds

[+] ludamad|4 years ago|reply

I saw this start during my internship at Red Hat in 2012 :) cool to see it kicking around

[+] stonemetal12|4 years ago|reply

So given that the JDK is open source and dotnet is open source, would it be possible to port Shenandoah?

It is my understanding that D's garbage collector isn't that good, could they pick up an OpenJDK collector?

[+] kaba0|4 years ago|reply

I am unfortunately not an OpenJDK dev (but would really like to become one down the road), but afaik a GC is way too dependent on the internals of the runtime. The JVM has a very different object structure.

Also, I think D allows for pointers which is a no-go with compacting GCs (as those move objects around). So all in all, the algorithm used itself can be ported, but it has many runtime specific logic, which is just as important for great performance.

[+] bob1029|4 years ago|reply

The quality of the JVM GC is impressive to me as a .NET developer. I would love to have reliable sub-ms pauses in my apps. I am working on one right now where even small pauses >1ms are potentially detectable by the user.

[+] kllrnohj|4 years ago|reply

As a Java developer I'd love to have .NET's value types and runtime generics. Something something grass is always greener on the side :)

[+] unknown|4 years ago|reply

[deleted]

[+] anderson1993|4 years ago|reply

Hi. Can the Shenandoah's concurrent stack scanning in RedHat builds of the JDK 17 be considered production ready?

[+] JanecekPetr|4 years ago|reply

If it's not behind XX:+UnlockExperimentalVMOptions (or a preview/incubator module), it's considered stable and prod-ready, yes. Will there be fixes and improvements? Yes. Should you be afraid to use it for business? No, but obviously test it first.

[+] the-alchemist|4 years ago|reply

You can buy support from Redhat, which would presumably include support for their JDK build with Shenandoah.

Not sure what that means in practice, but if someone is putting money on the line, it must be worth _something_.

294 comments