It took me some time working with a non-GC language to fully appreciate how amazing Garbage Collection is. It really makes a lot of things easier, simpler and, especially, safer. Yes, of course it has some drawbacks. Yes, of course, not every single kind of program can afford to have a GC. But hell, life is so much easier with a GC! I'm very happy to see that there is active research and development happening to make the drawbacks of GC as minimal as possible!
I feel like the key is knowing that your problem domain is fine with there always being garbage collection in the loop. Otherwise there's the risk that down the road you'll have to start tracing the code, rewriting things to reuse objects, etc etc.
Also, because GC is “lazy,” your program may even end up running faster than one that uses “eager” memory management, such as C++'s, where destructors are invoked and objects on the heap are deallocated early on (i.e. as soon as something goes out of scope) rather than when the process is close to running out of its memory quota (which may never happen).
I was testing ZGC the other day on my machine (24 vcores, 64GB RAM) and it ate through about 4GB/s of garbage with about 10 ms of pauses over 220 s of runtime (24 threads doing nothing but allocations). G1 ate through about 2x as much garbage, but with higher pause times. It's amazing how far we've come.
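For reference, the kind of allocation-churn stress test described above can be sketched in a few lines. The class name, thread count, and duration here are made up for illustration; run it with a collector flag and GC logging (e.g. `java -Xlog:gc -XX:+UseZGC AllocStress`) and compare the reported pause times across collectors.

```java
import java.util.concurrent.atomic.AtomicLong;

public class AllocStress {
    static final AtomicLong totalBytes = new AtomicLong();

    // Spin up `threads` workers that do nothing but allocate
    // short-lived arrays for `millis` milliseconds.
    static long churn(int threads, long millis) throws InterruptedException {
        Thread[] workers = new Thread[threads];
        long deadline = System.currentTimeMillis() + millis;
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                while (System.currentTimeMillis() < deadline) {
                    // Allocate garbage that dies immediately.
                    byte[] junk = new byte[64 * 1024];
                    totalBytes.addAndGet(junk.length);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        return totalBytes.get();
    }

    public static void main(String[] args) throws InterruptedException {
        long bytes = churn(8, 500);
        System.out.println("Allocated ~" + bytes / (1024 * 1024) + " MiB");
    }
}
```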
I hope this means we can get more games in Java since I'd like to code one or two in my favorite language :)
Java's been making the garbage collector better for decades at this point and is still in the second performance tier compared to those other languages.
Of course, if one does GC tuning like a pro, uses a sufficiently smart GC and maybe a custom VM (and unicorns) one may reach the mythical native speeds (typically at the cost of using a lot more memory).
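For context, "tuning like a pro" usually starts with a handful of standard HotSpot flags like the ones below. This is only an illustrative sketch: `app.jar` is a placeholder and the values are arbitrary, not recommendations.

```shell
# A few commonly-tuned knobs (defaults are often fine):
java -Xms4g -Xmx4g \            # fixed heap size to avoid resizing
     -XX:+UseG1GC \             # pick a collector explicitly
     -XX:MaxGCPauseMillis=50 \  # G1 pause-time goal
     -Xlog:gc*:file=gc.log \    # unified GC logging for later analysis
     -jar app.jar
```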
Thanks, I still prefer exactly-zero ms pauses. Maybe that's just me, but I don't think so. And, I don't think I am giving anything up by skipping them.
The gRPC libraries are primarily developed by Google. Google pours most of its development resources into its core languages: Java, JavaScript, Go, C++, and Python. Google then allocates a developer to port the implementation to other languages and platforms. C++, unlike Java, is not typically used for cloud development, so I don't think a lot of resources were allocated to that language.
You can see how the difference in implementation effort can impact performance by comparing dotnet_grpc (Microsoft's fork) against csharp_grpc (Google's original implementation). There's more than a sevenfold improvement in req/s in Microsoft's implementation in the 1 CPU server case (35070 vs 5337), outperforming nearly all the Java benchmarks.
Also, many of those top-performing JVM implementations are the same code running under a different garbage collector. .NET has two GCs, each with a parallel option, but we see only one benchmark, and it likely uses the slower GC (Workstation GC instead of Server GC).
We wrote some experimental network drivers in high-level languages a few years ago and Java performed better than expected. Yeah, that's a very special and somewhat odd use case, but it was fun :)
We've got some graphs [0] comparing throughput and long-tail latency for the various Java GCs available in OpenJDK 12. Shenandoah's worst-case pause time was ~45µs, the same as with the GC disabled entirely (Epsilon GC), which is pretty impressive. Overall performance did suffer a little under Shenandoah back then, though. However, I've heard that this has improved recently.
[0] https://github.com/ixy-languages/ixy-languages/blob/master/J...
Java can be amazingly performant. One particular example is a video compressor handling many concurrent streams, recompressing them on the fly and shipping them out again, saturating a gigabit link.
One neat feature of Shenandoah is that, once the application shuts down, it prints into the GC log a table depicting how much time each phase of the GC cycle took. This data can be crucial for optimizing the GC pause time further:
- Removal of weak references (finalizer, phantom, soft) can help reduce the GC pause time further, as these warrant a stop-the-world phase
- Shenandoah likes ReentrantLocks better than synchronized blocks; the latter bloat up object monitors and increase the size of the root set
- Heavy reflection may cause Shenandoah to initiate GC cycles even when there is no memory pressure, and thus may reduce performance over time. It is better to limit this using the reflection-inflation flag
- On Amazon ECS, provide the CPU share count explicitly; Shenandoah likes more threads
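To illustrate the ReentrantLock point in the list above, here is a minimal sketch of guarding shared state with a java.util.concurrent lock instead of a `synchronized` block. The `Counter` class is a made-up example, not from the original comment; the claimed GC benefit is that a `ReentrantLock` is an ordinary heap object, while contended `synchronized` blocks inflate the object's monitor.

```java
import java.util.concurrent.locks.ReentrantLock;

public class Counter {
    private final ReentrantLock lock = new ReentrantLock();
    private long value = 0;

    public void increment() {
        lock.lock();          // instead of: synchronized (this) { ... }
        try {
            value++;
        } finally {
            lock.unlock();    // always release in finally
        }
    }

    public long get() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }
}
```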
Notice that with Shenandoah GC, weak references have been handled concurrently since JDK 16, and native roots (locks, classes, etc.) since JDK 14. Therefore the first three problems you mentioned are solved. Follow the links in the OP article to find explanations for each of these improvements.
BTW, you can specify the number of GC threads you'd like to use with -XX:ConcGCThreads=X and -XX:ParallelGCThreads=Y (X for concurrent GC, Y for STW pauses, which should no longer be very relevant).
Cheers!
> One neat feature of Shenandoah is once the application shuts down, it prints into the GC log a table depicting how much time each phase of the GC cycle took. This data can be crucial to optimize the GC pause time further
Other GCs (G1, ZGC) will do the same for you, and you can then put the log into a tool like GCeasy to get pretty graphs.
> - Removal of weak references (finalizer, phantom, soft) can help reduce the GC pause time further as these warrant stop the world phase
I'm in no way disputing this, but why would a weak reference be different from any other reference, beyond the added flexibility it gives the GC (you may or may not collect this reference as you please)?
I admit I don't give GC much thought beyond every couple of years when I switch jobs :)
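For what it's worth, the special part is that the collector itself, not the program, clears weak references, and it must process each Reference object as part of a GC cycle (historically in a stop-the-world phase, which is what the comment above refers to). A tiny sketch of the semantics, purely illustrative:

```java
import java.lang.ref.WeakReference;

public class WeakDemo {
    public static void main(String[] args) {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);

        // While a strong reference is reachable, get() returns the object.
        System.out.println(weak.get() == strong);  // prints "true"

        strong = null;   // drop the only strong reference
        System.gc();     // only a hint; the GC may now clear the referent
        // From here on, weak.get() may legitimately return null --
        // the collector, not the program, decides when.
    }
}
```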
Awesome work! In our shop, we managed to solve significant application bottlenecks simply by switching to Shenandoah GC. The application in question is very complex, so any kind of rewrite in another language is out of the question.
This is the power of the JVM that we don't get from other languages/VMs. Kudos to the JVM's architects (previous and current), developers, and everyone involved with it.
Shenandoah is keeping pace nicely with ZGC. It's especially important that this made it into the JDK 17 release, since that's what most users will target for the next few years.
Given the similar feature sets, does anyone with experience running both ZGC and shenandoah have any differentiators?
The main differences from a user's perspective are: 1. ZGC doesn't support compressed references (-XX:+UseCompressedOops), which could affect workloads with heaps under 32GB, and 2. Shenandoah GC is not available in Oracle builds.
I’m slightly confused as to what the relation is between OpenJDK and Oracle JDK. Are they based on the same codebase?
I was under the (apparently mistaken) impression that OpenJDK and the “official” (Oracle) JDK were totally different implementations, but seeing how Oracle's JDK just hit 17, and OpenJDK apparently did at the same time, I guess they are derived from the same source.
Could anyone enlighten me?
I am unfortunately not an OpenJDK dev (but would really like to become one down the road), but afaik a GC is way too dependent on the internals of the runtime. The JVM has a very different object structure.
Also, I think D allows raw pointers, which is a no-go with compacting GCs (as those move objects around). So all in all, the algorithm itself can be ported, but a GC contains a lot of runtime-specific logic, which is just as important for great performance.
The quality of the JVM GC is impressive to me as a .NET developer. I would love to have reliable sub-ms pauses in my apps. I am working on one right now where even small pauses >1ms are potentially detectable by the user.
If it's not behind -XX:+UnlockExperimentalVMOptions (or a preview/incubator module), it's considered stable and production-ready, yes. Will there be fixes and improvements? Yes. Should you be afraid to use it for business? No, but obviously test it first.
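As a concrete illustration of how that flag works in practice: both ZGC (JEP 377) and Shenandoah (JEP 379) graduated from experimental status in JDK 15, so the unlock flag is only needed on older releases. `app.jar` is a placeholder here.

```shell
# JDK 11-14: ZGC (and Shenandoah, where bundled) were experimental
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -jar app.jar

# JDK 15+: production-ready, no unlock flag needed
java -XX:+UseZGC -jar app.jar
java -XX:+UseShenandoahGC -jar app.jar
```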
Java: pauses are bad, let’s make the garbage collector better.
Much credit to the Java community for ignoring the noise and building something great that lets most applications have the best of both worlds.
I worked on one which was required to respond to a packet from the network within 5 microseconds of the packet arriving.
Please don't assume all applications are mobile UIs, webapps and corporate backends.
Java (JikesRVM)/Oberon/Go/D/Nim: Let's rewrite the whole toolchain in X, including the memory manager itself.
C diehards: Pauses are bad, we need manual memory management.
Everyone else: Does not have manual memory management.
I was looking at gRPC benchmarks the other day (https://github.com/LesnyRumcajs/grpc_bench/wiki/2021-08-30-b...) and 6 of the top 7 performing gRPC implementations on 3-core CPUs were on the JVM.
Say what you will about Java (the language), the JVM is a seriously good piece of kit.
Can you explain this further? I'm missing some connection here between ECS configuration and how the JVM sees the container.
I think I know what you mean. If a couple of guys quit, the company is in major trouble? :-)
Forgot this: /s
Sub-millisecond GC pauses have been available to Golang users for ages.
Are there any differences in the targeted workloads or capabilities between ZGC and Shenandoah GC?
Both seem to be marketed for large heaps and low pause times.
It is my understanding that D's garbage collector isn't that good, could they pick up an OpenJDK collector?
Not sure what that means in practice, but if someone is putting money on the line, it must be worth _something_.