I've been trying to fit big, long-running workloads into JVMs for a few years, and have found that minimizing the amount of garbage is paramount. It's a bit like games or C programming.
Recent JVM features like 8-bit (compact) strings and no longer having a size limit on the interned-string pool etc. have been really helpful.
But, for my workloads, the big wastes are still things like java.time.Instant and the overhead of temporary strings (which, these days, copy the underlying data; my code worked better back when split strings were just views).
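To illustrate the view-style approach I mean: instead of String.split (which allocates a String copy per field), you can scan with index ranges into the original string. This is just a sketch of the pattern — the class and method names are mine, not from any library:

```java
// Sketch: parse comma-separated integer fields by index ranges instead of
// String.split, so no per-field String copies are allocated.
public class FieldScanner {

    // Sums the integer fields of a line like "10,20,30" without substrings.
    static long sumFields(String line) {
        long total = 0;
        int start = 0;
        while (start <= line.length()) {
            int end = line.indexOf(',', start);
            if (end < 0) end = line.length();
            total += parseLong(line, start, end); // parsed in place
            start = end + 1;
        }
        return total;
    }

    // Minimal non-negative integer parser over the [from, to) range.
    static long parseLong(String s, int from, int to) {
        long v = 0;
        for (int i = from; i < to; i++) {
            v = v * 10 + (s.charAt(i) - '0');
        }
        return v;
    }

    public static void main(String[] args) {
        System.out.println(sumFields("10,20,30")); // prints 60
    }
}
```

The trade-off is that you give up the String API on each field, but in a hot parsing loop the allocation savings can be substantial.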
There are collections for much more memory-efficient (and faster) maps and things, and also efficient (and fast) JSON parsing etc. I have evaluated and benchmarked and adopted a few of these kinds of things.
Now, when I examine heap dumps and try to work out where else I can save bytes to keep GC at bay, I mostly see fragments of Instant and String, which are heavily used in my code.
If only there were a library that did date manipulation and arithmetic with longs instead of Instant :(
> If only there were a library that did date manipulation and arithmetic with longs instead of Instant :(
You can always pass around long timestamps and just convert to Instant whenever you need to do any date/time processing. Provided the Instant doesn't escape the method it's allocated in, it should be optimized via inlining and scalar replacement so that it doesn't generate garbage. Of course, you'd be adding the overhead of dividing your long into seconds/nanos each time.
Note: if this doesn't work on OpenJDK, try GraalVM: its partial escape analysis should do a better job of finding ways to elide heap allocations.
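A sketch of the pattern, assuming timestamps are kept as epoch-nanosecond longs (my assumption, not necessarily the encoding you'd use). The Instant never escapes the method, so the JIT's escape analysis can, in principle, scalar-replace the allocation:

```java
import java.time.Instant;
import java.time.ZoneOffset;

public class LongTimestamps {
    static final long NANOS_PER_SECOND = 1_000_000_000L;

    // The Instant (and OffsetDateTime) stay local to this method, so a JIT
    // with escape analysis / scalar replacement may elide both allocations.
    static int hourOfDayUtc(long epochNanos) {
        Instant t = Instant.ofEpochSecond(
                epochNanos / NANOS_PER_SECOND,   // whole seconds
                epochNanos % NANOS_PER_SECOND);  // leftover nanos
        return t.atOffset(ZoneOffset.UTC).getHour();
    }

    public static void main(String[] args) {
        long noonUtc = 12L * 3600 * NANOS_PER_SECOND; // 1970-01-01T12:00:00Z
        System.out.println(hourOfDayUtc(noonUtc)); // prints 12
    }
}
```

Whether the allocation is actually elided depends on the JIT and on everything inlining cleanly; it's worth verifying with an allocation profiler rather than assuming.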
There's a saying that "only the good die young", which applies to Java GC. If your Instants and Strings are really short-lived, then the GC for those is nearly free. For your workload, are these objects living on the heap long enough to be promoted beyond the young generation?
> There are collections for much more memory-efficient (and faster) maps and things, and also efficient (and fast) JSON parsing etc. I have evaluated and benchmarked and adopted a few of these kinds of things.
That sounds very interesting. Can you provide links to the benchmarks for fast JSON parsing (libraries)? And the fast maps?
You're absolutely right. It's one reason I struggle with the modern fashion for immutable classes and FP: they are always making copies of everything, which seems crazy.
I wonder how things would have stacked up with OpenJ9 - the AdoptOpenJDK project makes OpenJ9 builds available for Java 8/11/13/14 - so it should be trivial to include it in the benchmarks.
We have been experimenting with it in light of the Oracle licensing situation and it does provide an interesting set of options - AOT, various GCs (metronome, gencon, balanced), along with many other differentiators from OpenJDK, like JITServer, which offloads JIT compilation to remote nodes.
https://www.eclipse.org/openj9/docs/gc/
It doesn't get as much coverage as it should - it's production hardened - IBM has used it and still uses it for all their products - and it's fully open source.
You mean the licensing situation where Oracle completed open-sourcing the entire JDK and made Java free of field-of-use restrictions for the first time in its history?
If you're talking about the JDK builds you download from Oracle, then there are two (each linking to the other): one paid, for support customers, and one 100% free and open-source: http://jdk.java.net/
The specific workload matters a lot. I had a good experience with the Shenandoah collector on an application that generates very few intermediate objects, but once an object is created it stays in the heap for a while (a custom-made key/value store for a very specific use case). Shenandoah was the best in terms of throughput and memory utilization. Most collectors are generational, so surviving objects have to be moved from Eden to Survivor to Old. Shenandoah is not generational, and I suspect it has less work to do for objects that survive compared to other collectors. When most objects live long enough, generational collectors hinder performance.
In the case of Hazelcast Jet and similar products, loads of young garbage are unavoidable because it comes from the data streaming through the pipeline. A generational GC should in principle get a great head start in this kind of workload, and our benchmarks have confirmed it.
Yep, workload matters. Generational garbage collectors are fundamentally at odds with caching/pooling of objects. They are based on the assumption that objects die young. Typically that is not the case for internal caches, though. Caches usually consist of long-living/tenured objects.
Converting Java code to Kotlin and then compiling it with Kotlin/Native[1] is more promising from a performance point of view. Native code is always faster (assuming the compiler is good enough).
[1] https://kotlinlang.org/docs/reference/native-overview.html
An ahead-of-time compiler doesn't have the benefit of the call profile at polymorphic call sites. The JIT compiler has many more inlining opportunities, and in some cases this results in better performance.
Also, there are cases where manual memory management, which usually boils down to reference counting, has significant overhead where a GC-managed runtime has none at all. They involve repeatedly building up and then discarding large data structures. GC algorithms simply don't see the dead objects, whereas refcount-based management must explicitly free the memory of each object.
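Here's the shape of the workload I mean, as a sketch (names are mine). The whole structure dies at once when the reference goes out of scope; a tracing collector only visits survivors, so it does no per-object work to reclaim the dead batch, whereas reference counting would have to decrement and free every node:

```java
import java.util.ArrayList;
import java.util.List;

public class BuildAndDiscard {

    // Repeatedly builds a large batch of objects, reads one value, then
    // drops the only reference to the batch.
    static long churn(int rounds, int size) {
        long checksum = 0;
        for (int r = 0; r < rounds; r++) {
            List<long[]> batch = new ArrayList<>(size);
            for (int i = 0; i < size; i++) {
                batch.add(new long[] { i });
            }
            checksum += batch.get(size - 1)[0];
            // 'batch' becomes unreachable here: a tracing GC never touches
            // the dead nodes, while refcounting must free each one.
        }
        return checksum;
    }

    public static void main(String[] args) {
        System.out.println(churn(3, 1000)); // prints 2997
    }
}
```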