Haven't quite finished reading the paper yet, but it's interesting so far.
While they present a method that could be applied more widely, this paper only analyzes current tracing GCs in OpenJDK (plus Epsilon GC). Would love to see future papers take other GCs into account (e.g., various flavors of refcounting GC).
Even better would be a way to compare vs methods of "manual" memory management (manual alloc/free, RAII/scope based alloc/free, arena allocation, etc.). Unfortunately, non-GC memory management is very application specific, so I'm not sure if a generalized analysis would be super useful.
> Even better would be a way to compare vs methods of "manual" memory management (manual alloc/free, RAII/scope based alloc/free, arena allocation, etc.). Unfortunately, non-GC memory management is very application specific, so I'm not sure if a generalized analysis would be super useful.
I've personally found Rust to be a very difficult language to learn and become productive in.
In contrast, I was able to jump into Swift and become productive very quickly.
Swift isn't GC, but is reference counted. Reference cycles will lead to leaked memory.
So what's interesting is to know GC's overhead relative to what: Reference counted languages like Swift? Or languages like Rust?
More importantly: Is the overhead worth it? Developer time is valuable, which leads to the critical question: what's cheaper, extra developer time or a more powerful computer?
Well, it depends a bit. In my low-latency application, where we can't afford GC pauses, we only tolerate them from 5am to 5:30am. We do everything preallocated, and if we must tolerate new heap allocations we simply never release them (so we have a lot of object pools).
I felt it was hard and exotic at first, but now I feel it's quite logical: never free, have proper monitoring of your size per unit of information, size the heap right, and you're done.
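The preallocate-and-never-release approach with object pools can be sketched in C roughly like this (all names, the `Order` type, and the pool size are invented for illustration, not from the comment):

```c
#include <stddef.h>

/* Illustrative payload type; the fields are made up. */
typedef struct Order {
    long id;
    double price;
    struct Order *next_free;   /* intrusive free list link */
} Order;

#define POOL_SIZE 1024

static Order pool[POOL_SIZE];  /* preallocated up front, never freed */
static Order *first_free;

static void pool_init(void) {
    for (size_t i = 0; i + 1 < POOL_SIZE; i++)
        pool[i].next_free = &pool[i + 1];
    pool[POOL_SIZE - 1].next_free = NULL;
    first_free = &pool[0];
}

/* O(1), no heap allocation after startup, so no GC pause is possible. */
static Order *pool_acquire(void) {
    Order *o = first_free;
    if (o) first_free = o->next_free;
    return o;  /* NULL means the pool was sized too small */
}

/* "Releasing" just returns the slot to the pool; nothing is freed. */
static void pool_release(Order *o) {
    o->next_free = first_free;
    first_free = o;
}
```

As the comment says, the hard part is not the code but the monitoring: you have to know your size per unit of information to pick `POOL_SIZE` correctly, because exhausting the pool at runtime is the failure mode.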
But then it also depends on why you don't want GC. For us it's because clients cannot tolerate a pause (we had a socket manager creating and releasing garbage that triggered one 50ms GC pause during the last 3 seconds of trading on a particularly crazy trading day, and the client threatened to leave), but often pauses are tolerable if you just stay reasonable with your garbage.
The problem in non-critical applications is not the collection, it's the garbage generation getting completely out of hand. In a C# application my team had to micro-optimize because of covid-related explosive trading volume, we simply bought a gigantic amount of RAM for our 200 users, only allowed GC when CPU usage was low, and ran with 80GB of memory at the end of the day for an order grid that could have run in 5GB if collected. That's another way to do it: I wish Java 8 had an easier way to simply deactivate the GC, so we could run on hardware money.
There are two costs to consider:
1. The cost of free(), which recursively includes the cost of free()ing all "substructures" (ex: free(head) on a linked list includes the cost of free()ing the entire list)
2. The cost of malloc()
Manual memory management has both malloc() and free(). Garbage-collected memory management removes free() entirely and does everything with malloc() alone (with "malloc" having some mechanism to automatically detect which bits of RAM are safe to reallocate, without free() to guide it).
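The free() cost in "1." can be made concrete with a toy singly linked list in C (names are illustrative): giving the memory back means visiting every node, dragging the whole structure through cache one last time.

```c
#include <stdlib.h>

typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Manual management: freeing the list touches every node.
 * Returns the number of nodes freed, to make the cost visible. */
static size_t free_list(Node *head) {
    size_t freed = 0;
    while (head) {
        Node *next = head->next;  /* must read the node before freeing it */
        free(head);
        head = next;
        freed++;
    }
    return freed;
}
```

Under a tracing GC there is no analogue of this walk: an unreachable list is simply never visited again.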
These costs can be incredibly complicated to figure out (!!!). free()ing a 4MB linked list, for example, will flush out your L1 and L2 caches, as well as take up a significant chunk of your L3 cache.
There is also the issue of fragmentation: malloc() and free() are cheap when the program starts, but as the heap gets more and more fragmented by malloc()/free() calls, future calls to malloc() and free() can get slower and slower.
So there's a degree of:
* "Housekeeping" -- anti-fragmentation mostly: sorting the free()'d pointers and regions into groups that can be malloc()'d more easily in the future.
* Synchronization -- having all your threads pull from the same pool of memory is easier to think about, but requires multi-threaded synchronization. During a long-lasting free() (ex: freeing a 4MB linked list), one thread can slow down the others when they try to malloc() or free() themselves.
-----------
Finally, the application code largely determines what is faster or slower. Many people assume manual memory management is faster, but consider the following example:
Let's say our application data consists of 1MB of misc. data + 3999MB of linked-list data (including the internal pointers). Let's also assume the garbage collector runs when the heap reaches 4000MB.
Manual memory management would malloc() the linked list until it's 3999MB in size; then the free() call would traverse the entire 3999MB structure to find every pointer to delete.
In contrast, garbage collection would malloc() the linked list up to 3999MB. The free() doesn't exist, but the linked list will be detected as "dead data" at the next collection triggered by malloc() (only the 1MB of misc. data is traversed).
As such, the garbage collector will never traverse the entire linked list. The garbage collector only traverses the "live data" whenever it runs (which might be the 1MB of misc. data... but if that misc. data is long-lived, a generational garbage collector won't even traverse that!!).
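The "live data only" traversal can be seen in a toy mark-and-sweep sketch (everything here, including the tiny fixed heap, is invented for illustration; real JVM collectors are vastly more sophisticated). Marking chases pointers, but only through reachable objects; sweeping is a linear scan of the heap that never follows pointers into dead structures.

```c
#include <stddef.h>

#define HEAP_SLOTS 8

typedef struct Obj {
    int marked;
    struct Obj *next;  /* one outgoing reference, enough for lists */
    int in_use;
} Obj;

static Obj heap[HEAP_SLOTS];

/* Mark phase: pointer-chasing, but only over *live* data. */
static void mark(Obj *root) {
    while (root && !root->marked) {
        root->marked = 1;
        root = root->next;
    }
}

/* Sweep phase: a linear scan of the heap slots, never traversing
 * dead structures node-by-node. Returns slots reclaimed. */
static size_t sweep(void) {
    size_t reclaimed = 0;
    for (size_t i = 0; i < HEAP_SLOTS; i++) {
        if (heap[i].in_use && !heap[i].marked) {
            heap[i].in_use = 0;
            reclaimed++;
        }
        heap[i].marked = 0;  /* reset mark bits for the next cycle */
    }
    return reclaimed;
}

static Obj *gc_alloc(void) {
    for (size_t i = 0; i < HEAP_SLOTS; i++)
        if (!heap[i].in_use) {
            heap[i].in_use = 1;
            heap[i].next = NULL;
            return &heap[i];
        }
    return NULL;  /* a real collector would trigger mark+sweep here */
}
```

In the 3999MB example above, the dead list's nodes are reclaimed by the sweep's linear scan; only the live 1MB ever gets pointer-chased by mark().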
So really, garbage collectors are best when you have small amounts of live data and large numbers of malloc() calls. In contrast, manual memory management is best when you have lots of live data (especially "quickly moving" live data, such that the generational heuristic doesn't help) and relatively little dead data.
I've seen internal numbers from $MEGACORP that suggest large-scale C++ applications spend 15-20% of their time doing memory management. Incidentally, the memory fragmentation on those same applications can end up as high as 40%. In GC'd languages fragmentation is typically closer to 10%.
The paper specifically discusses the cases of high mutation rates and small heaps, because they are a pathological case for the trendy new low-pause GCs in the JVM.
in the "jai" programming language there's this thing called Temporary_Storage, which is basically an arena allocator. it lives in the "context", which is something you can "push" to make it different per-thread if needed, or anything else (like per-request in a web server). you can allocate anything you want in Temporary_Storage, and then when you call reset_temporary_storage() it does what it says and resets it. for a video game you'd call this every frame, but in a different setting like web development you could reset it after resolving a request or whatever. you could use this exact same system in C++ even though there's no built-in language feature for it, but it probably wouldn't play nice with RAII (unless your data structures were aware of the existence of your hand-rolled Temporary_Storage system and knew not to call free() in the destructor).
after learning about the costs associated with malloc()/free(), then being exposed to long-term arena allocators, then to a "temporary storage" system like this, when people talk about garbage collection being absolutely necessary for developer productivity (plus a way of avoiding malloc()/free() costs), I can't help but wonder if anyone who promotes garbage collection as a borderline necessity has ever tried something like Temporary_Storage. I find it pretty wild that instead of figuring out better memory management models like this, we collectively decided to punt on the whole issue, declare it an unsolvable problem, and decide that increasingly complex garbage collection systems are the only way forward. with Temporary_Storage in place, you can do whatever small allocations you need at any point in time (parse an integer from a string, etc.), and you don't have to worry about free()ing the result, because it's taken care of for you, without Actually Being Garbage Collection.
this entirely solves both "1." and "2." above, without the uncertainty of not knowing when exactly the garbage created by the thing you're free()ing and all of its substructures will be cleaned up. instead, you know exactly when it's going to be cleaned up: once per frame, at the end of the frame (or whenever you call reset_temporary_storage()).
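A hand-rolled version of the Temporary_Storage idea might look like this in C (a fixed-size bump allocator; the names, capacity, and alignment choice are all invented for the sketch, and a thread-local or per-context buffer would replace the single global in real use):

```c
#include <stddef.h>
#include <stdint.h>

#define TEMP_CAPACITY (1 << 20)   /* 1MB of scratch space, chosen arbitrarily */

static uint8_t temp_buf[TEMP_CAPACITY];
static size_t  temp_used;

/* Bump allocation: just advance an offset. No per-allocation free(). */
static void *temp_alloc(size_t size) {
    size = (size + 7) & ~(size_t)7;          /* round up to 8-byte alignment */
    if (temp_used + size > TEMP_CAPACITY)
        return NULL;                          /* scratch space exhausted */
    void *p = &temp_buf[temp_used];
    temp_used += size;
    return p;
}

/* The entire "collection" step: one assignment, once per frame/request. */
static void reset_temporary_storage(void) {
    temp_used = 0;
}
```

Both costs disappear at once: temp_alloc() is a couple of arithmetic instructions, and reclaiming every temporary allocation of the frame is a single store.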
everything below the line break above sounds crazy to me. why are your linked list nodes all being malloc()'d individually on the heap, instead of in an arena allocator that you can also just flush all at once, if that's what you want to do? why would you ever traverse a complex nested structure to free it all, when you could just free() the entire hunk of memory at once (or, if you're going to be reusing that chunk of memory, just reset the arena allocator pointer to the beginning)?
Linked lists? Not a useful data structure in application programming when using imperative languages. And if you are using a functional language then most people agree that a GC is necessary (FP without GC sounds like a research problem).
I don't see how GC reduces null pointer exceptions. If anything, GC encourages the use of pointers (as opposed to nested child objects), which increases null pointer exceptions.