item 31156063

neel_k | 3 years ago

The short answer is: reference counting walks the dead part of the heap, and tracing gc walks the live part of the heap.

When the reference count of an object goes to zero, you recursively decrement the reference counts of everything the object points to. This leads you to trace the object graph of things whose reference count has gone to zero. You never follow pointers from anything which is live (i.e., has a refcount > 0).

When you are doing the copy phase of a gc, you start with the root set of live objects, and follow pointers from everything that is live. Since anything pointed to by a live object is live, you only follow the pointers of live objects. You never follow pointers from anything which is dead (i.e., garbage).

If object lifetimes are short, most objects will be dead, and so RC will be worse than GC. If object lifetimes are long, most objects will be live, and GC will be worse than RC.

Empirically, the overwhelming majority of objects have a very short lifetime, with only a few objects living a long time. (This is called "the generational hypothesis.") So the optimal memory allocator will GC short-lived objects and RC long-lived objects. Rust/C++ encourages you to do this manually, by stack-allocating things you think will be short-lived, and saving RC for things with an expected long lifetime.

Beyond this, RC has a few really heavy costs.

Reference counting doesn't handle cyclic memory graphs. You need to add tracing to handle those, and if you are going to do tracing anyway, it's tempting to just do tracing really well and skip the refcounts entirely.

This is because the memory overhead of reference counts is high -- empirically, most objects never have more than a single reference to them, so using a whole word for reference counts is a lot of overhead. Moreover, the need to increment/decrement reference counts is really bad for performance: first, mutations are expensive in terms of memory bandwidth (you've got to maintain cache coherence with the other CPUs), and second, in a multicore setting, you have to update that word atomically.

There are tricks to mitigate this (e.g., Rust distinguishes Arc and Rc for objects which can be shared between threads or not), and there are schemes to optimise away RC assignments with static analysis (deferred reference counting), but if you want to do a really good job of reference counting, then you will be implementing a lot of tracing GC machinery.

And vice versa! The algorithm in the link is partly about adding RC to handle old objects (empirically, as part of the generational hypothesis, objects which have lived a long time will live a long time more). In fact, Blackburn and McKinley (two of the three authors of the above paper), pioneered the combination approach with their paper "Ulterior Reference Counting."


KMag | 3 years ago

Minor nit: the description above is true for copying GCs, but non-copying mark-sweep collectors generally still touch the header words of both live and dead objects in the heap in order to add dead objects to free lists. Mark-sweep-compact collectors also end up reading all of the mark words in the object headers of both live and dead objects in order to find the holes to be filled by compaction.

It's also not uncommon to have a copying young generation and a mark-sweep-compact tenured generation. That way, you get the advantages of not needing to scan the huge numbers of young dead objects, but the space savings of not needing 2x space for the older generation.