How the Boehm Garbage Collector Works

[+] __alexs|14 years ago|reply

Does anyone have data on how much it leaks in practice? Also, could maliciously crafted input result in a DoS vector for Boehm using applications?

[+] neild|14 years ago|reply

I once worked on a project that used the Boehms collector for a long-running server process.

About once a year, we'd lose a week or two to tracking down a memory issue. While the root cause was never a GC bug, the GC would make it dramatically more difficult to figure out what the real problem was.

The closest we had to an actual GC bug was a leak involving a data structure containing a list of IP addresses that was incorrectly interpreted as containing pointers. It took forever to figure out what had happened, since the issue only appeared if you worked with a specific set of IP addresses--and even after we had a replication case, the GC made it extremely difficult to determine why memory was growing rapidly. The fix was simple once we identified the precise problem--flag that memory as "atomic" (i.e., guaranteed not to contain pointers)--but finding the precise data structure that was holding memory live was a nightmare.

We eventually ran into a slow memory growth issue that we couldn't figure out. After much debugging, one developer became fed up with the situation and tore the GC out. It took him about two weeks to produce a version of the codebase that leaked less than the GC-enabled one, and another couple weeks to eliminate virtually all memory leaks. We continued to find minor leaks for another several months, none of which were difficult to correct. (We used a debugging malloc library that flagged leaked memory on exit, so we always knew exactly where the leak had originated.)

Not only did the GC-less version leak less, it used about 40% less memory.

I would strongly recommend against using the Boehm collector for any long-running processes. The tradeoffs may be more acceptable in short-term processes where slow leakage over time is unimportant.

[+] ghusbands|14 years ago|reply

My story is similar to nield's; I worked on a project that used the Boehm-Weiser collector in a 32-bit process.

Once or twice a tear, we'd have an unbounded memory growth to trace. Most of the time, we were able to solve it by providing layouts for structures that mixed pointers and pointer-like data. The debug facility it provides for providing a random backtrace to a root was helpful in those cases.

However, more than once we had a leak that it was beyond our ability to trace, despite a significant amount of diagnostic work. We found some workarounds that were specific to our app, which helped.

In the end, the costs to us were high enough that we rewrote our codebase in C# (our software was Windows-only). It was pretty much a line-for-line conversion, but it worked and we no longer had any memory leaks (except for one, in .NET timers, that took only minutes to trace using readily-available diagnostic tools).

I do believe a 64-bit process using the Boehm collector would be much less likely to suffer these issues, but this was before x64 was prevalent in corporate server rooms.

On your second question: Crafted input can indeed result in significant memory retention, if you aren't careful to exclude user input from the GC and have large, cyclic structures to collect, on a 32-bit machine, with almost any text encoding.

[+] stephenjudkins|14 years ago|reply

This will vary between 32-bit and 64-bit environments. My understanding is that the vastly larger address in 64-bit environments greatly mitigates the chance that data will be mistaken as pointers.

As far as crafting input to match allocated addresses, that is an interesting idea. One could, at the very least, create data that lies within the heap's address range. The issue is, you'd need a lot of data to blacklist a significant amount of memory, and if that's the case why not just DoS with a ton of data?

In practice, a fragmented heap is probably a much bigger issue than actual leaks. As the article points out, a conservative GC cannot perform compaction.

[+] gchpaco|14 years ago|reply

Storing binary data in memory and not explaining what it is to the Boehm GC with one of the macros they supply is the easiest way to fall down. Since "fairly arbitrary" binary data can be encoded as UTF-8 strings (many 32-bit binary codes cannot be a part of a UTF-8 string, but most can) you need to mark strings as non-pointer data too, which people are not always good about doing. If you do things correctly (that is, mark your blobs of arbitrary data properly) then it's reasonably good but of course most people don't and it runs counter to the idea that Boehm is a drop-in replacement.

This isn't Boehm's fault, specifically; any conservative GC would have similar trouble. You can't get completely away from cooperating with the GC and have good GC both.

[+] abecedarius|14 years ago|reply

I used it in a Scheme interpreter with no noticeable problem as my main personal hacking platform for years. OTOH, I didn't use it for long-lived servers, and Racket eventually migrated from it to a precise collector and IIRC the release notes mentioned occasional such problems as a factor. If they said how often or how severe, I missed it.

[+] sambeau|14 years ago|reply

I have found that it more likely holds on to some pointer longer than it could rather than leaking memory. For instance, I've never known it to start eating up memory.

[+] loeg|14 years ago|reply

One of the biggest problems with this style of "scan all memory" GC is that you have to stop all threads (flush all pointers to memory) to perform a collection. Pointers held in registers in other active threads are inaccessible to the GC thread.

Failing to stop the world (or ensure all pointers are in memory rather than registers) can result in live objects being collected. This happened to Ruby at one point: http://timetobleed.com/the-broken-promises-of-mrireeyarv/

Edit: This boils down to breaking the compiler's model of memory as specified by the C standard; all bets are off.

[+] cbsmith|14 years ago|reply

Um no. First, it doesn't "scan all memory". It scans the root and fans out. You can do read/write barriers and such to avoid blocking GC. It doesn't matter what value registers have in them. What matters is what is on the stack and what is reachable from said stack. If you have control of malloc, you can be far more clever than what you are describing. Ruby had a crummy implementation of GC, but you'll notice, for example, that most JVM's do "scan all memory" without forcing all active threads to be blocked at once.

[+] amatus|14 years ago|reply

This would fail badly on a http://en.wikipedia.org/wiki/XOR_linked_list

[+] kmm|14 years ago|reply

You really shouldn't be using a XOR linked list anyway. It has a lot of disadvantages and none of them weigh up to the only advantage of a lower memory footprint per list item.

[+] silon4|14 years ago|reply

If RAM is precious, you shouldn't use GC anyway. It typically has 2x memory use overhead to reduce GC performance cost.

[+] kmm|14 years ago|reply

I've often wondered how a garbage collector can scan the entire memory in a reasonable time. The article doesn't explain that particular operation. With a quick look at the source code I couldn't figure it out either.

Can anyone enlighten me? Is it only the stack that is checked because that seems possible in a reasonable time.

[+] loeg|14 years ago|reply

Boehm controls malloc; it knows where the heap is. In general, yes, the more memory you have allocated, the slower BoehmGC will be.

It checks both stack and heap.

[+] cpeterso|14 years ago|reply

The GC doesn't have to scan the entire address space. It starts scanning with GC roots, memory regions known to contain pointers such as stack, heap, and global variables (registered with the GC).

[+] vilda|14 years ago|reply

Neither in the article nor it the paper they mention cycles in references. Does it solve it in any way?

[+] chrisb|14 years ago|reply

Garbage collectors have no problem with reference cycles.

A GC works by finding all objects that are reachable (referenced by some other object or a root), then discarding all other objects. So a cyclic data structure is completely discarded as soon as there are no live references to any of its objects.

As usual, Wikipedia has lots more information: http://en.wikipedia.org/wiki/Garbage_collection_(computer_sc...

GC roots are generally things like: static/global variables, active stack frames (function parameters and local variable), CPU registers.

[+] rayiner|14 years ago|reply

BDW GC is a tracing collector. It doesn't need to have any special handle of cycles--either a cycle is reachable from the root set, in which case it is marked and considered alive, or it is not, and is never even seen by the collector before being reclaimed.

[+] neuroelectronic|14 years ago|reply

[deleted]

37 comments