top | item 2474168


jfr | 15 years ago

> A GC system with explicitly visible reference counts (and immediate freeing) with language support to make it easier to get the refcounts right [...]

To be a little pedantic on the subject, such a system (reference counting with immediate freeing) is a form of automatic memory management, but it is not GC in any way. Garbage collection implies that the system leaves garbage around, which needs to be collected in some way or another. The usual approach to refcounting releases each resource as soon as it is no longer required (either by free()ing it immediately or by returning it to a pool of unused resources), so it doesn't leave garbage around and doesn't need a collector thread or mechanism.

There are partial-GC implementations of refcounting, either because items are not free()d when they reach zero references, or because a collector is used to automatically detect the reference loops that refcounting alone cannot handle.

I agree with Torvalds on this matter. GC as it is promoted today is a giant step that gives programmers one benefit, solving one problem, while introducing an immeasurable pile of complexity to the system, creating another pile of problems that are still not fixed today. And to fix some of these problems (like speed) you have to introduce more complexity.

This is my problem with GC. I like simplicity. Simplicity tends to perform well, and being simple also means it has little space for problems. Refcounting is simple and elegant; you just have to take care of reference loops, which have a simple solution of their own: weak references. I can teach a class of CS students everything they need to know to design a refcounting resource management system in one lesson.
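To make the weak-reference fix concrete, here is a minimal Python sketch (the `Node` class is hypothetical, purely for illustration): the child holds only a weak back-pointer to its parent, so the parent/child pair never forms a refcount cycle, and the parent is freed the instant its last strong reference goes away.

```python
import weakref

class Node:
    """A tree node whose back-pointer is weak, so parent<->child
    never forms a reference-counting cycle."""
    def __init__(self, value, parent=None):
        self.value = value
        self.children = []            # strong refs: a parent keeps its children alive
        # weak ref: a child does NOT keep its parent alive
        self._parent = weakref.ref(parent) if parent is not None else None
        if parent is not None:
            parent.children.append(self)

    @property
    def parent(self):
        # returns None once the parent has been freed
        return self._parent() if self._parent is not None else None

root = Node("root")
child = Node("leaf", parent=root)
assert child.parent is root       # the weak back-pointer still resolves

del root                          # drop the only strong reference to the parent
assert child.parent is None       # freed immediately; no cycle collector needed
```

The same pattern appears as `std::weak_ptr` in C++ and weak properties in Objective-C/Swift: whichever direction of the link is "ownership" stays strong, and the back edge is weak.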

GC is the opposite: it is big and complex, and the more you try to fix it, the more complex it becomes. The original idea is simple, but nobody uses the original idea because it performs so badly. To teach the same class how to design a GC system that performs as well as we expect today, an entire semester might not be enough.



stcredzero | 15 years ago

> I agree with Torvalds on this matter.

In a way, I do as well.

> GC as it is promoted today is a giant step that gives programmers one benefit, solving one problem, while introducing an immeasurable pile of complexity to the system, creating another pile of problems that are still not fixed today. And to fix some of these problems (like speed) you have to introduce more complexity.

There are plenty of contexts where speed is a non-issue. In those cases, GC has been a huge win. The conceptual simplicity is the important part. The cost of the resources that would be saved with explicit and optimized memory management would be far outweighed by the resources required to implement such things.

> The original idea is simple, but nobody uses the original idea because it performs so badly.

This is simply not true.

In the context of IO-bound enterprise systems, I've seen generational GC perform admirably, almost magically. As a lark, I've put infinite loops into such apps that do nothing but allocate new objects, and unless you were doing an exceptionally intense operation, you couldn't tell the difference. Properly tuned generational GC can be a truly fantastic-seeming thing!

However, I will agree that the concerns Linus highlights are real, and that refcounting systems, like the one in iOS, are far better choices in many contexts.

EDIT: The system I victimized above was only the TEST environment, but it was populated with something like 2-week-old production data. The application in question is a traditional client/server desktop app used by a major energy company; it had 800 active users at the time, handling millions in transactions every minute.

IDEA: If someone had an augmented ref-counting system with a runtime containing an optional cycle-detector and something like LINT but for the runtime reference graph, one would get most of the benefits of GC with the efficiency of the ref-counting system. I half expect someone to tell me that this already exists for Python.

abstractfactory | 15 years ago

In 2004 Bacon et al. showed that tracing GC and ref-counting GC are special cases of a more general framework, and that it is possible to combine them to get the benefits of both. See "A Unified Theory of Garbage Collection" here:

http://www.research.ibm.com/people/d/dfb/publications.html

Scholar cluster:

http://scholar.google.com/scholar?q=bacon+unified+theory+gar...

Incidentally, it is not even remotely true that reference counting is more efficient than tracing GC in all cases.

kragen | 15 years ago

> I've seen generational GC perform admirably, almost magically. As a lark, I've put infinite loops into such apps that do nothing but allocate new objects

While I agree that generational GC can perform spectacularly well, what you're describing is close to the case it's optimized for, not close to its worst case. The worst case is that you allocate lots and lots of small objects and then write a pointer to all of them into a tenured garbage object.

> I half expect someone to tell me that this already exists for Python.

Yes, that's how Python works, except that I don't know what you mean by "something like LINT but for the runtime reference graph."
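A short sketch of both halves in Python: plain refcounting frees an object the instant its last reference disappears, while the standard `gc` module's cycle detector picks up what refcounting alone cannot (automatic collection is disabled during the demo so the timing is deterministic):

```python
import gc
import weakref

class Box:
    """An empty container used only to observe object lifetimes."""
    pass

gc.disable()                     # make collection timing deterministic for the demo

# Plain refcounting: freed the instant the last reference goes away.
b = Box()
probe = weakref.ref(b)
del b
assert probe() is None           # immediate free, no collector run needed

# A reference cycle: the two refcounts never reach zero on their own.
a, c = Box(), Box()
a.other, c.other = c, a
probe = weakref.ref(a)
del a, c
assert probe() is not None       # still alive: the cycle keeps both counts at 1

# Python's backup cycle detector reclaims it.
gc.collect()
assert probe() is None

gc.enable()
```

This matches the "augmented ref-counting system with an optional cycle-detector" idea almost exactly; CPython has shipped the cycle collector since 2.0.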

lambda | 15 years ago

> IDEA: If someone had an augmented ref-counting system with a runtime containing an optional cycle-detector and something like LINT but for the runtime reference graph, one would get most of the benefits of GC with the efficiency of the ref-counting system. I half expect someone to tell me that this already exists for Python.

Or you could do what Rust <https://github.com/graydon/rust/wiki/Language-FAQ> does, and have different types of objects: stateful and stateless. Stateless objects cannot have cycles (since you can't construct them out of stateless objects, unless you're in a lazy language), so reference counting can be used for stateless objects. Stateful objects, which can have cycles, are managed by the garbage collector.

Rust sounds like a really interesting idea, and I'd love to give it a try, but sadly it's still in the "some assembly required" stage as they work on bootstrapping it, so the only real projects worth doing with it yet involve helping out with the bootstrapping process.

rayiner | 15 years ago

> This is my problem with GC. I like simplicity. Simplicity tends to perform well, and being simple also means it has little space for problems. Refcounting is simple and elegant,

Reference counting isn't conceptually any simpler than garbage collection (you still have to specify how the underlying alloc() and free() operations are implemented). Further, a proper high-performance reference counting system isn't any simpler than a high-performance GC. In a multithreaded system you have to keep the reference counts in sync without doing a slow atomic operation every time a pointer is passed around, and you need to back the refcounting system with a high-performance malloc()/free() that can handle multithreaded allocation and is resistant to memory fragmentation.
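A toy illustration of that synchronization cost in Python, with a lock standing in for the atomic increment/decrement a real runtime would use (the `RefCounted` class is a hypothetical sketch, not any real runtime's implementation). Every retain/release from every thread pays for the synchronization, which is exactly the overhead being described:

```python
import threading

class RefCounted:
    """Toy refcounted handle; the lock stands in for the atomic
    inc/dec a production runtime would use on every pointer copy."""
    def __init__(self, payload):
        self.payload = payload
        self._count = 1               # the creator holds one reference
        self._lock = threading.Lock()
        self.freed = False

    def retain(self):
        with self._lock:              # unsynchronized += would race
            self._count += 1

    def release(self):
        with self._lock:
            self._count -= 1
            dead = (self._count == 0)
        if dead:
            self.freed = True         # stand-in for free()

obj = RefCounted("data")

def worker():
    # simulate passing the pointer around: retain on copy, release on drop
    for _ in range(10_000):
        obj.retain()
        obj.release()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

obj.release()                         # drop the original reference
assert obj.freed and obj._count == 0  # balanced: exactly one free, no leak
```

Without the lock, concurrent increments can lose updates, leading to a premature free (use-after-free) or a leak; avoiding that cost is why schemes like deferred or biased reference counting exist.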

jamii | 15 years ago

For higher-level systems programming (i.e. not bare metal), I think the approach taken by the Rust language is interesting. Mutable data structures are thread-local and are GCed (GC happens independently per thread). Immutable data structures can be passed between threads, are reference counted, and support RAII.

https://github.com/graydon/rust/wiki/Language-FAQ

revetkn | 15 years ago

> This is my problem with GC. I like simplicity. Simplicity tends to perform well, and being simple also means it has little space for problems.

If your use case is teaching how to implement a garbage collector vs. a refcounting system, it is certainly much simpler to implement refcounting. If you are a systems programmer writing a VM or a compiler and do your work at a bare-metal level, then manual memory management/refcounting is probably your best choice.

For everyone else (let's guess over 90% of the programming population), using high-level GC'd languages is significantly simpler and more productive.

barrkel | 15 years ago

Implementing a stop-the-world, 2-space copying GC, provided you have complete stack, register and type layout data, is trivial. Implementing reference counting is much less so, especially in the presence of threads. Many simple scenarios turn into problems of lock-free programming proportions - and that's just for verifying that memory safety is present, not that the user hasn't introduced bugs with their own threading cock-ups.
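As a rough sketch of how small such a collector can be, here is a toy Cheney-style two-space copying collector in Python, with list indices standing in for pointers and dicts standing in for heap objects (all the names here are invented for illustration; a real collector works on raw memory and needs the stack/register/type-layout data mentioned above):

```python
# Toy stop-the-world semispace (Cheney) copying collector.
# The heap is a list of objects; each object is {"fields": [indices]}.

def collect(from_space, roots):
    """Copy everything reachable from `roots` into a fresh to-space.
    Anything left behind in from_space is garbage by construction."""
    to_space = []
    forward = {}                      # forwarding table: old index -> new index

    def copy(idx):
        # copy an object once; later visits just follow the forwarding pointer
        if idx in forward:
            return forward[idx]
        new_idx = len(to_space)
        forward[idx] = new_idx
        to_space.append({"fields": list(from_space[idx]["fields"])})
        return new_idx

    # Copy the roots, then scan to-space linearly (Cheney's implicit queue):
    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(to_space):
        obj = to_space[scan]
        obj["fields"] = [copy(f) for f in obj["fields"]]  # fix up pointers
        scan += 1
    return to_space, new_roots

# Heap: object 0 points at object 1; object 2 is unreachable garbage.
heap = [{"fields": [1]}, {"fields": []}, {"fields": []}]
heap, roots = collect(heap, roots=[0])
assert len(heap) == 2                 # the garbage object was never copied
```

The collector never touches garbage at all; its cost is proportional to the live set, which is the property generational collectors exploit.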

(Delphi uses reference counting for interfaces, strings and dynamic arrays, and I am aware of race bugs in strings in particular (which are copy on write); these bugs are hard to fix without murdering performance, yet in practice they are very rare on x86 memory model hardware. So they stay.)

stonemetal | 15 years ago

Nothing in the definition of garbage collection says it must be a separate post-processing step. Therefore reference counting is in fact a garbage collection mechanism. Just because it is an eager algorithm rather than the usual lazy one doesn't change what it does: collect garbage in the system.

marshray | 15 years ago

> Nothing in the definition of Garbage Collection says it must be a separate post processing step.

My impression (based on reading up on the state-of-the-art some years ago) was that an actual concurrent-with-normal-execution GC needed to resort to putting a write barrier in front of the useful thread of execution much of the time. This would cause a noticeable perf hit.

There were architectures designed with hardware support for this kind of thing, but chips without it left them in the dust. Whether or not that is an accident of history or for a reason is an interesting discussion.

yid | 15 years ago

Right, and garbage collection is just lazy memory management; the garbage gets collected either way, so this whole point is rather pedantic.

gruseom | 15 years ago

> I like simplicity. Simplicity tends to perform well, and being simple also means it has little space for problems.

Amen.

I saw an old interview with Chuck Moore a while ago in which he said: I like simplicity and efficiency. That struck me. How often do people put those two things together? We're conditioned to think of them as a tradeoff. But if you can have both, shouldn't we be trying hard for that? Which raises another question: what do you have to give up in exchange for both simplicity and efficiency?

chc | 15 years ago

It's because "simple" can be a measure of different things. C is a pretty simple and straightforward language, but it takes 10 times as much code to do anything, which could be said to make the result quite a bit more complex.

kainosnoema | 15 years ago

In my experience, it takes an incredible amount of effort and experience to solve a complex problem with simplicity and efficiency. It's well worth it, but there's definitely a cost, and not everyone is capable of it.

caf | 15 years ago

I tend to feel that the two often go together. After all, "the key to making a program fast is to make it do as little as possible".

seanx | 15 years ago

> I like simplicity. Simplicity tends to perform well, and being simple also means it has little space for problems.

I like simplicity in my code, I am far less concerned about simplicity in the underlying libraries that I use.

GC makes my code simpler and more understandable. I use manual memory allocation when I absolutely need to, and GC languages when I can. In most cases, that means I use a GC language.