(no title)
jfr | 15 years ago
To be a little pedantic on the subject, such a system (reference counting and immediate freeing) is a form of automatic memory management, but it is not GC in any way. Garbage collection implies that the system leaves garbage around, which needs to be collected in some way or another. The usual approach to refcounting releases resources as soon as they are no longer required (either by free()ing immediately or by sending it to a pool of unused resources), thus doesn't leave garbage around, and doesn't need a collector thread or mechanism to.
There are partial-GC implementations of refcounting, either because items are not free()d when they reach zero references, or to automatically detect reference loops which are not handled directly.
I agree with Torvalds on this matter. GC as it is promoted today is a giant step that gives programmers one benefit, solving one problem, while introducing a immeasurable pile of complexity to the system creating another pile of problems that are still not fixed today. And to fix some of these problems (like speed) you have to introduce more complexity.
This is my problem with GC. I like simplicity. Simplicity tends to perform well, and being simple also means it has little space for problems. Refcounting is simple and elegant, you just have to take care of reference loops, which also has another simple solution, that is weak references. I can teach a class of CS students everything they need to know to design a refcounting resource management system in one lesson.
GC is the opposite: it is big, complex, and a problem that the more you try to fix it, the more complex it becomes. The original idea is simple, but nobody uses the original idea because it performs so badly. To teach the same class how to design a GC system that performs as well as we expect today, an entire semester may not be enough.
stcredzero|15 years ago
In a way, I do as well.
GC as it is promoted today is a giant step that gives programmers one benefit, solving one problem, while introducing a immeasurable pile of complexity to the system creating another pile of problems that are still not fixed today. And to fix some of these problems (like speed) you have to introduce more complexity.
There are plenty of contexts where speed is a non-issue. In those cases, GC has been a huge win. The conceptual simplicity is the important part. The cost of the resources that would be saved with explicit and optimized memory management would be far outweighed by the resources required to implement such things.
The original idea is simple, but nobody uses the original idea because it performs so badly.
This is simply not true.
In the context of IO-bound enterprise systems, I've seen generational GC perform admirably, almost magically. As a lark, I've put infinite loops into such apps that do nothing but allocate new objects, and unless you are doing an exceptionally intense operation, you couldn't tell the difference. Properly tuned generational GC can be a truly fantastic seeming thing!
However, I will agree that the concerns Linus highlights are real, and that refcounting systems, like the one in iOS are by far better choices in many contexts.
EDIT: The above system I victimized, I only victimized in the TEST environment, but it was populated with something like 2-week old production data. The application in question is a traditional client/server desktop app used by a major energy company and had 800 active users at the time, handling millions in transactions every minute.
IDEA: If someone had an augmented ref-counting system with a runtime containing an optional cycle-detector and something like LINT but for the runtime reference graph, one would get most of the benefits of GC with the efficiency of the ref-counting system. I half expect someone to tell me that this already exists for Python.
abstractfactory|15 years ago
http://www.research.ibm.com/people/d/dfb/publications.html
Scholar cluster:
http://scholar.google.com/scholar?q=bacon+unified+theory+gar...
Incidentally, it is not even remotely true that reference counting is more efficient than tracing GC in all cases.
kragen|15 years ago
While I agree that generational GC can perform spectacularly well, what you're describing is close to the case it's optimized for, not close to its worst case. The worst case is that you allocate lots and lots of small objects and then write a pointer to all of them into a tenured garbage object.
> I half expect someone to tell me that this already exists for Python.
Yes, that's how Python works, except that I don't know what you mean by "something like LINT but for the runtime reference graph."
lambda|15 years ago
Or you could do what Rust <https://github.com/graydon/rust/wiki/Language-FAQ>; does, and have different types of objects; stateful and stateless. Stateless object cannot have cycles (since you can't construct them in stateless objects, unless you're in a lazy language), and so reference counting can be used for stateless objects. Stateful objects, which can have cycles, are managed by the garbage collector.
Rust sounds like a really interesting idea, and I'd love to give it a try, but sadly it's still in the "some assembly required" stage as they work on bootstrapping it so the only real projects that are worthwhile to do with it yet are helping out with the bootstrapping process.
rayiner|15 years ago
Reference counting isn't conceptually any more simple than garbage collection (you still have to indicate how the underlying alloc() and free() operations are implemented). Further, a proper high performance reference counting system isn't any simpler than a high performance GC. In a multithreaded system you have to keep the reference counts in sync without doing a slow atomic operation every time a pointer is passed around, and you need to back the ref counting system with a high performance malloc()/free() that can handle multithreaded allocation and is resistant to memory fragmentation.
jamii|15 years ago
https://github.com/graydon/rust/wiki/Language-FAQ
revetkn|15 years ago
If your use case is teaching how to implement a garbage collector vs. a refcounting system, it is certainly much simpler to implement refcounting. If you are a systems programmer/writing a VM or compiler and do your work at a bare-metal level, then manual memory management/refcounting is probably your best choice.
For everyone else (let's guess over 90% of the programming population), using high-level GC'd languages is significantly simpler and more productive.
barrkel|15 years ago
(Delphi uses reference counting for interfaces, strings and dynamic arrays, and I am aware of race bugs in strings in particular (which are copy on write); these bugs are hard to fix without murdering performance, yet in practice they are very rare on x86 memory model hardware. So they stay.)
stonemetal|15 years ago
marshray|15 years ago
My impression (based on reading up on the state-of-the-art some years ago) was that an actual concurrent-with-normal-execution GC needed to resort to putting a write barrier in front of the useful thread of execution much of the time. This would cause a noticeable perf hit.
There were architectures designed with hardware support for this kind of thing, but chips without it left them in the dust. Whether or not that is an accident of history or for a reason is an interesting discussion.
yid|15 years ago
gruseom|15 years ago
Amen.
I saw an old interview with Chuck Moore a while ago in which he said: I like simplicity and efficiency. That struck me. How often do people put those two things together? We're conditioned to think of them as a tradeoff. But if you can have both, shouldn't we be trying hard for that? Which raises another question: what do you have to give up in exchange for both simplicity and efficiency?
chc|15 years ago
kainosnoema|15 years ago
caf|15 years ago
seanx|15 years ago
I like simplicity in my code, I am far less concerned about simplicity in the underlying libraries that I use.
GC makes my code simpler and more understandable. I use manual memory allocation when I absolutely need to, and GC languages when I can. In most cases, that means I use a GC language.