top | item 34357855

(no title)

drmeister | 3 years ago

Thanks! I'm very interested in a new garbage collector that works with C++. I have a long list of requirements we need that is currently satisfied by the Boehm garbage collector and were satisfied by the Memory Pool System before we moved away from it. I've been looking at the Memory Management Toolkit (MMTk). Where would you put Oil in that small club of memory managers?

discuss

aidenn0|3 years ago

I've been following along the gc of oil.

It succeeds because it only solves the GC problems that Oil has (and I mean this as a high complement).

Oil is single threaded, code runs in loops that are well understood, and the code is generated from a strongly typed subset of Python.

The GC then is run only between loop iterations, so there is no need for stack scanning. You never have to worry about a root being in a temporary (from the point of view of the C++ compiler) since there are just a few locals in the function running the loop, and any variables local to the loop are logically dead between iterations.

Since a goal of Oil is portability, not scanning registers and stacks is very important. Getting this right when the GC could be invoked at any allocation is potentially intractable with mypy semantics at least.

drmeister|3 years ago

Ah - thank you. We are looking for a GC that supports (in no particular order): (1) conservative stack scanning, object pinning. (2) optionally conservative and precise on the heap. (3) compacting when precise. (3) weak pointers. (4) good multithreaded performance. (5) finalizers. (6) ability to create pools of different kinds of objects - especially cons cells (pairs).

pebal|3 years ago

You can look at this one: https://github.com/pebal/sgcl

chubot|3 years ago

So I'd say Oil's collector is highly unusual and not applicable most problems! (I now think that "every GC is a snowflake" -- it's such a multi-dimensional design space)

It's unusual because it's a precise collector in C++, and what I slowly realized is that that problem is basically impossible for any non-trivial software, without changing the C++ language itself :)

It seems like that hasn't happened, despite efforts over decades. I added this link about C++ GC support to the appendix, which also explains our unique constraints.

Garbage collection in the next C++ standard (Boehm 2009)

https://dl.acm.org/doi/abs/10.1145/1542431.1542437

http://www.oilshell.org/blog/2023/01/garbage-collector.html#...

---

The reason that precise GC can work for Oil is because it's a shell that links with extremely little 3rd-party code, and has relatively low perf requirements. We depend on libc and GNU readline, just like bash. And those libraries are basically old-school C functions which are easy to wrap with a GC.

(Also as Aidenn mentioned, shells use process-based concurrency, which means we don't have threads. The fact that it's mostly generated C++ code is also important, as mentioned in the post)

---

The funny thing is that one reason I started this project is because I worked with "big data" frameworks on clusters, but I found that you can do a lot on a single machine. (in spirit similar to the recent "Twitter on one machine post" https://news.ycombinator.com/item?id=34291191 )

I would just use shell scripts to saturate ~64 cores / 128 G of RAM, rather than dealing with slow schedulers and distributed file systems.

But garbage collectors and memory management are a main reason you can't use all of a machine from one process. There's just so much room for contention. Also the hardware was trending toward NUMA at the time, and probably is even more now, so processes make even more sense.

All of that is to say that I'm a little scared of multi-threaded GC ... especially when linking in lots of third party libraries.

And AFAIK heaps with tens or hundreds of gigabytes are still in the "not practical" range ... or they would take a huge amount of engineering effort

---

But of course there are many domains where you don't have embarrassingly parallel problems, and writing tight single- or multi-threaded code is the best solution.

Some more color here: https://old.reddit.com/r/oilshell/comments/109t7os/pictures_...

I wonder if Clasp has any support for multi-process programming? Beyond Unix pipes, you could also use shared memory and maybe some semaphores to synchronize access, and avoid copying. I think of that as sort of "inverting" the problem. Certain kinds of data like pointer-rich data is probably annoying to deal with in shared memory, but there are lots of representations for data and I imagine Lisps could take advantage of some of them, e.g. https://github.com/oilshell/oil/wiki/Compact-AST-Representat...

drmeister|3 years ago

Thank you. We do precise GC in C++. I wrote a C++ static analyzer in Lisp that uses the Clang front end and analyzes all of our C++ code and generates maps of GC-managed pointers in all classes. We precisely update pointers in thousands of classes that way. We also use it to save the system's state to a file or relinked executable so we can start up quickly later. Startup times using that are under 2 seconds on a reasonable CPU.