Safepoints and Fil-C

[+] cyberax|6 months ago|reply

Polling for the safepoint signal adds overhead to tight loop, so Golang uses asynchronous signals to interrupt threads. It needs it for async pre-emption, not just GC. It also results in not knowing the exact state of the frame, so it has to scan the last frame of the stack conservatively.

There are no good ways to deal with that currently. Some VMs even used CPU instruction simulators to interpret the code until it hit the known good safepoint.

I had one idea about improving that: have a "doppelganger" mirror for the inner tight loops, that is instruction-by-instruction identical to the regular version. Except that backjumps are replaced with the call into the GC safepoint handler.

It'd be interesting to try this approach with Fil-C.

[+] pizlonator|6 months ago|reply

> Polling for the safepoint signal adds overhead to tight loop

The usual trick, which Fil-C doesn't do yet, is to just unroll super tight loops

Also, the pollcheck doesn't have to be a load-and-branch. There are other ways. There's a bottomless pit of optimization strategies you can do.

Conservative scanning wouldn't be sound in Fil-C, because the LLVM passes that run after FilPizlonator could break the fundamental assumptions of conservative scanning

[+] cryptonector|6 months ago|reply

> Fil-C's pollchecks also support stop-the-world (via the FILC_THREAD_STATE_STOP_REQUESTED bit). This is used for:

> - Implementing fork(2), which needs all threads to stop at a known-good point before the fork child jettisons them.

Makes me wonder how it handles `vfork()`, but I think it needs just a safepoint and no stop-the-world since, after all, the child side of `vfork()` is executing in the same address space as the parent (until exec-or-exit), so it's as though it's the same thread as in the parent (which is stopped waiting for the child to exec-or-exit). All the more reasons that `fork()` is evil and `vfork()` much better.

[+] pizlonator|6 months ago|reply

I don't support vfork(2) right now, only fork(2).

I have some super crazy ideas about how to make vfork(2) work, but I don't particularly like any of them. I'll implement it if I find some situation where vfork(2) is strongly required (so far I haven't found even one such case). The closest is that I've found cases where I have to add the CLOEXEC pipe trick to do error handling the fork(2) way rather than the vfork(2) way.

[+] foota|6 months ago|reply

I was looking at https://fil-c.org/programs_that_work and saw "dash 0.5.12. One tiny change: use fork(2) instead of vfork(2).", so maybe it doesn't :)

[+] cryptonector|6 months ago|reply

@pizlonator I wonder if you couldn't bracket all assembly with `filc_exit`/`filc_enter` as you do system calls. When you know the assembly doesn't allocate memory then this should work fine but.. ah, it's the stack allocations that are also interesting, right? So you'd have to ensure that the assembly has enough stack space to execute _and_ that it does not invoke the heap allocator. But that seems doable for things like cryptography in in OpenSSL's libcrypto.

[+] pizlonator|6 months ago|reply

Yeah you could do that. It's tantamount to having an FFI. It's super risky though - lots of ways that the assembly code could break Fil-C's assumptions and then you'd get memory safety issues.

My long term plan is to give folks a way to write memory safe assembly. I have some crazy ideas for how to do that

[+] correct_horse|6 months ago|reply

Fil-C seems interesting, and I didn’t understand the details of how multi-threaded garbage collectors worked before reading it (I still don’t but I’m closer!). The tradeoff between a compacting garbage collector (Java) vs what you can bolt on to C without forking LLVM is particularly interesting.

[+] pjmlp|6 months ago|reply

Note that it is very simplistic to say Java has a compacting garbage collector, between Oracle JDK, OpenJDK, Temurin, Azul, PTC, Aicas, microEJ, OpenJ9, GraalVM, Jikes RVM, and the cousin Android, there is pleothora of GC configurations to chose from, that go beyond that.

[+] cryptonector|6 months ago|reply

TFA is really good at explaining how a threaded GC works with minimal impact when it's not running.

[+] kragen|6 months ago|reply

This is interesting and looks highly readable! But I don't understand it yet.

44 comments