top | item 33714692

(no title)

sillycross | 3 years ago

> More recently I have been experimenting with a new calling convention that uses no callee-saved registers to work around this, but the results so far are inconclusive. The new calling convention would use all registers for arguments, but allocate registers in the opposite order of normal functions, to reduce the chance of overlap. I have been calling this calling convention "reverse_cc".

As explained in the article, LLVM already has the calling convention you are exactly looking for: the GHC convention (cc 10). You can use it to "pin" registers by passing arguments at the right spot. If you pin your argument in a callee-saved register of the C calling conv, it won't get clobbered after you do a C call.

discuss

order

haberman|3 years ago

I tried exposing the "cc 10" and "cc 11" calling conventions to Clang, but was unsuccessful. With "cc 10" I got crashes at runtime I could not explain. With "cc 11" I got a compile-time error:

> fatal error: error in backend: Can't generate HiPE prologue without runtime parameters

If "cc 10" would work from Clang I'd be happy to use that. Though I think reverse_cc could potentially still offer benefits by ordering arguments such that callee-save registers are assigned first. That is helpful when calling fallback functions that take arguments in the normal registers.

obl|3 years ago

If you didn't already, I'd recommend compiling llvm/clang in debug or release+assert mode when working on it. The codebase is quite heavy in debug-only assertions even for relatively trivial things (like missing implementation of some case, some combination of arguments being invalid, ...) which means that it's pretty easy to get into weird crashes down the line with assertions disabled.

sillycross|3 years ago

Yeah, you should do it at LLVM IR level.

JonChesterfield|3 years ago

Adding calling conventions to clang is fairly straightforward (e.g. https://reviews.llvm.org/D125970) but there's a limited size structure in clang somewhere that makes doing so slightly contentious. Partly because they're shared across all architectures. At some point we'll run out of bits and have to do something invasive and/or slow to free up space.

JonChesterfield|3 years ago

I think I remember one called preserve_none. That might have been in a downstream fork though.

The calling convention you want for a coroutine switch is caller saves everything live, callee saves nothing. As that means spilling is done on the live set, to the stack in use before switching. Return after stack switch then restores them.

So if a new convention is proposed that solves both fast stackful coroutines and bytecode interpreter overhead, that seems reasonable to me.