(no title)
MereInterest | 4 months ago
> if a machine word's integer value, when considered as a pointer, falls within a GCed block of memory, then that block itself is considered reachable (and is transitively scanned). Since a conservative GC cannot know if a word is really a pointer, or is a random sequence of bits that happens to be the same as a valid pointer, this over-approximates the live set
Suppose I allocate two blocks of memory, convert their pointers to integers, then store the values `x` and `x^y`. At this point, no machine word points to the second allocation, and so the GC would consider the second allocation to be unreachable. However, the value `y` could be computed as `x ^ (x^y)`, converted back to a pointer, and accessed. Therefore, their reachability analysis would under-approximate the live set.
If pointers and integers can be freely converted to each other, then the GC would need to consider not just the integers that currently exist, but also every integer that could be produced from the integers that currently exist.
IshKebab|4 months ago
You can only freely convert integers to pointers with "exposed provenance" in Rust which is currently unstable.
https://doc.rust-lang.org/std/ptr/index.html#exposed-provena...
I find the idea of provenance a bit abstract so it's a lot easier to think about a concrete pointer system that has "real" provenance: CHERI. In CHERI all pointers are capabilities with a "valid" tag bit (it's out-of-band so you can't just set it to 1 arbitrarily). As soon as you start doing raw bit manipulation of the address the tag is cleared and then it can be no longer used as a pointer. So this problem doesn't exist on CHERI.
Also the problem of mistaking integers as pointers when scanning doesn't exist either - you can instead just search for memory where the tag bit is set.
kmeisthax|4 months ago
What compiler writers realized is that pointers are actually not integers, even though we optimize them down to be integers. There's extra information in them we're forgetting to materialize in code, so-called "pointer provenance", that optimizers are implicitly using when they make certain obvious pointer optimizations. This would include the original block of memory or local variable you got the pointer from as well as the size of that data.
For normal pointer operations, including casting them to integers, this has no bearing on the meaning of the program. Pointers can lower to integers. But that doesn't mean constructing a new pointer from an integer alone is a sound operation. That is to say, in your example, recovering the integer portion of y and casting it to a pointer shouldn't be allowed.
There are two ways in which the casting of integers to pointers can be made a sound operation. The first would be to have the programmer provide a suitably valid pointer with the same or greater provenance as the one that provided the address. The other, which C/C++ went with for legacy reasons, is to say that pointers that are cast to integers become 'exposed' in such a way that casting the same integer back to a pointer successfully recovers the provenance.
If you're wondering, Rust supports both methods of sound int-to-pointer casts. The former is uninteresting for your example[0], but the latter would work. The way that 'exposed provenance' would lower to a GC system would be to have the GC keep a list of permanently rooted objects that have had their pointers cast to integers, and thus can never be collected by the system. Obviously, making pointer-to-integer casts leak every allocation they touch is a Very Bad Idea, but so is XORing pointers.
Ironically, if Alloy had done what other Rust GCs do - i.e. have a dedicated Collect trait - you could store x and x^y in a single newtype that transparently recovers y and tells the GC to traverse it. This is the sort of contrived scenario where insisting on API changes to provide a precise collector actually gets what a conservative collector would miss.
[0] If you're wondering what situations in which "cast from pointer and int to another pointer" would be necessary, consider how NaN-boxing or tagged pointers in JavaScript interpreters might be made sound.