Using uninitialized memory for fun and profit (2008)

34 points | AminZamani | 7 months ago | research.swtch.com

18 comments

jojomodding|7 months ago

Interestingly enough, Rust does not allow you to read uninitialized memory, not even if you do not care about the value stored there. People have been proposing a `freeze` operation that replaces uninitialized memory with arbitrary but initialized data (i.e. a no-op in assembly).

But there is tension here: never allowing access to uninitialized memory means you get stronger guarantees about what foreign (safe) Rust code can do, for instance.

thrance|7 months ago

True of safe Rust only. You can always fall back to unsafe Rust, allocate a chunk of bytes, and read from or write to it as you wish.
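In current Rust the sanctioned way to do this is `MaybeUninit`, which lets you allocate storage without initializing it and defers the "this is now a real value" claim until you have actually written it. A minimal sketch:

```rust
use std::mem::MaybeUninit;

// Fill a buffer that starts out uninitialized, then hand it back as
// ordinary initialized bytes.
fn fill() -> [u8; 16] {
    // The compiler makes no assumptions about the contents of these slots.
    let mut buf: [MaybeUninit<u8>; 16] = [MaybeUninit::uninit(); 16];
    for (i, slot) in buf.iter_mut().enumerate() {
        slot.write(i as u8);
    }
    // Sound only because every slot was written above; reading a slot
    // that was never written would be UB.
    unsafe { std::mem::transmute::<[MaybeUninit<u8>; 16], [u8; 16]>(buf) }
}

fn main() {
    let buf = fill();
    assert_eq!(buf[0], 0);
    assert_eq!(buf[15], 15);
}
```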

kmeisthax|7 months ago

My impression was that there was some kind of optimization in LLVM that relied on being able to assume values were never undef[0], which is why undefined memory access was always illegal in Rust[1].

Putting that aside, a deliberate "read uninitialized memory with bounded UB" primitive like freeze would only work for types where all possible bit patterns are valid. So no freezing chars[2], references, or sum types. And any transparent wrapper type that has invariants - like, say, slices, vecs, strs, and/or range-restricted integer types - would see them utterly broken when frozen. I suppose you could define some operation to "validate" the underlying bit pattern, but I'm not sure if that would defeat the point of reading uninitialized memory.

[0] LLVM concept that represents uninitialized memory, among other things.

[1] I believe a few other unsafe Rust concepts are actually leaky abstractions around LLVM things

[2] Rust's char must hold a valid Unicode scalar value (not UTF-8), and constructing one from a surrogate code point is UB
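For what it's worth, the standard library already exposes that kind of "validate" step for some types as checked constructors; `char::from_u32` is the checked counterpart of blindly reinterpreting a frozen `u32`:

```rust
fn main() {
    // A valid Unicode scalar value is accepted.
    assert_eq!(char::from_u32(0x61), Some('a'));
    // Surrogate code points (0xD800..=0xDFFF) are not valid chars.
    assert_eq!(char::from_u32(0xD800), None);
    // Neither is anything above 0x10FFFF.
    assert_eq!(char::from_u32(0x11_0000), None);
}
```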

dooglius|7 months ago

One thing worth pointing out is that Linux makes it pretty difficult for userspace to access uninitialized memory; the MAP_UNINITIALIZED flag for mmap only works if the kernel was built to allow it, which it generally isn't, so the memory does get zeroed at some point. The best you can hope for is that your heap allocator re-uses memory it hasn't yet returned to the kernel via munmap. The kernel will zero pages on demand, which helps, but you're still paying a cost for that zeroing.

Sesse__|7 months ago

An elegant optimization, but how would you intersect two of these efficiently? It sounds like you'd need to iterate over the entire dense vector and do a sparse-vector check for each and every value (O(m) with a very high constant factor). Either that, or sort both sparse vectors (O(n log n)).

dzaima|7 months ago

Wouldn't it be just a trivial O(n) of a loop of "for (x in dense vector of one set) { if (is x a member of the other set) add to result; }"?
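For reference, the article's sparse-set representation makes that membership probe O(1): `dense` lists the members in insertion order and `sparse[v]` records where `v` sits in `dense`. A sketch in safe Rust (the `sparse` array is zero-initialized here, whereas the article's trick is precisely that it could be left uninitialized):

```rust
/// Sparse-set structure from the article: `dense` holds the members,
/// `sparse[v]` says where v sits in `dense`.
struct SparseSet {
    dense: Vec<u32>,
    sparse: Vec<u32>,
}

impl SparseSet {
    fn new(universe: usize) -> Self {
        SparseSet { dense: Vec::new(), sparse: vec![0; universe] }
    }

    fn insert(&mut self, v: u32) {
        if !self.contains(v) {
            self.sparse[v as usize] = self.dense.len() as u32;
            self.dense.push(v);
        }
    }

    /// O(1): v is a member iff sparse[v] points into dense and back at v.
    fn contains(&self, v: u32) -> bool {
        let i = self.sparse[v as usize] as usize;
        i < self.dense.len() && self.dense[i] == v
    }

    /// O(n) intersection: one pass over this set's dense vector,
    /// with an O(1) probe into the other set per element.
    fn intersect(&self, other: &SparseSet) -> Vec<u32> {
        self.dense.iter().copied().filter(|&v| other.contains(v)).collect()
    }
}

fn main() {
    let mut a = SparseSet::new(100);
    let mut b = SparseSet::new(100);
    for v in [3, 7, 42, 9] { a.insert(v); }
    for v in [42, 3, 11] { b.insert(v); }
    assert_eq!(a.intersect(&b), vec![3, 42]);
}
```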

dooglius|7 months ago

Why would iterating over the dense vector be O(m) rather than O(n)?