top | item 44048063

(no title)

electrograv | 9 months ago

> That's roughly how it works in C, and I know that it's also UB there if you do it wrong, but one thing is different: It doesn't really ever occupy my mind as a problem. In Rust it does.

UB doesn’t occupy the author’s mind when writing C, when it really should. This kind of lazy attitude to memory safety is precisely why so much C code is notoriously riddled with memory bugs and security vulnerabilities.

discuss

order

mk12|9 months ago

There is an important difference for this case though. It C it’s fine to have pointers into uninitialized memory as you as you don’t read them until after initializing. You can write through those pointers the same way you always do. In Rust it’s UB as soon as you “produce” an invalid value, which includes references to uninitialized memory. Everything uses references in Rust but when dealing with uninitialized memory you have to scrupulously avoid them, and instead write through raw pointers. This means you can’t reuse any code that writes through &mut. Also, the rules change over time. At one point I had unsafe code that had a Vec of uninitialized elements, which was ok because I never produced a reference to any element until after I had written them (through raw pointers). But they later changed the Vec docs to say that’s UB, I guess because they want to reserve the right to use references even if you never call a method that returns a reference.

Arnavion|9 months ago

This stopped being much of a problem when MaybeUninit was stabilized. Now you can stick to using &MaybeUninit<T> / &mut MaybeUninit<T> instead of needing to juggle *T / *mut T and carefully track converting that to &T / &mut T only when it's known to be initialized, and you can't accidentally use a MaybeUninit<T> where you meant to use a T because the types are different.

It's not as painless as it could be though, because many of the MaybeUninit<T> -> T conversion fns are unstable. Eg the code in TFA needs `&mut [MaybeUninit<T>] -> &mut [T]` but `[T]::assume_init_mut()` is unstable. But reimplementing them is just a matter of copying the libstd impl, that in turn is usually just a straightforward reinterpret-cast one-liner.

codeflo|9 months ago

I don’t get the difference. In both C and Rust you can have pointers to uninitialized memory. In both languages, you can’t use them except in very specific circumstances (which are AFAIK identical).

There are two actual differences in this regard: C pointers are more ergonomic than Rust pointers. And Rust has an additional feature called references, which enable a lot more aggressive compiler optimizations, but which have the restriction that you can’t have a reference to uninitialized memory.

nemothekid|9 months ago

Bizarre. I think I've been writing broken Rust code for a couple years. If I understand you correctly something like:

    let mut data = Vec::with_capacity(sz);
    unsafe { data.set_len(sz) };
    buf.copy_to_slice(data.as_mut_slice());
is UB?

uecker|9 months ago

It is also not UB to read uninitialized values through a pointer in C for types that do not have non-value representations.

usefulcat|9 months ago

I suspect that the main reason it doesn't really occupy the author's mind is that even though it's possible to misuse read(), it's really not that hard to actually use it safely.

It sounds like the more difficult problem here has to do with explaining to the compiler that read() is not being used unsafely.

o11c|9 months ago

The reason this particular UB doesn't need mindspace for C programmers is because it's not even meaningful to do anything with the parts of the buffer beyond the written length.

Most other UBs related to datums that you think you can do something with.

lhecker|9 months ago

What I meant is that if I write a UTF8 --> UTF16 conversion function for my editor in C I can write

  size_t convert(state_t* state, const void* inp, void* out)
This function now works with both initialized and uninitialized data in practice. It also is transparent over whether the output buffer is an `u8` (a byte buffer to write it out into a `File`) or `u16` (a buffer for then using the UTF16). I've never had to think about whether this doesn't work (in this particular context; let's ignore any alignment concerns for writes into `out` in this example) and I don't recall running into any issues writing such code in a long long time.

If I write the equivalent code in Rust I may write

  fn convert(&mut self, inp: &[u8], out: &mut [MaybeUninit<u8>]) -> usize
The problem is now obvious to me, but at least my intention is clear: "Come here! Give me your uninitialized arrays! I don't care!". But this is not the end of the problem, because writing this code is theoretically unsafe. If you have a `[u8]` slice for `out` you have to convert it to `[MaybeUninit<u8>]`, but then the function could theoretically write uninitialized data and that's UB isn't it? So now I have to think about this problem and write this instead:

  fn convert(&mut self, inp: &[u8], out: &mut [u8]) -> usize
...and that will also be unsafe, because now I have to convert my actual `[MaybeUninit<u8>]` buffer (for file writes) to `[u8]` for calls to this API.

Long story short, this is a problem that occupies my mind when writing in Rust, but not in C. That doesn't mean that C's many unsafeties don't worry me, it just means that this _particular_ problem type described above doesn't come up as an issue in C code that I write.

Edit: Also, what usefulcat said.

ninkendo|9 months ago

Why wouldn’t you accept a &mut [MaybeUninit<T>] and return a &mut [u8], hiding the unsafe bits that transmute the underlying reference?

Something like:

  fn convert<'i, 'o>(inp: &'i [u8], buf: &'o mut MaybeUninit<u8>) -> &'o mut [u8]
(Honest question, actually… because the above may be impossible to write and I’m on my phone and can’t try it.)

Edit: it works: https://play.rust-lang.org/?version=stable&mode=debug&editio...