There is no problem with memcpy other than that you can't use a null pointer. You can memcpy zero bytes as long as the pointer is valid. This works in a good many circumstances; just not circumstances where the empty array is represented by not having an address at all.
For instance, say we write a function that rotates an M-byte array by N bytes: it moves the low N bytes to the top of the array and shuffles the remaining M - N bytes down to the bottom. This function will work fine with the zero-byte memmove or memcpy operations in the special case when N == 0, because the pointer will be valid.
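A minimal C sketch of that rotation, assuming an m-byte array rotated by n bytes through a temporary buffer (rotate_bytes and the temp-buffer strategy are illustrative choices, not the commenter's code). With n == 0, both memcpy calls copy zero bytes, which is well defined here because arr points at a real array.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: rotate an m-byte array left by n bytes.
   The low n bytes move to the top; the remaining m - n bytes shuffle down.
   arr must point at a real array even when n == 0, so that the
   zero-byte memcpy calls below are well defined. */
int rotate_bytes(unsigned char *arr, size_t m, size_t n)
{
    unsigned char *tmp;

    if (n > m)
        return -1;

    /* Ask for at least one byte so NULL always means failure
       (malloc(0) may legitimately return NULL). */
    tmp = malloc(n ? n : 1);
    if (tmp == NULL)
        return -1;

    memcpy(tmp, arr, n);            /* save the low n bytes (may be 0 bytes) */
    memmove(arr, arr + n, m - n);   /* shuffle the remaining m - n bytes down */
    memcpy(arr + (m - n), tmp, n);  /* put the saved bytes on top */

    free(tmp);
    return 0;
}
```

For example, rotating "abcdef" by 2 yields "cdefab". If arr were instead a null pointer standing in for an empty array, the same zero-byte calls would be undefined.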
Now say we have something like this:
struct buf {
    char *ptr;
    size_t size;
};
We would like it so that when the size is zero, we don't have an allocated buffer there. But we'd still like to support a zero-sized memcpy in that case: memcpy(buf->ptr, whatever, 0), or likewise in the other direction.
We now have to check for buf->ptr being null in the code that deals with resizing.
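A sketch of what that check looks like, assuming the struct buf above keeps ptr == NULL whenever size == 0 (buf_copy_in and buf_copy_out are hypothetical names). Every zero-length copy has to be skipped explicitly, because memcpy with a null pointer is undefined even when the count is zero.

```c
#include <stddef.h>
#include <string.h>

struct buf {
    char *ptr;    /* NULL when size == 0: empty buffers are not allocated */
    size_t size;
};

/* Hypothetical helper: copy n bytes from src into the start of b.
   memcpy(NULL, src, 0) is undefined, so skip zero-length copies. */
int buf_copy_in(struct buf *b, const void *src, size_t n)
{
    if (n > b->size)
        return -1;
    if (n != 0)
        memcpy(b->ptr, src, n);
    return 0;
}

/* The other direction needs the same guard, since b->ptr may be NULL. */
int buf_copy_out(const struct buf *b, void *dst, size_t n)
{
    if (n > b->size)
        return -1;
    if (n != 0)
        memcpy(dst, b->ptr, n);
    return 0;
}
```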
Here is a snag in the C language related to zero-sized arrays: the call malloc(0) is allowed to return either a null pointer or a non-null pointer that can be passed to free.
Oops! In the first case, the pointer may not be used with a zero-sized memcpy; in the second case, it can.
This also goes for realloc(NULL, 0), which is equivalent to malloc(0).
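One common way to sidestep the malloc(0) ambiguity is to never ask for zero bytes. In the sketch below (xmalloc is a hypothetical name), a zero request is rounded up to one byte, so a non-null result is always usable with a zero-sized memcpy and a null result always means allocation failure. The cost is a real, tiny allocation for every empty object.

```c
#include <stdlib.h>

/* Hypothetical wrapper: like malloc, but a request for 0 bytes is rounded
   up to 1, so NULL is never an "empty" result and always means out of
   memory. The returned pointer must still be released with free(). */
void *xmalloc(size_t size)
{
    return malloc(size != 0 ? size : 1);
}
```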
And, OMG I just noticed ...
In C99, realloc(ptr, 0) was valid where ptr is a valid, allocated pointer: you could realloc an object to size zero.
I'm looking at the April 2023 draft (N3096). It states that realloc(ptr, 0) is undefined behavior.
N2464 [0]: there was a lot of implementation divergence in what realloc(ptr, 0) did (especially on BSD, which allegedly doesn't free the memory at all?), so they just declared it UB.
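Given that divergence, portable code can avoid realloc(ptr, 0) altogether. A minimal sketch, assuming the caller wants "resize to zero" to mean "free" (xrealloc is a hypothetical name):

```c
#include <stdlib.h>

/* Hypothetical wrapper that never calls realloc(ptr, 0), which the C23
   drafts make undefined and which historically behaved differently across
   implementations. A zero size frees the old object and returns NULL. */
void *xrealloc(void *ptr, size_t size)
{
    if (size == 0) {
        free(ptr);   /* free(NULL) is a no-op, so ptr may be NULL */
        return NULL;
    }
    return realloc(ptr, size);  /* realloc(NULL, size) acts like malloc(size) */
}
```

One consequence of this design: NULL signals failure only when size is nonzero, so callers must look at the size they requested before treating NULL as out of memory.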
Is there a rationale for a memory allocator to support zero sized allocations? Is this really just about providing a "technically" valid pointer for the pointer/size pair structure? To me it seems any address is a potentially valid pointer to a zero-sized object. Do allocators really keep track of these null allocations? That would require keeping state for every single address in the worst case...
It's very strange. I wrote my own memory allocator and I can't figure out the right way to handle this. Eliminating the need for these "technically" valid pointers that can't actually be accessed because they're zero sized seems like the better solution.
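One possible answer to the bookkeeping worry, sketched under the assumption that a custom allocator is free to pick its own convention: hand out the address of a single static sentinel for every zero-size request, so nothing is tracked per address (my_alloc, my_free, and the sentinel are hypothetical names).

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sentinel shared by every zero-size allocation. max_align_t
   (C11) keeps the address suitably aligned for any object type. */
static max_align_t zero_size_sentinel;

/* Zero-size requests get the sentinel: non-null, never dereferenced, and
   usable with zero-byte memcpy. No per-allocation state is recorded. */
void *my_alloc(size_t size)
{
    if (size == 0)
        return &zero_size_sentinel;
    return malloc(size);
}

/* The sentinel never came from malloc, so it must not reach free(). */
void my_free(void *ptr)
{
    if (ptr == (void *)&zero_size_sentinel)
        return;
    free(ptr);
}
```

The trade-off is that all empty allocations compare equal, so this doesn't fit an API that promises a distinct pointer per allocation. Rust's empty Vec relies on a related idea: a non-null, well-aligned dangling pointer (NonNull::dangling) instead of an allocation.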
> When did that happen?
More importantly, why did that happen? People have told me that I should care about the C standards committee because they take backwards compatibility very seriously. Then they come out with breaking changes like these.
This is basically the "define pointer arithmetic for invalid pointers" option, which, as pointed out in that section, doesn't completely solve the FFI problem.
A fun additional twist to this is that dereferencing nullptr is valid in WebAssembly, and actual data can in fact end up there, though ideally it never will.
If you ensure that the 'zero page' (so to speak) is empty you can also exploit this property for optimizations, and in some cases the emscripten toolchain will do so.
For example, if you have
#include <cstdint>
template <typename T>
struct MyArray {
    uint32_t length;
    T items[0]; // zero-length array: a GCC/Clang extension
};
you can elide null pointer checks and do a single direct bounds check before dereferencing an element, because for a nullptr, (&ptr->length) == nullptr, and if you reserve the zero page and keep it empty, (nullptr)->length == 0.
This complicates the idea of 'passing nothing', because now it is realistically possible for your code to be passed nullptr on purpose, and it might be expected to behave correctly when that happens instead of asserting or panicking like it would on other (sensible) targets.
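For contrast, a portable C version of that accessor needs the explicit null test that the WebAssembly trick elides. This is a sketch under assumptions: it uses a C flexible array member in place of the zero-length array, and myarray_new and myarray_get are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

/* C rendering of MyArray: a flexible array member (C99) stands in for the
   zero-length array, which is a GCC/Clang extension. */
struct MyArray {
    uint32_t length;
    int items[];
};

/* Hypothetical constructor: zero-initialized array of n ints. */
struct MyArray *myarray_new(uint32_t n)
{
    struct MyArray *arr = calloc(1, sizeof *arr + (size_t)n * sizeof arr->items[0]);
    if (arr != NULL)
        arr->length = n;
    return arr;
}

/* Hypothetical accessor: returns 1 and stores items[i] in *out, or 0 if
   arr is NULL or i is out of range. On ordinary targets the NULL test is
   required, since reading arr->length through a null pointer is undefined.
   The WebAssembly trick folds it into the bounds check by keeping a zero
   length at address 0, which portable C cannot express. */
int myarray_get(const struct MyArray *arr, uint32_t i, int *out)
{
    if (arr == NULL || i >= arr->length)
        return 0;
    *out = arr->items[i];
    return 1;
}
```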
Because WASM is not C and there is no "nullptr" in WASM. In WASM, zero is just an address, as valid as any other. And C actually doesn't require the null pointer value to have bit pattern "all zeros", precisely to allow for architectures where treating zero address as invalid would be way too cumbersome. And some implementations actually took that option.
I'm not sure... wasm is an assembly, not a C implementation. It can define what happens if you load from 0 but it doesn't get to define if the C code `*nullptr` actually loads from 0. Whether or not it does is defined by your compiler, which is probably the clang frontend if you're on emscripten. But then again I think there's a clang flag to disable optimizing away reads/writes to nullptr.
HPUX must have had something similar: when AOL backend code was ported to Solaris, which does segfault on a null dereference, we found all kinds of places where code that had been running without notable incident on HPUX started dumping core.
I’m kind of surprised it’s not defined that the first page must be zero-mapped and read-only… This sounds like a security vulnerability, because it’s not what any other machine code would be written against, and so it violates all sorts of safety assumptions.
I'm dealing with the exact same issues right now in my project, this post is very enlightening.
> But suppose we want an empty (length zero) slice.
So is there an actual rationale for this? I've written the memory allocator and am in the process of developing the foreign interface. I've been wondering if I should explicitly support zero length allocations. Even asked this a few times here on HN but never got an answer. It seems to be a thing people sort of want but for unknown reasons.
It is extremely common to have a collection that might or might not be empty at runtime, and we don’t want to force every programmer who allocates a slice to manually write an alternate code path for the empty case.
It's obviously too late to change this in Rust's case, but I wonder whether being able to differentiate between None and the empty slice is actually a necessary property in general?
There are a bunch of languages where empty arrays are "falsy", and in those it's not advisable to use the two to differentiate valid states. Feels like the same could apply here.
The main complaint in the post is basically that Rust's actual bona fide slice type doesn't work the way cobbled together library types for this purpose in C or C++ do.
The C++ type discussed is much newer than Rust (std::span was standardized in C++20).
Yes, in many cases what C++ APIs mean here isn't a slice of zero Ts at all but None, and Rust has an appropriate type for that, Option<&[T]>, which works as expected. So in many cases where people have built an API they think is &[T] and are trying to construct it with the unsafe functions mentioned, what they actually needed was Option<&[T]>; they don't even have a type-correct design.
Unfortunately empty slices are pretty useful, particularly for strings. For example, if you want to represent HTTP response headers, you might include a bunch of nigh-ubiquitous headers in a struct and punt the others to a hash table, and you would then have to represent both empty-valued-and-present headers and missing headers for those headers you placed in the struct.
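A sketch of that header struct in C, assuming borrowed (pointer, length) views into the raw response (all type, field, and function names here are hypothetical). The present flag plays the role of Option, so an absent header and a header with an empty value stay distinguishable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical borrowed string view into the raw response bytes. */
struct str {
    const char *ptr;
    size_t len;
};

/* Optional string, C's spelling of Option<&[u8]>:
   present == false          -> header was not sent
   present == true, len == 0 -> header was sent with an empty value */
struct opt_str {
    bool present;
    struct str value;
};

/* A few frequently used headers live directly in the struct; the rest
   would go into a hash table (not shown). */
struct response_headers {
    struct opt_str content_type;
    struct opt_str etag;
};

/* True only for a header that was sent with an empty value. */
bool header_present_but_empty(const struct opt_str *h)
{
    return h->present && h->value.len == 0;
}
```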
It’s so silly to talk about C not allowing null on memcpy. That’s a thing the spec says, I guess?
The solution is clear: just ignore the C spec. It’s total garbage. Of course you can memcpy between any ptr values if the count is zero and those ptr values don’t have to point to anything.
Zig slices are (start, count) pairs, where start is a non-nullable pointer.
My impression is that Zig doesn't have a documented memory model that cares about things like whether an address corresponds to an allocation or not, so problems relating to this sort of thing cannot come up yet :)
kazinator|2 years ago
When did that happen?
LegionMammal978|2 years ago
[0] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf
matheusmoreira|2 years ago
cbarrick|2 years ago
[1]: https://github.com/rust-lang/unsafe-code-guidelines/issues/4...
steveklabnik|2 years ago
thayne|2 years ago
kevingadd|2 years ago
Joker_vD|2 years ago
nmilo|2 years ago
lanstin|2 years ago
vlovich123|2 years ago
matheusmoreira|2 years ago
anderskaseorg|2 years ago
swiftcoder|2 years ago
tialaramex|2 years ago
anonymoushn|2 years ago
cozzyd|2 years ago
vardump|2 years ago
Pass something with a 0 length, pointing to NULL. Enjoy your blue screens and kernel panics.
pizlonator|2 years ago
JonChesterfield|2 years ago
Because passing null to memcpy is UB, after that call the pointer is assumed to be non-null. So if (ptr) can be constant-folded. Maybe faster.
I'm in agreement with you on this but your compiler probably isn't.
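To illustrate the concern, a sketch contrasting two orderings (the function names are hypothetical). In the first, the memcpy call lets the optimizer assume src is non-null, so the later check may be removed; whether that actually happens depends on the compiler and its flags. The second tests before copying, which is always well defined.

```c
#include <stddef.h>
#include <string.h>

/* Because memcpy(dst, NULL, n) is undefined even for n == 0, a compiler
   may infer src != NULL after the call and delete the check below.
   Never call this with src == NULL; it is shown only as the hazard. */
int copy_then_check(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
    if (src == NULL)   /* may be folded to "false" by the optimizer */
        return 0;
    return 1;
}

/* Defined alternative: test first, so nothing can be inferred early. */
int check_then_copy(void *dst, const void *src, size_t n)
{
    if (src == NULL)
        return 0;
    memcpy(dst, src, n);
    return 1;
}
```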
anonymoushn|2 years ago
SonOfLilit|2 years ago
pyrolistical|2 years ago
anonymoushn|2 years ago
brabel|2 years ago
https://github.com/ziglang/zig/commit/32e0dfd4f0dab351a024e7...
stealthcat|2 years ago
bhakunikaran|2 years ago
hackyhacky|2 years ago
From the title, I assumed that this article was going to be about either (a) permissive grading standards at university or (b) chronic constipation.
dmvdoug|2 years ago