Passing nothing is surprisingly difficult

180 points| kingkilr | 2 years ago |davidben.net

70 comments

kazinator|2 years ago

There is no problem with memcpy other than that you can't use a null pointer. You can memcpy zero bytes as long as the pointer is valid. This works in a good many circumstances; just not circumstances where the empty array is represented by not having an address at all.

For instance, say we write a function that rotates an N-byte array: it moves the low M bytes to the top of the array, and shuffles the remaining N - M bytes down to the bottom. This function will work fine with the zero-byte memmove or memcpy operations in the special case when M == 0, because the pointer will be valid.
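A rough sketch of such a rotation in C, to show where the zero-byte calls arise. The function name, the signature, and the use of a scratch buffer are assumptions for illustration, not something from the comment above; here n is the array length and m the number of low bytes moved to the top.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Rotate an n-byte array so that its low m bytes end up at the top.
 * Hypothetical helper: the name and signature are illustrative only.
 * Returns 0 on success, -1 if the scratch allocation fails. */
int rotate_low_to_top(unsigned char *a, size_t n, size_t m)
{
    assert(a != NULL && m <= n);   /* a must point at a real array */

    /* Request at least one byte so a null return always means failure. */
    unsigned char *tmp = malloc(m ? m : 1);
    if (tmp == NULL)
        return -1;

    memcpy(tmp, a, m);             /* save the low m bytes (m may be 0) */
    memmove(a, a + m, n - m);      /* shift the remaining n - m bytes down */
    memcpy(a + (n - m), tmp, m);   /* place the saved bytes at the top */

    free(tmp);
    return 0;
}
```

When m is 0 or m equals n, some of these calls copy zero bytes, but every pointer passed is still a pointer into (or one past the end of) a real object, so no special case is needed.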

Now say we have something like this:

  struct buf {
    char *ptr;
    size_t size;
  };
We would like it so that when the size is zero, we don't have an allocated buffer there. But we'd still like to support a zero-sized memcpy in that case: memcpy(buf->ptr, whatever, 0), or likewise in the other direction.

We now have to check for buf->ptr being null in the code that deals with resizing.
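One way that check might look, as a minimal sketch. The helper names copy_bytes and buf_append are hypothetical, and the sketch ignores overflow of size + n for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct buf {
    char *ptr;    /* NULL when size == 0 */
    size_t size;
};

/* Hypothetical helper: a memcpy that tolerates null pointers when n == 0. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    if (n == 0)
        return dst;                      /* never hand NULL to memcpy */
    assert(dst != NULL && src != NULL);
    return memcpy(dst, src, n);
}

/* Append n bytes to b. Returns 0 on success, -1 on allocation failure. */
int buf_append(struct buf *b, const void *src, size_t n)
{
    if (n == 0)
        return 0;                        /* nothing to do; b->ptr may stay NULL */

    /* realloc(NULL, k) behaves like malloc(k), and k is nonzero here. */
    char *p = realloc(b->ptr, b->size + n);
    if (p == NULL)
        return -1;

    copy_bytes(p + b->size, src, n);
    b->ptr = p;
    b->size += n;
    return 0;
}
```

The early return on n == 0 is what keeps the empty, unallocated state away from both memcpy and realloc.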

Here is a snag in the C language related to zero sized arrays. The call malloc(0) is allowed to return a null pointer, or a non-null pointer that can be passed to free.

Oops! In the one case, the pointer may not be used with a zero-sized memcpy; in the other case it can.

This also goes for realloc(NULL, 0) which is equivalent to malloc(0).
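A common way to sidestep the ambiguity is to never request zero bytes at all. The wrappers below are a sketch under that assumption; the names alloc_nonnull and resize_nonnull are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical wrapper: never calls malloc(0), so a null return
 * always means allocation failure rather than "empty". */
void *alloc_nonnull(size_t n)
{
    return malloc(n ? n : 1);
}

/* Hypothetical wrapper: never calls realloc with a size of zero.
 * p may be NULL, in which case this acts like alloc_nonnull(n). */
void *resize_nonnull(void *p, size_t n)
{
    return realloc(p, n ? n : 1);
}
```

With these, every successful result is a non-null pointer that can be passed to free and used with a zero-sized memcpy, at the cost of one wasted byte for empty allocations.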

And, OMG I just noticed ...

In C99, realloc(ptr, 0) was valid where ptr is a valid, allocated pointer. You could realloc an object to zero.

I'm looking at the April 2023 draft (N3096). It states that realloc(ptr, 0) is undefined behavior.

When did that happen?

matheusmoreira|2 years ago

Is there a rationale for a memory allocator to support zero sized allocations? Is this really just about providing a "technically" valid pointer for the pointer/size pair structure? To me it seems any address is a potentially valid pointer to a zero-sized object. Do allocators really keep track of these null allocations? That would require keeping state for every single address in the worst case...

It's very strange. I wrote my own memory allocator and I can't figure out the right way to handle this. Eliminating the need for these "technically" valid pointers that can't actually be accessed because they're zero sized seems like the better solution.

> When did that happen?

More importantly, why did that happen? People have told me that I should care about the C standards committee because they take backwards compatibility very seriously. Then they come out with breaking changes like these.

kevingadd|2 years ago

A fun additional twist to this is that dereferencing nullptr is valid in WebAssembly, and actual data can in fact end up there, though ideally it never will.

If you ensure that the 'zero page' (so to speak) is empty you can also exploit this property for optimizations, and in some cases the emscripten toolchain will do so.

i.e. if you have

  struct MyArray<T> {
    uint length;
    T items[0];
  }
you can elide null pointer checks and just do a single direct bounds check before dereferencing an element, because for a nullptr, (&ptr->length) == nullptr, and if you reserve the zero page and keep it empty, (nullptr)->length == 0.

this complicates the idea of 'passing nothing' because now it is realistically possible for your code to get passed nullptr on purpose and it might be expected to behave correctly when that happens, instead of asserting or panicking like it would on other (sensible) targets

Joker_vD|2 years ago

Because WASM is not C and there is no "nullptr" in WASM. In WASM, zero is just an address, as valid as any other. And C actually doesn't require the null pointer value to have bit pattern "all zeros", precisely to allow for architectures where treating zero address as invalid would be way too cumbersome. And some implementations actually took that option.

nmilo|2 years ago

I'm not sure... wasm is an assembly, not a C implementation. It can define what happens if you load from 0 but it doesn't get to define if the C code `*nullptr` actually loads from 0. Whether or not it does is defined by your compiler, which is probably the clang frontend if you're on emscripten. But then again I think there's a clang flag to disable optimizing away reads/writes to nullptr.

lanstin|2 years ago

HPUX must have had something similar, as when AOL backend code was ported to Solaris, which does segv on null dereference, we found all kinds of places where code that had been running without notable incident on HPUX started dropping core.

vlovich123|2 years ago

I’m kind of surprised it’s not defined that the first page must be 0-mapped read only… this sounds like a security vulnerability, because it’s unlike what any other machine code would be written against and thus violates all sorts of safety assumptions.

matheusmoreira|2 years ago

I'm dealing with the exact same issues right now in my project, this post is very enlightening.

> But suppose we want an empty (length zero) slice.

So is there an actual rationale for this? I've written the memory allocator and am in the process of developing the foreign interface. I've been wondering if I should explicitly support zero length allocations. Even asked this a few times here on HN but never got an answer. It seems to be a thing people sort of want but for unknown reasons.

anderskaseorg|2 years ago

It is extremely common to have a collection that might or might not be empty at runtime, and we don’t want to force every programmer who allocates a slice to manually write an alternate code path for the empty case.

swiftcoder|2 years ago

It's obviously too late to change this in Rust's case, but I wonder whether being able to differentiate between None and the empty slice is actually a necessary property in general?

There are a bunch of languages where empty arrays are "falsy", and in those it's not advisable to use the two to differentiate valid states. Feels like the same could apply here.

tialaramex|2 years ago

The main complaint in the post is basically that Rust's actual bona fide slice type doesn't work the way cobbled together library types for this purpose in C or C++ do.

The C++ type discussed is much newer than Rust (std::span was standardized in C++20).

Yes, in many cases what C++ APIs mean here isn't a slice of zero Ts at all but None. Rust has an appropriate type for that, Option<&[T]>, which works as expected. So in many cases where people have built an API which they think takes &[T] and are trying to make one with the unsafe functions mentioned, it's actually Option<&[T]> they needed anyway; they don't even have a type-correct design.

anonymoushn|2 years ago

Unfortunately empty slices are pretty useful, particularly for strings. For example, if you want to represent HTTP response headers, you might include a bunch of nigh-ubiquitous headers in a struct and punt the others to a hash table, and you would then have to represent both empty-valued-and-present headers and missing headers for those headers you placed in the struct.
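To make the distinction concrete, here is a small sketch in C of a value type that keeps "missing" and "present but empty" apart with an explicit flag instead of overloading a null pointer. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical header-value type: `present` separates "missing"
 * from "empty", so `data` never has to carry that meaning. */
struct header_value {
    bool present;
    const char *data;   /* not inspected when len == 0 */
    size_t len;
};

/* The "header was not sent" state. */
const struct header_value HEADER_MISSING = { false, NULL, 0 };

/* Wrap a (pointer, length) pair as a header that was sent. */
struct header_value header_from(const char *data, size_t len)
{
    struct header_value v = { true, data, len };
    return v;
}

/* True only for a header that was sent with an empty value. */
bool header_is_empty(struct header_value v)
{
    return v.present && v.len == 0;
}
```

For example, header_from("", 0) is present and empty, while HEADER_MISSING is neither, which is the pair of states the struct-plus-hash-table layout needs to tell apart.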

cozzyd|2 years ago

I thought the justification for Rust having an unstable ABI was so that things like binary representations of types could change?

vardump|2 years ago

Fun times with buggy kernel drivers.

Pass something with a 0 length, pointing to NULL. Enjoy your blue screens and kernel panics.

pizlonator|2 years ago

It’s so silly to talk about C not allowing null on memcpy. That’s a thing the spec says, I guess?

The solution is clear: just ignore the C spec. It’s total garbage. Of course you can memcpy between any ptr values if the count is zero and those ptr values don’t have to point to anything.

JonChesterfield|2 years ago

Better be rolling your own compiler in that case. Or your own memcpy with a different name.

UB to pass null to memcpy means that after the call, the pointer is assumed to be non-null. So if(ptr) can constant fold. Maybe faster.

I'm in agreement with you on this but your compiler probably isn't.

anonymoushn|2 years ago

This is a bad idea if your memcpy starts by adding len to src or dst, since nullptr + 0 is nasal demons in C.

SonOfLilit|2 years ago

What a wonderfully subtle issue.

pyrolistical|2 years ago

How does zig handle this? Does it just have its own slice representation that gets compiled away? Or does it disallow zero length slices?

anonymoushn|2 years ago

Zig slices are (start, count) where start's type is non-nullable pointer.

My impression is that Zig doesn't have a documented memory model that cares about things like whether an address corresponds to an allocation or not, so problems relating to this sort of thing cannot come up yet :)

stealthcat|2 years ago

In ML the problem is passing 0D scalar tensor as 1D 1-element tensor.

hackyhacky|2 years ago

> Passing nothing is surprisingly difficult

From the title, I assumed that this article was going to be about either (a) permissive grading standards at university or (b) chronic constipation.

dmvdoug|2 years ago

In fairness, both of those are also surprisingly difficult.