top | item 45839014

(no title)

ragnese | 3 months ago

It's been a long time since I've done C/C++, but I'm not sure what you're saying with regard to provenance. I was pretty sure that you were able to cast an arbitrary integer value into a pointer, and it really didn't have to "come from" anywhere. So, all I'm saying is that, under-the-hood, a C pointer really is just an integer. Saying that a pointer means something beyond the bits that make up the value is no more relevant than saying a bool means something other than its integer value, which is also true.

discuss

order

tialaramex|3 months ago

Start your journey here: https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm

That's defect report #260 against the C language. One option for WG14 was to say "Oops, yeah, that should work in our language" and then modern C compilers have to be modified and many C programs are significantly slower. This gets the world you (and you're far from alone among C programmers) thought you lived in, though probably your C programs are slower than you expected now 'cos your "pointers" are now just address integers and you've never thought about the full consequences.

But they didn't, instead they wrote "They may also treat pointers based on different origins as distinct even though they are bitwise identical" because by then that is in fact how C compilers work. That's them saying pointers have provenance, though they do not describe (and neither does their ISO document) how that works.

There is currently a TR (I think, maybe wrong letters) which explains PNVI-ae-udi, Provenance Not Via Integers, Addresses Exposed, User Disambiguates which is the current preferred model for how this could possibly work. Compilers don't implement that properly either, but they could in principle so that's why it is seen as a reasonable goal for the C language. That TR is not part of the ISO standard but in principle one day it could be. Until then, provenance is just a vague shrug in C. What you said is wrong, and er... yeah that is awkward for everybody.

Rust does specify how this works. But the bad news (I guess?) for you is that it too says that provenance is a thing, so you cannot just go around claiming the address you dredged up from who knows where is a valid pointer, it ain't. Or rather, in Rust you can write out explicitly that you do want to do this, but the specification is clear that you get a pointer but not necessarily a valid pointer even if you expected otherwise.

adgjlsfhk1|3 months ago

it is kind of fun to me that most C programmers believe that ISO C exists/is implemented anywhere, when in reality we have a bunch of compilers that claim to compile iso C, but just have a lot of random basically unfixable behavior.

Asooka|3 months ago

I do believe the C standards committee got it completely backwards with regards to undefined behaviour optimisations. By default the language should act in a way that a human can reason about, in particular it should not be more complicated than assembly. Then, they can add some mechanism for decorating select hot program blocks as amenable to certain optimisations[1]. In the majority of the program the optimisation of not writing a single machine word to memory before calling memcmp is not measurable. The saddest part is that other languages like Rust and Zig have picked all this up like cargo cult language design. Writing code is already complicated enough without having to watch out for pitfalls added so the compiler can achieve one nanosecond faster time on SPECint.

[1] As an aside, the last time I tried to talk to a committee representative about undefined behaviour optimisation pitfalls, I was told that the standard does not prescribe optimisations. Which was quite puzzling, because it obviously prescribes compiler behaviour with the express goal of allowing certain optimisations. If I took that statement at face value, it would follow that undefined behaviour is not there for optimisation's sake, but rather as a fun feature to make programming more interesting...

1718627440|3 months ago

A pointer derived from one allocation and a pointer from another allocation does not need to compare the same, even if the address is. And you will likely invoke UB. This is because C tries to be portable and also support segmented storage.

A pointer derived from only an integer was supposed to alias any allocation, but good luck discussing this with your optimizing C compiler.