> Naïve overflow checks, which are often security-critical, often get eliminated by compilers. This leads to exploitable code when the intent was clearly not to and the code, while naïve, was correctly performing security checks for two’s complement integers.
This is the most critical aspect. We have enough trouble already without the compiler actively fighting against security just because the code would fail on some machine from the 70s.
Why are you assuming a "machine from the 70s"? I know of modern processors (DSPs) that need to saturate on integer arithmetic to maintain correctness. If you want your naive overflow checks to work on your x86 project, why not just use a compiler option like -fwrapv?
IMO, this is just plain ignorance - people arguing against the standard, while believing only their favorite platform is significant. C code is still big in embedded, you can't just trash the standard like that.
Isn't the quoted part 180 degrees wrong? Such code was not "correctly performing security checks", since it was undefined behaviour - two's complement or not. Which was the whole problem.
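For concreteness, a minimal sketch (mine, not from the proposal) of the kind of naive check being discussed, with the -fwrapv escape hatch mentioned above:

    #include <limits.h>
    #include <stdio.h>

    /* Post-hoc overflow check. Because signed overflow is undefined in
     * ISO C, a compiler may assume `a + 100` never wraps, fold the
     * comparison to false, and delete the check entirely. Building with
     * gcc/clang -fwrapv restores the wrapping the check relies on. */
    static int add_checked(int a)
    {
        if (a + 100 < a)   /* true only if the addition wrapped */
            return -1;     /* report overflow */
        return a + 100;
    }

    int main(void)
    {
        /* At -O2 without -fwrapv, this may print a wrapped value
         * instead of -1, because the guard was optimised away. */
        printf("%d\n", add_checked(INT_MAX));
        return 0;
    }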
The main change between [P0907r0] and the subsequent revision is to maintain undefined behavior when signed integer overflow occurs, instead of defining wrapping behavior.
> Overflow in the positive direction shall wrap around
This appears to be defining signed integer overflow semantics, which prevents the compiler from doing certain basic optimizations, for example, that (x*2)/2 == x. Is that part of this? Has anyone measured the perf cost on a real program?
It prevents risky optimisations; the compiler now has to prove that such optimisations won't change the semantics of the code, e.g. in your case by essentially proving that the high 2 bits of x will never be set (only the top bit in the unsigned case; the second bit matters for signed because of sign-extension).
...and it could be argued that if the compiler couldn't prove that was true, then it just helped you find a possible overflow bug in the code. If you actually wanted the compiler to unconditionally assume (x*2)/2==x and optimise accordingly, then you'd have to tell it such; e.g. MSVC has the __analysis_assume() feature.
The link is revision 0. Revision 1 reverted defining signed integer overflow. In revision 1, signed integer overflow is still undefined, precisely for this reason.
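To make the (x*2)/2 example concrete, a small sketch (mine) of why wrapping semantics block the simplification:

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        int x = INT_MAX / 2 + 1;  /* x*2 no longer fits in an int */
        /* With wrapping defined (e.g. gcc -fwrapv), x*2 wraps negative and
         * (x*2)/2 != x, so the compiler must keep the full computation.
         * With overflow undefined, it may assume no wrap occurs and fold
         * the whole expression back to x. */
        printf("x = %d, (x*2)/2 = %d\n", x, (x * 2) / 2);
        return 0;
    }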
I like the idea of forbidding signed integer representations other than 2's complement, as it is the de facto standard: pretty much nobody makes CPUs with other integer representations any more, partly because so many C programs assume 2's complement.
What I don't like about this proposal is defining signed integer overflow as 2's complement wrapping. Yes, yes, I know undefined behaviour is evil, and programs wouldn't become much slower. However, if a program has signed overflow, it's likely a bug anyway. Defining signed overflow as 2's complement wrapping would mean not allowing other behaviours, in particular trapping. On architectures with optional overflow traps (most architectures not called x86 or ARM), trapping would be much preferable to quiet bugs. Meanwhile, while it is undefined behaviour, an implementation is still free to define it; in GCC, for instance, you can do that with `-fwrapv`.
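For what it's worth, GCC already exposes the trapping behaviour as an option too; a minimal sketch (mine, not from the proposal) of the two modes:

    #include <limits.h>

    int main(void)
    {
        volatile int x = INT_MAX;  /* volatile keeps the overflow at runtime */
        x = x + 1;  /* gcc -ftrapv: aborts at runtime on overflow;
                       gcc -fwrapv: wraps, x becomes INT_MIN;
                       neither flag: undefined behaviour */
        return 0;
    }

There are programs that check for overflow after the fact. Is that a bug?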
Seems like this was changed in more recent revisions, according to another comment:
The main change between [P0907r0] and the subsequent revision is to maintain undefined behavior when signed integer overflow occurs, instead of defining wrapping behavior.
The real issue isn't that C doesn't have standard int overflow behaviour, but that it's undefined.
What they could have done is make it implementation-defined, like sizeof(int), which depends on the implementation (hardware) but isn't undefined behavior (so on x86/amd64, sizeof(int) will always be equal to 4).
    size_t size = unreasonably large number;
    char *buf = malloc(size);
    char *mid = buf + size / 2;
    int index = 0;
    for (size_t x = 0; x < big number; x++) mid[index++] = x;

A common optimization by a compiler is to introduce a temporary

    char *temp = mid + index;

prior to the loop and then replace the body of the loop with

    *(temp++) = x;

If the compiler has to worry about the signed `index` wrapping around (rather than being allowed to assume it never overflows), the pointer and the index fall out of sync and this optimization is not valid.
(I'm not a compiler engineer. Losing the optimization may be worthwhile. Or maybe compilers have better ways of handling this nowadays. I'm just chiming in on why int overflow is intentionally undefined in the Fine Standard.)
Just a nitpick: "implementation" is about the particular compiler and runtime (stdlib) implementation, not the hardware. Hardware is the platform hosting the implementation (these are ISO C-standard defined terms).
A compiler targeting the x86 platform can make sizeof(int) == 8, or whatever it pleases, as far as the C standard is concerned. In practice compilers don't get creative about this, but there are real-world cases where things differ, for example: http://www.unix.org/version2/whatsnew/lp64_wp.html
The modern case for keeping signed overflow as UB is that it unlocks compiler optimizations. For example, it allows compilers to assume that `x+1>x`.
If implementations are forced to define signed overflow, then these optimizations are necessarily lost. So implementation-defined is effectively the same as fully-defined.
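As a sketch (function name mine), the folding in question:

    /* Since signed overflow "cannot happen", a compiler may compile this
     * to `return 1;`. Built with -fwrapv it must keep the comparison,
     * because x + 1 wraps to INT_MIN when x == INT_MAX. */
    int always_greater(int x)
    {
        return x + 1 > x;
    }

Nothing is stopping your C compiler from making the guarantee sizeof(int)=4 on x86/amd64.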
Getting rid of this useless (crap #!§$§$§$) legacy stuff was overdue, so I am very happy to see it done.
I personally think it is _the_ most important proposal for C++20, since it will remove a lot of pointless pressure from secure coding attempts and in turn make the world a little bit more secure.
I'm curious why some old architectures didn't use two's complement for signed numbers. What advantage did one's complement or signed magnitude have over two's complement?
Two's complement has the bizarre property of being asymmetric about zero. So things like `abs` can overflow, among several other oddities. It's not unambiguously better.
With one's complement it is easier to multiply by minus one: just invert all bits. It is also symmetrical around zero, so sequences of random numbers will genuinely tend to average to zero.
It can be useful to distinguish between positive and negative zero in some cases, for example when dealing with values that have been rounded to zero or limits approaching zero.
The big one (for me) is that it's really annoying having one more negative value than positive value.
Most software doesn't handle this properly: people don't realise that abs doesn't always return a positive number (since abs(INT_MIN) == INT_MIN), and there are many other similar problems.
In an ideal world, I would use unsigned only where you care about things like being able to use every bit pattern, and would have made the all-1s pattern something like a NaN for ints.
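A minimal demonstration of that asymmetry (assuming a typical 32-bit int):

    #include <limits.h>
    #include <stdlib.h>
    #include <stdio.h>

    int main(void)
    {
        printf("INT_MAX      = %d\n", INT_MAX);  /*  2147483647 */
        printf("INT_MIN      = %d\n", INT_MIN);  /* -2147483648: one extra negative value */
        /* abs(INT_MIN) overflows, which is undefined behaviour; on common
         * two's-complement targets it simply returns INT_MIN again. */
        printf("abs(INT_MIN) = %d\n", abs(INT_MIN));
        return 0;
    }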
Negabinary operations are extremely simple and elegant. Like 2s complement and 1s complement, it suffers from asymmetry in its range, though even more so.
Conversion from signed to unsigned is always well-defined: the result is the unique value of the destination type that is congruent to the source integer modulo 2ⁿ.
...but that's not a change - it's been the case in C all along.
"Change Conversion from signed to unsigned is always well-defined: the result is the unique value of the destination type that is congruent to the source integer modulo 2N."
This is no change, since we have that already, e.g. see https://en.cppreference.com/w/cpp/language/implicit_conversi... and the conversion operation on the bit pattern is the identity for two's complement representation. The relevant section in the latest C++ standard is:
4.8 Integral conversions [conv.integral]
1 A prvalue of an integer type can be converted to a prvalue of another integer type. ...
2 If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n, where n is the number of bits used to represent the unsigned type). [Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). — end note]
Therefore the inverse conversion exists and is the identity as well; this is what should be sanctioned.
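The rule in action (a sketch, assuming a 32-bit int):

    #include <stdio.h>

    int main(void)
    {
        unsigned u = (unsigned)-1;  /* well-defined: -1 mod 2^32 = 4294967295 */
        int i = (int)u;             /* the inverse direction: implementation-defined
                                     * before this proposal, but -1 on every
                                     * two's-complement implementation */
        printf("%u %d\n", u, i);    /* prints: 4294967295 -1 */
        return 0;
    }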
Quick question: in the proposed rewording of intro.execution ¶8, why is the rewriting "((a + b) + 32765)" not reintegrated at the end of the otherwise untouched text? Have I misunderstood, or would this be legal with two's complement?
Have they considered introducing new types for wrapping integers, checked integers, and saturating integers? I understand why they might not want to make a change that could have a large effect on existing programs. But if you introduce new types, they only affect new programs that choose to use them, and this seems like something that could be a library change rather than a language change.
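A sketch of what such library types could wrap, using GCC/Clang's __builtin_add_overflow (the function names here are illustrative, not proposed):

    #include <limits.h>
    #include <stdbool.h>

    /* Checked add: reports overflow instead of invoking UB. */
    static bool add_checked(int a, int b, int *out)
    {
        return !__builtin_add_overflow(a, b, out);  /* builtin returns true on overflow */
    }

    /* Saturating add: clamps to INT_MAX / INT_MIN on overflow. */
    static int add_saturating(int a, int b)
    {
        int r;
        if (!__builtin_add_overflow(a, b, &r))
            return r;
        return a > 0 ? INT_MAX : INT_MIN;  /* overflow sign follows the operands */
    }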
Requiring two's complement just means you can't have a sensible C language on some sign-magnitude machine.
Even if nobody cares about such a machine, nothing is achieved other than perhaps simplifying a spec.
A language spec can provide a more detailed two's complement model with certain behaviors being defined that only make sense on two's complement machines, without tossing other machines out the window.
There could be a separate spec for a detailed two's complement model. That could be an independent document. (Analogy: IEEE floating-point.) Or it could be an optional section in ISO C.
Two's complement has some nice properties, but isn't nice in other regards. Multi-precision integer libraries tend to use sign-magnitude, for good reasons.
What I suspect is going on here is that someone is unhappy with what is going on in GCC development, and thinks ISO C is the suitable babysitting tool. (Perhaps a reasonable assumption, if those people won't listen to anything that doesn't come from ISO C.)
>> Even if nobody cares about such a machine, nothing is achieved other than perhaps simplifying a spec.
No, I use 16-bit values to represent angles in embedded systems all the time. I routinely expect arithmetic on these values to roll over as 2's complement, and I take differences of angles relying on 2's complement wraparound all the time. I'm fully aware that this is undefined behavior and needs to be verified for each compiler/processor combination. It has always worked, and yet it's undefined behavior. It would be nice for it to be defined. There are no modern machines that would be impacted by this.
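For reference, one common way to write that idiom (a sketch; the unsigned subtraction keeps the wrap well-defined even under today's rules):

    #include <stdint.h>

    /* 16-bit "binary angles": 0x0000..0xFFFF spans one full turn, so
     * modular arithmetic gives the shortest signed difference for free. */
    static int16_t angle_diff(uint16_t a, uint16_t b)
    {
        /* (uint16_t)(a - b) wraps mod 2^16, which is always well-defined;
         * the conversion to int16_t then reinterprets it as a signed delta
         * (implementation-defined before this proposal, but 2's complement
         * everywhere in practice). */
        return (int16_t)(uint16_t)(a - b);
    }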
The worst case is that there would not be an ISO C for such machines. As they are very unusual, this does not strike me as a big deal, and definitely less of an issue than making it easier to avoid invoking undefined behavior.
I take your point about the possible motives behind this proposal, which seem quite plausible.