
Signed Integers Are Two’s Complement

158 points | scott_s | 7 years ago | open-std.org

123 comments

[+] raverbashing|7 years ago|reply
> Naïve overflow checks, which are often security-critical, often get eliminated by compilers. This leads to exploitable code when the intent was clearly not to and the code, while naïve, was correctly performing security checks for two’s complement integers.

This is the most critical aspect. We have enough trouble already without the compiler actively working against security, just because the check would fail on some machine from the 70s.
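
A minimal sketch (mine, not from the thread) of the pattern being discussed: a post-hoc overflow check that the optimizer may legally delete, next to a pre-condition check that must survive:

```c
#include <limits.h>

/* Naive post-hoc check: `sum` can only go negative here by
   overflowing, which is UB, so the compiler may assume it never
   happens and remove the branch entirely. */
int add_checked_naive(int a, int b) {
    int sum = a + b;
    if (a > 0 && b > 0 && sum < 0)
        return -1;               /* may be optimized away */
    return sum;
}

/* Well-defined alternative: test the bounds before adding. */
int add_checked_safe(int a, int b) {
    if (b > 0 && a > INT_MAX - b) return -1;
    if (b < 0 && a < INT_MIN - b) return -1;
    return a + b;
}
```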

[+] virgilp|7 years ago|reply
Why are you assuming "machine from the 70s"? I know modern processors (DSPs) that need to saturate on integer arithmetic in order to maintain correctness. If you want your naive overflow checks to work on your x86 project, why not just use a compiler option like -fwrapv?

IMO, this is just plain ignorance - people arguing against the standard, while believing only their favorite platform is significant. C code is still big in embedded, you can't just trash the standard like that.

[+] sanxiyn|7 years ago|reply
This got rejected in the next revision of this proposal. Naive overflow checks are still undefined.
[+] fulafel|7 years ago|reply
Isn't the quoted part 180 degrees wrong? Such code was not "correctly performing security checks", since it was undefined behaviour - two's complement or not. Which was the whole problem.
[+] Bromskloss|7 years ago|reply
By all means, make the change for the new version of the language. In the meantime, though, why are people assuming two's complement?!
[+] lloda|7 years ago|reply
The link is an outdated version r0, this is r2 (don't know if it's the latest) http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p090...
[+] jwilk|7 years ago|reply
The main change between [P0907r0] and the subsequent revision is to maintain undefined behavior when signed integer overflow occurs, instead of defining wrapping behavior.
[+] scott_s|7 years ago|reply
In April, the author of this proposal said on Twitter that the C++ standards committee agreed to this proposal for C++20: https://twitter.com/jfbastien/status/989242576598327296?lang...
[+] sanxiyn|7 years ago|reply
This proposal, but not this revision. (Tweets are careful about this.) In particular, signed integer overflow is still undefined.
[+] vortico|7 years ago|reply
Glad to hear! I enjoy standards documents which result in a net decrease of a standard due to increasing simplicity.
[+] ridiculous_fish|7 years ago|reply
> Overflow in the positive direction shall wrap around

This appears to be defining signed integer overflow semantics, which prevents the compiler from doing certain basic optimizations, for example, that (x*2)/2 == x. Is that part of this? Has anyone measured the perf cost on a real program?

[+] userbinator|7 years ago|reply
It prevents risky optimisations; this now requires the compiler to prove that such optimisations won't change the semantics of the code, e.g. in your case by essentially proving that the high 2 bits of x (only 1 in the unsigned case, due to sign-extension) will never be set.

...and it could be argued that if the compiler couldn't prove that was true, then it just helped you find a possible overflow bug in the code. If you actually wanted the compiler to unconditionally assume (x*2)/2==x and optimise accordingly, then you'd have to tell it such; e.g. MSVC has the __analysis_assume() feature.
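
For illustration (my sketch, not from the thread), here is why the identity fails under wrapping semantics. Wrapping is emulated with unsigned arithmetic so the demonstration itself is well-defined; the final cast back to int32_t is implementation-defined before C23 but behaves as expected on typical two's-complement targets:

```c
#include <stdint.h>

/* Under wrapping semantics, (x*2)/2 loses the high bit of x,
   so it is not x once the product wraps. */
int32_t wrap_times2_div2(int32_t x) {
    uint32_t doubled = (uint32_t)x * 2u;   /* wraps mod 2^32 */
    return (int32_t)doubled / 2;           /* cast back, then halve */
}
```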

[+] sanxiyn|7 years ago|reply
The link is revision 0. Revision 1 reverted defining signed integer overflow. In revision 1, signed integer overflow is still undefined, precisely for this reason.
[+] cornholio|7 years ago|reply
> for example, that (x&~1*2)/2 == x

Fixed that for you.

[+] GlitchMr|7 years ago|reply
I like the idea of forbidding signed integer representations other than 2's complement, as it is de facto standard, pretty much nobody makes CPUs with non-standard integer representations, partly due to C programs assuming 2's complement integer representation.

What I don't like about this proposal is defining signed integer overflow as 2's complement wrapping. Yes, yes, I know undefined behaviour is evil, and programs wouldn't become much slower. However, if a program has signed overflow, it's likely a bug anyway. Defining signed overflow as 2's complement wrapping would mean not allowing other behaviours, in particular, trapping. On architectures with optional overflow traps (most architectures not called x86 or ARM), trapping would be much more preferable to quiet bugs. Meanwhile, while it is undefined behaviour, the implementation would be still free to define it, for instance in GCC it can be done with `-fwrapv`.

[+] loup-vaillant|7 years ago|reply
> However, if a program has signed overflow, it's likely a bug anyway.

There are programs that check for overflow after the fact. Is that a bug?

[+] dEnigma|7 years ago|reply
Seems like this was changed in more recent revisions, according to another comment:

The main change between [P0907r0] and the subsequent revision is to maintain undefined behavior when signed integer overflow occurs, instead of defining wrapping behavior.

[+] greenhouse_gas|7 years ago|reply
The real issue isn't that C doesn't have a standard int overflow, but that it's undefined.

What they could have done is make it implementation-defined, like sizeof(int), which depends on the implementation (hardware) but isn't undefined behavior (so on x86/amd64, sizeof(int) will always equal 4).

[+] cjensen|7 years ago|reply
It's undefined for a reason.

  size_t size = unreasonably large number;
  char *buf = malloc(size);
  char *mid = buf + size / 2;
  int index = 0;
  for (size_t x = 0; x < big number; x++) mid[index++] = x;
A common optimization by a compiler is to introduce a temporary

  char *temp = mid + index;
prior to the loop and then replace the body of the loop with

  *(temp++) = x;
If the compiler has to worry about integer overflow, this optimization is not valid.

(I'm not a compiler engineer. Losing the optimization may be worth-while. Or maybe compilers have better ways of handling this nowadays. I'm just chiming in on why int overflow is intentionally undefined in the Fine Standard)

[+] int0x80|7 years ago|reply
Just a nitpick: "implementation" refers to the particular compiler and runtime (stdlib) implementation, not the hardware. The hardware is the platform hosting the implementation (these are ISO C standard-defined terms).

A compiler targeting x86 platform can implement sizeof int == 8, or whatever it pleases, as far as C std is concerned.

In practice, compilers don't get creative about this. But there are real-world cases where things differ, for example: http://www.unix.org/version2/whatsnew/lp64_wp.html

[+] ridiculous_fish|7 years ago|reply
The modern case for keeping signed overflow as UB is that it unlocks compiler optimizations. For example, it allows compilers to assume that `x+1>x`.

If implementations are forced to define signed overflow, then these optimizations are necessarily lost. So implementation-defined is effectively the same as fully-defined.

[+] paulddraper|7 years ago|reply
> on x86/amd4 sizeof(int) will always be equal to 4

Nothing is stopping your C compiler from making the guarantee sizeof(int)=4 on x86/amd64.

[+] beyondCritics|7 years ago|reply
Getting rid of this useless (crap #!§$§$§$) legacy stuff was overdue, so i am very happy to see it done. I personally think it is _the_ most important proposal for C++20, since it will remove a lot of pointless pressure from secure coding attempts and in turn make the world a little bit more secure.
[+] sanxiyn|7 years ago|reply
I don't see how. Integer overflows still can be security issues even if they wrap.
[+] loup-vaillant|7 years ago|reply
Alas, nope. Later revisions of this proposal still have undefined signed overflow. We still need -fwrap for the easy overflow checks.
[+] fred256|7 years ago|reply
I'm curious why some old architectures didn't use two's complement for signed numbers. What advantage did one's complement or signed magnitude have over two's complement?
[+] sunfish|7 years ago|reply
Two's complement has the bizarre property of being asymmetric about zero. So things like `abs` can overflow, among several other oddities. It's not unambiguously better.
[+] sasaf5|7 years ago|reply
With one's complement it is easier to multiply by minus one: just invert all bits. It is also symmetrical around the zero, so sequences of random numbers will truly tend to average to zero.
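
A small sketch of the two negation rules (mine, operating on raw 16-bit patterns via unsigned arithmetic so everything is well-defined):

```c
#include <stdint.h>

/* One's complement negates by inverting every bit; two's
   complement inverts and then adds one. */
uint16_t negate_ones(uint16_t bits) { return (uint16_t)~bits; }
uint16_t negate_twos(uint16_t bits) { return (uint16_t)(~bits + 1u); }
```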
[+] 0xcde4c3db|7 years ago|reply
It can be useful to distinguish between positive and negative zero in some cases, for example when dealing with values that have been rounded to zero or limits approaching zero.
[+] CJefferson|7 years ago|reply
The big one (for me) is that it's really annoying having one more negative value than positive value.

Most software doesn't handle this properly; people don't realise abs doesn't always return a positive number (since abs(INT_MIN) == INT_MIN), among many other similar problems.

In an ideal world, I would only use unsigned when you care about things like being able to use all bit representations, then have made the all-1s number something like NaN, for ints.
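
A sketch (mine) of working around the asymmetry: since |INT_MIN| is not representable, a safe absolute value has to check first instead of negating blindly:

```c
#include <limits.h>
#include <stdbool.h>

/* abs(INT_MIN) overflows because two's complement has one more
   negative value than positive; checking first avoids the UB. */
bool safe_abs(int x, int *out) {
    if (x == INT_MIN) return false;   /* |INT_MIN| not representable */
    *out = (x < 0) ? -x : x;
    return true;
}
```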

[+] garmaine|7 years ago|reply
In addition to what's mentioned in the already great sibling comments, it's worth noting that IEEE floating point is signed-magnitude.
[+] microcolonel|7 years ago|reply
[+] caf|7 years ago|reply
That document has this listed as a Change:

Conversion from signed to unsigned is always well-defined: the result is the unique value of the destination type that is congruent to the source integer modulo 2ⁿ.

...but that's not a change - it's been the case in C all along.

[+] beyondCritics|7 years ago|reply
There seems to be an error in this proposal:

"Change: Conversion from signed to unsigned is always well-defined: the result is the unique value of the destination type that is congruent to the source integer modulo 2^N."

This is no change, since we have that already, e.g. see https://en.cppreference.com/w/cpp/language/implicit_conversi... and the conversion operation on the bit pattern is the identity for two's complement representation. The relevant section in the latest C++ standard is 4.8 Integral conversions [conv.integral]:

1. A prvalue of an integer type can be converted to a prvalue of another integer type. ...
2. If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). — end note]

Therefore the inverse conversion exists and is the identity as well, this is what should be sanctioned.
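
For illustration (sketch is mine): the modulo-2^n rule means the conversion has always been well-defined, and on a two's-complement machine it is a bit-pattern no-op:

```c
#include <stdint.h>

/* Signed-to-unsigned conversion is reduction modulo 2^32;
   e.g. -1 becomes 0xFFFFFFFF, its two's-complement pattern. */
uint32_t to_u32(int32_t x) { return (uint32_t)x; }
```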

[+] FrozenVoid|7 years ago|reply
FYI, you can prevent all of the non-two's-complement problems with -fwrapv, which forces two's complement wrapping math in gcc/clang/icc.
[+] jgtrosh|7 years ago|reply
Quick question: in the proposed rewording of intro.execution¶8, why is the following rewriting “((a + b) + 32765)” not reintegrated at the end of the untouched text? Have I misunderstood that with two's complement this would be legal?
[+] polthrowaway|7 years ago|reply
have they considered introducing new types for wrapping integers, checked integers and saturating integers? i understand why they might not want to make a change that could have a large effect on existing programs. but if you introduce new types then the new types will only affect new programs that choose to use them, and this seems to be something that could be a library change rather than a language change.
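
A library-only sketch (mine) of one of the suggested semantics, saturating addition, which clamps to the type's limits instead of wrapping or trapping:

```c
#include <limits.h>

/* Saturating add: out-of-range results clamp to INT_MAX/INT_MIN.
   The pre-checks keep the arithmetic itself free of UB. */
int sat_add(int a, int b) {
    if (b > 0 && a > INT_MAX - b) return INT_MAX;
    if (b < 0 && a < INT_MIN - b) return INT_MIN;
    return a + b;
}
```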
[+] kazinator|7 years ago|reply
Requiring two's complement just means you can't have a sensible C language on some sign-magnitude machine.

Even if nobody cares about such a machine, nothing is achieved other than perhaps simplifying a spec.

A language spec can provide a more detailed two's complement model with certain behaviors being defined that only make sense on two's complement machines, without tossing other machines out the window.

There could be a separate spec for a detailed two's complement model. That could be an independent document. (Analogy: IEEE floating-point.) Or it could be an optional section in ISO C.

Two's complement has some nice properties, but isn't nice in other regards. Multi-precision integer libraries tend to use sign-magnitude, for good reasons.

What I suspect is going on here is that someone is unhappy with what is going on in GCC development, and thinks ISO C is the suitable babysitting tool. (Perhaps a reasonable assumption, if those people won't listen to anything that doesn't come from ISO C.)

[+] phkahler|7 years ago|reply
>> Even if nobody cares about such a machine, nothing is achieved other than perhaps simplifying a spec.

No, I use 16-bit values to represent angles in embedded systems all the time. I routinely expect arithmetic on these values to roll over as two's complement, and I take differences of angles using two's complement wrap-around all the time. I'm fully aware that this is undefined behavior and needs to be verified on each compiler/processor combination. It has always worked, and yet it's undefined behavior. It would be nice for it to be defined. There are no modern machines that would be impacted by this.
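
A sketch of the angle idiom (mine, not the commenter's code): with a full turn equal to 65536 units, a wrapping difference automatically takes the short way around the circle. The final conversion to int16_t is implementation-defined before C23, but does the expected wrap on two's-complement targets:

```c
#include <stdint.h>

/* Difference of two 16-bit angles, reduced mod 2^16 and then
   reinterpreted as signed, giving the shortest rotation. */
int16_t angle_diff(uint16_t a, uint16_t b) {
    return (int16_t)(uint16_t)(a - b);   /* wraps mod 2^16 */
}
```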

[+] mannykannot|7 years ago|reply
The worst case is that there would not be an ISO C for such machines. As they are very unusual, this does not strike me as a big deal, and definitely less of an issue than making it easier to avoid invoking undefined behavior.

I take your point about the possible motives behind this proposal, which seem quite plausible.