In my opinion bugs related to endianness expose a design flaw in the C/C++ programming languages, or at least in the way people tend to use those languages. It isn't that hard to write code that works well with any kind of endianness, and you don't need stuff like __BYTE_ORDER__ for that. For example, if I wanted to read a little endian 32-bit integer value from a byte stream, I could write a function like this:
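For instance, something along these lines (a minimal sketch; the function name is just illustrative):

```c
#include <stdint.h>

/* Read a little-endian 32-bit value from a byte stream,
 * independent of the host CPU's endianness. */
static uint32_t read_uint32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

On a little-endian host a decent compiler typically folds this into a single 32-bit load; on a big-endian host it becomes a load plus byte swap. Either way the source code never has to ask which kind of machine it is on.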
Yes, it's totally possible to write good abstractions for this sort of basic operation, which handle both endianness and alignment. (QEMU's version of them uses memcpy(), again relying on the compiler to optimise this to a simple load where the host CPU supports unaligned accesses.) I agree that doing that kind of thing so code can be endian-agnostic and keeping the ifdefs to a minimum is better style and easier to understand.
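Roughly this shape (a sketch of the idea, not QEMU's actual code, though the name echoes its `ldl_le_p`; the `__builtin_bswap32`/`__BYTE_ORDER__` parts assume GCC or Clang):

```c
#include <stdint.h>
#include <string.h>

/* Alignment-safe little-endian 32-bit load: memcpy avoids
 * unaligned-access undefined behaviour, and compilers optimise it
 * to a plain load where the host CPU allows unaligned accesses. */
static uint32_t ldl_le_p(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    v = __builtin_bswap32(v);  /* big-endian host: swap to get the LE value */
#endif
    return v;
}
```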
But if the big-endian case isn't run through CI then bugs will still creep in, where people forget to use the abstraction and just cast a buffer pointer to do a 32-bit read, or where they don't keep the distinction between 'a 32-bit value in host endianness' and 'a 32-bit value in host-independent endianness (e.g. network data, or pixel data)' straight and forget a conversion step. QEMU's testing on s390 and ppc64 catches a steady trickle of this kind of bug before it goes into the codebase.
(You can catch unaligned-access bugs on x86 with clang/gcc address-sanitizer, I think, but it can't help with endianness bugs.)
It certainly doesn't help that the "wrong" way is featured in almost every textbook, tutorial and article I've seen in the last 15 years.
Your example is actually the most favourable case, because there are native 32-bit types. I've seen countless attempts to implement three-byte records via custom Uint24 classes that overload operators and abstract endianness (in various buggy ways) instead of just reading three damn bytes into a uint32_t. Supposedly in the name of readability ("I want to be explicit that this stores a 24-bit variable, not a 32-bit variable, so I'm going to copy-paste these 200 lines of either template magic or line noise, I'm not sure which, but see how readable it is now? Oh yeah, it's still 32-bit underneath, yeah, I'm not sure what happens if you write a 32-bit variable there, I think it breaks. This is way better than a typedef though, 'cause, uh, C++17?").
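For the record, the whole Uint24 edifice can be replaced by something like this (a sketch, assuming little-endian three-byte records):

```c
#include <stdint.h>

/* Read a 3-byte little-endian record straight into a uint32_t -
 * no custom Uint24 class required. */
static uint32_t read_uint24_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16);
}
```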
For a long time, I struggled with endianness, because I thought that bitshifting and masking would be different on BE vs. LE:
// wrong
uint32_t num = ...;
// num is set to 0x10203040
num >> 24 == 0x40; // On little endian (so I assumed)
num >> 24 == 0x10; // On big endian
Of course that is wrong, and the second one is always the case. In a way, you could say C/C++ is big endian - the higher valued byte is first (left in English reading order). In fact, if you avoid reinterpreting via casting / punning / memcpy, you don't have to think about it at all [1].
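That is, shifts and masks operate on the value, not on its memory layout, so byte extraction written this way is endian-agnostic (a small sketch; the helper name is mine):

```c
#include <stdint.h>

/* Extract byte i of a 32-bit value; i = 0 is the least significant
 * byte. Shifting works on the value, not on memory, so the result
 * is identical on little- and big-endian machines. */
static uint32_t byte_of(uint32_t num, unsigned i)
{
    return (num >> (8 * i)) & 0xffu;
}
```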
One thing I have to do frustratingly often is implement a bunch of functions like "read_uint32_le"; it's a pity that there is no comprehensive standard for them.
([1] On the other hand, I miss the good old days where you could treat C like "high level assembler", a pointer was just a number that indexed memory, and you could reinterpret bytes with impunity - sometimes you had to, to get decent performance!)
I clicked around a little, and it seems "icc 21.1.9", "x64 msvc v19.28", "risc-v clang (trunk)", and "armv7-a clang (trunk)" compile it to four separate byte loads. It's not that I don't trust my compiler (usually clang) to optimize it this way; it's just that the level of variability between compilers makes me think I shouldn't really rely on this behaviour.
I wonder what the arch difference is between the clangs that do it and the ones that don't; I don't know much about arm and risc-v.
These functions will either swap, or not, depending on the endianness of the target machine, and this decision is made at compile time, so the "no swap" case imposes no performance penalty.
That way, the intent is clear, and the code works on big endian and little endian. So instead of what you propose, which I imagine you might use via:
x = some_value; /* say this is big endian, and we want it little endian, because we're on Intel. */
uint32_t y = read_int((char *) &x);
you'd have:
y = be32_to_cpu(x);
Now the intent is clear, and the code is portable: x is big endian, and we want native CPU order (whether that be big endian or little endian).
Likewise, you could write:
x = cpu_to_be32(y);
say before writing it out to disk or network for some purpose that requires big endian.
This is similar to htonl(), ntohl(), htons(), ntohs() (host to network / network to host, long/short).
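A sketch of how such helpers are typically defined - assuming GCC/Clang's `__builtin_bswap32` and the `__BYTE_ORDER__` predefine, not any particular kernel's exact code:

```c
#include <stdint.h>

/* On big-endian hosts these are no-ops; on little-endian hosts they
 * byte-swap. The choice is made entirely at compile time. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define be32_to_cpu(x) (x)                   /* already big endian: no swap */
#define cpu_to_be32(x) (x)
#else
#define be32_to_cpu(x) __builtin_bswap32(x)  /* little-endian host: swap */
#define cpu_to_be32(x) __builtin_bswap32(x)
#endif
```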
The long-term good news is that the endian wars are drawing to a close. Little endian won. I believe all the most recent CPUs support little endian at least as an option.
We will still have to deal with the legacy network protocols that have big endian encoding.
By the way, if you are designing a new network binary protocol, please do not pick big endian; it just makes stuff more complicated on the majority of current CPUs.
Strictly speaking no one uses big endian: all computers I know of are either little endian or mixed endian ('mixed' because they store strings low to high but numbers high to low).
The only reason that big/mixed endian is even an issue is because some monks fucked up back when the Moors were driven from Spain. In the wonderful libraries that were left behind, they discovered Arabic numerals and realised how useful they were, far better than the clunky Roman ones - for a start, a small businessman could do addition right there on the page, writing the digits from smallest to largest as he went. Sadly, as I said, the monks screwed up and copied the digit order from right-to-left written Arabic into western languages without doing the sane thing and reversing the digits so that they would continue to be written smallest to largest (which lets you do addition without knowing, until you're done, how much space the digits will need).
And so we're stuck with this silly artifact of history that has caused decades of arguing, as we slowly rediscover the utility of putting the smaller digits first, as they were always meant to be.
> The long-term good news is that the endian wars are drawing to a close. Little endian won. I believe all the most recent CPUs support little endian at least as an option.
From what I know from talking to IBM engineers, there are no plans to switch AIX on POWER or Linux on s390x to little-endian.
And both are important platforms in the enterprise world.
I think the majority of binary integer data sent and received today is big endian. Calling it "legacy" seems like a stretch. People are still designing plenty of brand new protocols using big endian. When I encounter a little endian one I am pleasantly surprised, but it is still a surprise.
Big endian won the encoding wars, with many important formats using big endian. Nearly all crypto is big endian, for instance, because of big-number byte strings.
It's very, very rare that anyone needs to care about endianness outside of the application edge space (i.e. reading from or writing to data sources and sinks).
In the article, many of the errors were a result of undefined behaviour that just happened to work on the CI machines upstream Qt ran. Relying on undefined behaviour in your code will bite you in the bum sooner or later, regardless of endianness.
Appears nobody has done the same for Linux yet. The link above mentions that Rpi4 isn't done because BE mode with ACPI either isn't possible, or is difficult.
jcelerier | 5 years ago:
But for the longest time people would basically dump the raw bytes of their structs from/to the network whenever possible, e.g.
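Something like this, say (a sketch of the antipattern; the struct and function names are invented):

```c
#include <stdint.h>
#include <string.h>

/* The classic shortcut: the struct's field order, padding and the
 * host's endianness all leak onto the wire as-is. */
struct msg {
    uint32_t id;
    uint16_t len;
};

/* Stand-in for write(fd, &m, sizeof m) or send(...): just copies
 * the raw in-memory bytes of the struct into the output buffer. */
static size_t serialize_msg_raw(const struct msg *m, unsigned char *wire)
{
    memcpy(wire, m, sizeof *m);
    return sizeof *m;
}
```

Whether the first byte on the wire is the low or the high byte of `id` now depends entirely on which machine ran the sender.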
BelenusMordred | 5 years ago:
If you could tell cryptographers this I would be very grateful.
zajio1am | 5 years ago:
Does anyone know of some reasonably priced big-endian Linux hardware that could be added to a build system to run CI tests on big endian? The only thing I found is the Ubiquiti EdgeRouter Pro (MIPS64 BE).