My first impression is that all three of these are clutter to an already very cluttered language ...
1.) Coming from embedded programming, I can see the utility of `std::unreachable`. But shouldn't this be a compiler directive? Or a standardized #pragma? Can someone more knowledgeable in C++ say whether using functions as markers is a common mechanism in std:: ?
2.) Maybe the example is bad here, as it doesn't even save typing (18 characters for `std::to_underlying` vs. 16 for `static_cast<int>`). The 'old' variant seems more expressive to boot. If anything, how about `static_cast<auto>`? Or `static_cast<underlying>`? ...
3.) The usefulness of `std::byteswap` to convert data to network byte order seems trivial vs. the venerable old `htonl` family of functions. `std::byteswap` seems more like intrinsics meant to expose possibly present target machine instructions to the user. Like `std::unreachable` this is probably of most use to embedded / low-level programming. This may be a deficiency in the article...
If it's not obvious, I'm not a big C++ fan and read the article with C-tinted glasses :)
1. It is a compiler-defined function (`__builtin_unreachable()`), but the issue is that MSVC doesn't have it, so you need a different implementation per compiler [0]. Plus, if a new compiler shows up (besides MSVC/GCC/LLVM), you'd need to investigate the correct way to express `__builtin_unreachable` for it.
From a compiler perspective, using a function makes the most sense, since that fits into the existing control-flow analysis that the compiler will do. Pragmas are processed by the pre-processor, so they aren't appropriate for expressing control flow hints.
2. `std::to_underlying(t)` is a wrapper around `static_cast<std::underlying_type_t<std::remove_cv_t<std::remove_reference_t<decltype(t)>>>>(t)`. So a lot fewer characters. It's also useful since `std::underlying_type_t<T>` behaves weirdly if `T` is not an enum type.
I think you are maybe missing the context that C++ allows the representation of an enum to be defined, e.g. `enum class X : unsigned char {};` vs `enum class Y : unsigned long long {};`. So you can't always cast to `int`. Technically, this isn't the case in C either: the type defaults to `int`, but the compiler will pick a larger type if necessary, e.g. `enum Z { a = ((long long)INT_MAX) + 1 };`
3. The `htonl` family is not standardized (it comes from POSIX), so it was never part of the C++ standard library. Also, on Windows, I believe you'd need to include `winsock.h` to get access to it, which has its own idiosyncratic issues. You are also missing the context of C++ having function overloading: you can call `std::byteswap(0ull)` and get an `unsigned long long`, and you can call `std::byteswap(std::uint16_t{0})` and get a 16-bit unsigned integer.
1. Why "should" it be something different? It is semantically part of code flow; making it a pragma breaks that model. This replaces the nonstandard __builtin_unreachable().
2. This doesn't exist to save typing. static_cast<auto> doesn't make sense to me (that reads like a no-op). static_cast<underlying> introduces a new reserved word which is a big no-no.
3. This is not the same as htonl, which only does anything on little-endian machines. (htonl is also a POSIX function, not a C++ function.)
> ...`std::unreachable`. But shouldn't this be a compiler directive?
Probably "committee pragmatism": a stdlib change might be easier to get approved than a language change, and compilers already have builtins for this; they're just not compatible. The differences can be wrapped in a macro, and since C++ doesn't like to expose macros, it's probably still a macro underneath, just hidden inside a stdlib template.
FWIW it looks like C23 will also just get a macro:
1.) Functions are fine for this stuff. The compiler can be trusted for its ability to "inline" the language-level `__builtin_unreachable()` or equivalent at the relevant optimization levels.
2.) static_cast<int> is shorter, if you know that the underlying type is int. But even if you know, you might not want to spell out int, to be more robust to code changes, which might involve a change of the underlying type of the corresponding enum.
In generic context you might not even know the underlying type.
3.) I absolutely agree; I think it was a mistake to include. The included `std::byteswap` can be expressed in terms of `std::ranges::reverse(std::as_writable_bytes(std::span{&obj, 1}))`. It could be a quality-of-implementation detail to get a bswap instruction from the latter.
I would be happy with equivalents of the `htonl` functions in the standard library, but I have strong opinions about the appropriate function signatures for them in C++.
Lisp perspective: anything that can be a function almost certainly should be a function (and not a special operator or macro).
std::unreachable() does not have any arguments; therefore it doesn't need any special argument evaluation semantics that would require a compiler built-in.
The way C and C++ work, compiler directives are keywords and not identifiers. Keywords are not namespaced. Introducing a keyword called "unreachable" is problematic; far more so than a new element in the std namespace.
My only problem with std::unreachable is that I would never use it over std::abort.
Nobody needs a function whose only job is to invoke undefined behavior (from which it is then assumed that it is not reached).
It's a good cold day, so I can almost hear the Rust people laughing in the distance.
Is there syntax available in C++ to write some kind of instruction to the compiler which is not some kind of call? Even __builtin_trap is a call, isn't it? What else could you attach a directive to?
I definitely agree with 1. I find the inclusion of core language features in std:: to be rather disconcerting. I've always thought of std:: as a set of standard useful library functions, separate from actual language features.
Hmm. How often do people actually want to std::byteswap, as opposed to "convert this value from native byte order to big-endian" or "convert this value from little-endian to native byte order"?
Yeah, for portable code you also need a function to tell you if you're on an architecture where you need to do a byteswap for the data you have. e.g. you know you have data in little-endian format - do you need to swap it to work with it natively? That depends.
Maybe having something like convert_be() and convert_le(), one of which is a no-op and the other does the byteswap (depending on your arch) would be better. It removes the duplication of e.g. htobe() and betoh() which are exactly the same function, while allowing the caller to not worry about which architecture their code is compiled on.
I've always thought that it should have been possible in C and C++ to declare endianness as a property of a member variable's type, and that's it: the compiler would then automatically choose where to swap the byte order to/from the native byte order.
The benefits are obvious: code is declarative, and there's no risk of a bug where you forgot to call std::byteswap() or did it twice.
Also, the compiler could automatically extract and insert values from/to little-endian and big-endian bitfields (which can be a handful...), and it could optimise to reduce the number of byteswaps in the code.
I love how over the past decade my own C++ utility library has been continuously shrinking because with each update there are more and more utility functions (like the to_underlying this article mentions) and even complete libraries (like <filesystem>) which replace self-written or 3rd party code.
I always worry as much as anyone else on each new release for the additional complexity ("the committee is out of control!!!!1!!eleven!"), but on each compiler upgrade when I actually get to use the new versions of the standard I'm always pleasantly surprised about all the little low-key quality of life improvements.
I understand why it's there, but I do find it fun that when many people are trying to reduce undefined behaviour in their code, std::unreachable is literally defined as "this is undefined behaviour, use that to optimise".
I suspect 99.9% of uses of std::unreachable would be better replaced by abort. (There will be those times when the code is correct and the optimisation gains are worth it -- but they will be rare).
That's not good advice. Only if the sender and receiver are guaranteed to be running on little-endian architectures can you make such a claim. Better advice is to always consider endianness when designing protocols and to have a strategy for handling it.
>Byte swapping is important when transferring data between systems that use a different order for the sequence of bytes stored in memory.
That seems like a glaring footgun to me, to the point where I think I must be missing something.
What I want when dealing with endianness are "from_little_endian/to_little_endian" and "from_big_endian/to_big_endian" function pairs that expand to either a no-op or a byte swap depending on the host architecture.
Exposing the byte swapping directly without this layer on top is asking for trouble, because every user will have to make sure that they correctly detect the local endianness before attempting a swap. That's the potentially tricky part, not swapping the bytes.
Kind of off topic, but is there a good "catch up" guide for people who stopped paying attention after C++11? Like a quick summary of just the useful, practical things you'd actually want to use in production, rather than the parts that are only interesting to computer science academics? Most of the "what's new in C++X" guides seem to just dump everything on you at once. I feel like I should get back into C++ but I don't think I really need to care about folding expressions or std::bit_cast.
I found Stroustrup's book A Tour of C++ very useful for catching up with C++14/17. It's relatively short, and it's easy to skim through parts you already know well.
It was just updated for C++20 (with some coverage of C++23).
There's always a bunch of CppCon talks for this too. Just pop on over to their YouTube channel.
I don't know, but 'swap' seems to have become the standard term for the operation, used in places like the C bswap family of functions and the x86 'bswap' instruction. My (totally unsupported and unresearched) guess is that it became popular as a term when 16-bit architectures were common -- "swap the bytes in a 16 bit value" is unambiguous.
The Arm architecture does call this operation "reverse bytes", though, so it's not universal to call it "swap".
No thanks, I'm going to keep calling abort (ANSI C, 1989) to indicate "control stops here":
Code after an abort() call is unreachable. (Or after any function attributed noreturn).
GCC and Clang know this, and do things accordingly.
For instance, I've seen GCC emit code which assumes that ptr is not null after ASSERT(ptr != NULL), because the custom ASSERT macro called an __attribute__((noreturn)) function in the null case.
abort() has defined behavior; it terminates the program abnormally, as if by raising the SIGABRT signal.
Your own function attributed __noreturn__ can have whatever behavior you want it to have. The one in the ASSERT macro I alluded to above calculates and prints a backtrace and other useful information.
Silly question from someone who hasn’t written C++ in 20 years and only very vaguely remembers it - if a code path is unreachable, why have the path at all?
Is a default on a switch required? Is lack thereof a compiler warning or something? It’s been a very long time; I only ever seem to recall that kind of “all paths must be handled” from functional languages.
Having an unreachable block communicates to other humans that you didn't forget the else case; it shouldn't exist. Code is about communicating to the next maintainer of the code.
Unreachable also communicates to static analyzers, which can throw an error if they detect a code path that would reach this code, even if that code path is otherwise fine. Static analysis will also stop at this point, and since static analysis often runs into the halting problem, a forced halt means some other heuristic elsewhere gets more time to run, so it can find bugs in a different code path that it wouldn't have analyzed before.
> Is a default on a switch required?
Many style guides do not allow a default case on a switch. If you don't have a default case and you add a new item, static analysis will flag an error (compiler warning), thus ensuring you look at that section of code that you may not have known about. So unreachable is a way to mark a lot of impossible cases as ones you have thought about, without either skipping them or adding a default.
> I only ever seem to recall that kind of “all paths must be handled” from functional languages.
C++ doesn't require all paths be handled, but realistically as a programmer you want to handle all paths. Marking a path as not reachable is a useful way to handle impossible code paths.
It comes up from time to time, for example sometimes you'll have a big enum where most of the values can be handled in a straightforward way, and things are weird and complex for the other values. So you write something like
  int process(my_enum x, state_t state) {
      if (x == my_enum::COMPLICATED) return do_stuff(state);
      switch (x) {
          case my_enum::SIMPLE0: return 0;
          case my_enum::SIMPLE1: return 1;
          case my_enum::SIMPLE2: return 2;
      }
  }
and your compiler complains that my_enum::COMPLICATED isn't handled, even though it clearly was. You generally like these warnings, because they do keep you safe, but in this case you're smarter than the compiler, and it still forces you to put something there. With std::unreachable, you can squelch the warning without emitting extra code. For example, you might otherwise be tempted to throw an error -- but if nothing else in the function throws, you end up emitting a bunch of unreachable error-handling code that the compiler can't remove.
It can also improve codegen, and it is a building block for an assume-like macro that conveys preconditions to the compiler, which can then optimize based on them. Another use is at the end of a function that is guaranteed never to reach that point, so that the compiler can know. It's a low-level tool and shouldn't be used without caution.
It can help both optimizations and static analysis. For example, in the switch case the compiler can in principle avoid doing bounds checks on the switch jump table. For static analysis, it can help flag paths that can actually reach it as erroneous.
Not C++ specific, but the example given - a default case that is not expected to ever be hit - is a "cover all bases" thing, to avoid undefined behaviour. Without the default case, what should happen? In the example given, it would just do nothing, but it would do so silently, which could lead to hours of developer time wasted trying to figure out why it doesn't do anything.
But I can imagine other use cases that are like "this is not supposed to happen" that can cause major issues like buffer overflows. Better to be more defensive and write code that basically says you are aware of code that shouldn't be reachable.
Why is std::unreachable a function, and why does it require including a header?
It sounds like the functionality of unreachable is to inform the compiler of something, which is what a core language keyword, like "if" or "for", does. Of course, then there could be name collisions with the new keyword, so being in std:: might be a solution for that. But that is inconsistent: sometimes they solve this by prepending/appending __ or _t instead, or by using names obscure enough that they probably don't clash, like "constexpr" or "nullptr".
The problem with the C++23 std::unreachable is that it invokes undefined behaviour. Calling abort (or panic, or whatever D's assert boils down to when the condition fails) would be a perfectly reasonable way to define unreachable. (That is, for example, basically how I define it in my own code:)
Why on earth are these in C++23 and not C++11? There are a ton of things that should’ve been standardized over a decade ago but only show up in C++20 or later. `std::span<T>` is a huge one. The mind boggles.
Trying to get everything in is why C++11 wasn't C++07. The committee spent several years trying to polish things instead of doing a new release. (The committee thought they weren't allowed to release anything before '06/'07; otherwise some of what was in C++11 could have been in C++01, or maybe a C++03 with more in it.) Eventually you need to say: stop right here, what is done is what we will release, and what isn't done will have to wait.
Backwards compatibility and a slow moving industry; adding more to a standardization process will only postpone the release.
More and more programming languages are moving to a more lightweight or scheduled release scheme, though, e.g. Java, which had been stuck in limbo for nearly a decade due to design-by-committee and backwards-compatibility concerns from major players.
> A typical use case for this function is switch statements on a variable that can take only a limited set of values from its domain. For instance, an integer that can only be between 0 and 9. Here is a simple example with a switch that checks a char value and executes operations. Only a limited number of commands are supported, but the argument is checked before invoking the function, so it shouldn't be possible to receive values other than those already handled in the switch.
[0]: https://stackoverflow.com/questions/60802864/emulating-gccs-...

C23 macro link (from flohofwoe's comment above): https://thephd.dev/ever-closer-c23-improvements#unreachable-...

yrro | 3 years ago: i.e., the functions documented in https://man7.org/linux/man-pages/man3/endian.3.html (why oh why are they not also documented in the GNU C Library Manual...)

hoseja | 3 years ago: Only in the parts specified by the protocol (headers etc). I encourage everyone sending data over network in a novel way to just use little-endian.

secondcoming | 3 years ago: By not doing that you still have to do `std::is_enum_v<T>` somewhere. For types that aren't enums it should do nothing (maybe `std::identity`).

oreliazz | 3 years ago: You can't just have some dude write a span.hpp and go "yep, that's going into my compiler v2" - not how standards work.

SeanLuke | 3 years ago: C++'s Biggest Problem: unbelievable, extraordinary, just mind-blowing levels of complexity. Solution: add more stuff!

agluszak | 3 years ago: This is a bad practice. Parse, don't validate - https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...