My first impression is that all three of these are clutter to an already very cluttered language ...
1.) Coming from embedded programming, I can see the utility of `std::unreachable`. But shouldn't this be a compiler directive? Or a standardized #pragma? Can someone more knowledgeable in C++ say whether using functions as markers is a common mechanism in std:: ?
2.) Maybe the example is bad here, as it doesn't even save typing (18 characters for `std::to_underlying` vs. 16 for `static_cast<int>`). The 'old' variant seems more expressive to boot. If anything, how about `static_cast<auto>`? Or `static_cast<underlying>`? ...
3.) The usefulness of `std::byteswap` to convert data to network byte order seems trivial vs. the venerable old `htonl` family of functions. `std::byteswap` seems more like intrinsics meant to expose possibly present target machine instructions to the user. Like `std::unreachable` this is probably of most use to embedded / low-level programming. This may be a deficiency in the article...
If it's not obvious, I'm not a big C++ fan and read the article with C-tinted glasses :)
1. It is a compiler-defined function (`__builtin_unreachable()`), but the issue is that MSVC doesn't have it, so you need a different implementation per compiler [0]. Plus, if a new compiler shows up (besides MSVC/GCC/LLVM), you'd need to investigate the correct way to express `__builtin_unreachable` for it.
From a compiler perspective, using a function makes the most sense, since that fits into the existing control-flow analysis that the compiler will do. Pragmas are processed by the pre-processor, so they aren't appropriate for expressing control flow hints.
2. `std::to_underlying(t)` is a wrapper around `static_cast<std::underlying_type_t<std::remove_cv_t<std::remove_reference_t<decltype(t)>>>>(t)`. So a lot fewer characters. It's also useful since `std::underlying_type_t<T>` behaves weirdly if `T` is not an enum type.
I think you are maybe missing the context that C++ allows the representation of an enum to be defined, e.g. `enum class X : unsigned char {};` vs `enum class Y : unsigned long long {};`. So you can't always cast to `int`. Technically, this isn't the case in C either: the type defaults to `int`, but the compiler will pick a larger type if necessary, e.g. `enum Z { a = ((long long)INT_MAX) + 1 };`
3. The `htonl` family is not standardized (it comes from POSIX), so it was never part of the C++ standard library. Also, on Windows, I believe you'd need to include `winsock.h` to get access to it, which has its own idiosyncratic issues. You are also missing the context of C++ having function overloading: you can call `std::byteswap(0ull)` and get an `unsigned long long`, and you can call `std::byteswap(std::uint16_t{0})` and get a 16-bit unsigned integer.
1. Why "should" it be something different? It is semantically part of code flow; making it a pragma breaks that model. This replaces the nonstandard __builtin_unreachable().
2. This doesn't exist to save typing. static_cast<auto> doesn't make sense to me (that reads like a no-op). static_cast<underlying> introduces a new reserved word which is a big no-no.
3. This is not the same as htonl, which only does anything on little-endian machines. (htonl is also a POSIX function, not a C++ function.)
> ...`std::unreachable`. But shouldn't this be a compiler directive?
Probably "committee pragmatism": a stdlib change might be easier to get approved than a language change, and compilers already have builtins for this; they're just not compatible. The differences can be wrapped in a macro, and since C++ doesn't like to expose macros, it's probably still a macro underneath, just hidden inside a stdlib template.
FWIW it looks like C23 will also just get a macro:
1.) Functions are fine for this stuff. The compiler can be trusted for its ability to "inline" the language-level `__builtin_unreachable()` or equivalent at the relevant optimization levels.
2.) static_cast<int> is shorter, if you know that the underlying type is int. But even if you know, you might not want to spell out int, to be more robust to code changes, which might involve a change of the underlying type of the corresponding enum.
In generic context you might not even know the underlying type.
3.) I absolutely agree; I think it was a mistake to include. The included `std::byteswap` can be expressed in terms of `std::ranges::reverse(std::as_writable_bytes(std::span{&obj, 1}))`. It could be a quality-of-implementation detail to get a bswap instruction from the latter.
I would be happy with equivalents of the `htonl` functions in the standard library, but I have strong opinions about the appropriate function signatures for them in C++.
Lisp perspective: anything that can be a function almost certainly should be a function (and not a special operator or macro).
std::unreachable() does not have any arguments; therefore it doesn't need any special argument evaluation semantics that would require a compiler built-in.
The way C and C++ work, compiler directives are keywords and not identifiers. Keywords are not namespaced. Introducing a keyword called "unreachable" is problematic; far more so than a new element in the std namespace.
My only problem with std::unreachable is that I would never use it over std::abort.
Nobody needs a function whose only job is to invoke undefined behavior (from which it is then assumed that it is not reached).
It's a good cold day, so I can almost hear the Rust people laughing in the distance.
Is there syntax available in C++ to write some kind of instruction to the compiler which is not some kind of call? Even __builtin_trap is a call, isn't it? What else could you attach a directive to?
I definitely agree with 1. I find the inclusion of core language features in std:: to be rather disconcerting. I've always thought of std:: as a set of standard useful library functions, separate from actual language features.
Hmm. How often do people actually want to std::byteswap, as opposed to "convert this value from native byte order to big-endian" or "convert this value from little-endian to native byte order"?
Yeah, for portable code you also need a function to tell you if you're on an architecture where you need to do a byteswap for the data you have. e.g. you know you have data in little-endian format - do you need to swap it to work with it natively? That depends.
Maybe having something like convert_be() and convert_le(), one of which is a no-op and the other does the byteswap (depending on your arch) would be better. It removes the duplication of e.g. htobe() and betoh() which are exactly the same function, while allowing the caller to not worry about which architecture their code is compiled on.
I've always thought that it should have been possible in C and C++ to declare endianness as a property of a member variable's type, and that's it: the compiler would then automatically choose where to swap the byte order to/from the native byte order.
The benefits are obvious: code is declarative, and there's no risk of a bug where you forgot to call std::byteswap() or did it twice.
Also, the compiler could automatically extract and insert values from/to little-endian and big-endian bitfields (which can be a handful...), and it could optimise to reduce the number of byteswaps in the code.
I love how over the past decade my own C++ utility library has been continuously shrinking because with each update there are more and more utility functions (like the to_underlying this article mentions) and even complete libraries (like <filesystem>) which replace self-written or 3rd party code.
I always worry as much as anyone else on each new release for the additional complexity ("the committee is out of control!!!!1!!eleven!"), but on each compiler upgrade when I actually get to use the new versions of the standard I'm always pleasantly surprised about all the little low-key quality of life improvements.
I understand why it's there, but I do find it fun that when many people are trying to reduce undefined behaviour in their code, std::unreachable is literally defined as "this is undefined behaviour, use that to optimise".
I suspect 99.9% of uses of std::unreachable would be better replaced by abort. (There will be those times when the code is correct and the optimisation gains are worth it -- but they will be rare).
That's not good advice. Only if the sender and receiver are guaranteed to be running on little-endian architectures can you make such a claim. Better advice is to always consider endianness when designing protocols and to have a strategy for handling it.
>Byte swapping is important when transferring data between systems that use a different order for the sequence of bytes stored in memory.
That seems like a glaring footgun to me, to the point where I think I must be missing something.
What I want when dealing with endianness are "from_little_endian/to_little_endian" and "from_big_endian/to_big_endian" function pairs that expand to either a no-op or a byte swap depending on the host architecture.
Exposing the byte swapping directly without this layer on top is asking for trouble, because every user will have to make sure that they correctly detect the local endianness before attempting a swap. That's the potentially tricky part, not swapping the bytes.
Kind of off topic, but is there a good "catch up" guide for people who stopped paying attention after C++11? Like a quick summary of just the useful, practical things you'd actually want to use in production, rather than the parts that are only interesting to computer science academics? Most of the "what's new in C++X" guides seem to just dump everything on you at once. I feel like I should get back into C++ but I don't think I really need to care about folding expressions or std::bit_cast.
I found Stroustrup's book A Tour of C++ very useful for catching up with C++14/17. It's relatively short, and it's easy to skim through parts you already know well.
It was just updated for C++20 (with some coverage of C++23).
There's always a bunch of CppCon talks for this too. Just pop on over to their YouTube channel.
I don't know, but 'swap' seems to have become the standard term for the operation, used in places like the C bswap family of functions and the x86 'bswap' instruction. My (totally unsupported and unresearched) guess is that it became popular as a term when 16-bit architectures were common -- "swap the bytes in a 16 bit value" is unambiguous.
The Arm architecture does call this operation "reverse bytes", though, so it's not universal to call it "swap".
No thanks, I'm going to keep calling abort (ANSI C, 1989) to indicate "control stops here":
Code after an abort() call is unreachable. (Or after any function attributed noreturn).
GCC and Clang know this, and do things accordingly.
For instance, I've seen GCC emit code which assumes that ptr is not null after ASSERT(ptr != NULL), because the custom ASSERT macro called an __attribute__((noreturn)) function in the null case.
abort() has defined behavior; it terminates the program abnormally, as if by raising the SIGABRT signal.
Your own function attributed __noreturn__ can have whatever behavior you want it to have. The one in the ASSERT macro I alluded to above calculates and prints a backtrace and other useful information.
Silly question from someone who hasn’t written C++ in 20 years and only very vaguely remembers it - if a code path is unreachable, why have the path at all?
Is a default on a switch required? Is lack thereof a compiler warning or something? It’s been a very long time; I only ever seem to recall that kind of “all paths must be handled” from functional languages.
Having an unreachable block communicates to other humans that you didn't forget the else case; it shouldn't exist. Code is about communicating to the next maintainer of the code.
Unreachable also communicates to static analyzers, which can throw an error if they detect a code path that would reach this code, even if that code path is otherwise fine. Static analysis will also stop at this point, and since static analysis often runs into the halting problem, a forced halt means some other heuristic elsewhere gets more time to run, so it can find bugs in a different code path that it wouldn't have analyzed before.
> Is a default on a switch required?
Many style guides do not allow a default case on a switch. If you don't have a default case and you add a new item, static analysis will flag an error (compiler warning), thus ensuring you look at that section of code that you may not have known about. So unreachable is a way to mark a lot of impossible cases as ones you have thought about, without either skipping them or adding a default.
> I only ever seem to recall that kind of “all paths must be handled” from functional languages.
C++ doesn't require all paths be handled, but realistically as a programmer you want to handle all paths. Marking a path as not reachable is a useful way to handle impossible code paths.
It comes up from time to time, for example sometimes you'll have a big enum where most of the values can be handled in a straightforward way, and things are weird and complex for the other values. So you write something like
  int process(my_enum x, state_t state) {
      if (x == my_enum::COMPLICATED) return do_stuff(state);
      switch (x) {
          case my_enum::SIMPLE0: return 0;
          case my_enum::SIMPLE1: return 1;
          case my_enum::SIMPLE2: return 2;
      }
  }
and your compiler complains that my_enum::COMPLICATED isn't handled, even though it clearly was. You generally like these warnings, because they do keep you safe, but in this case you're smarter than the compiler, and it still forces you to put something there. With std::unreachable, you can squelch the warning without emitting extra code. For example, you might otherwise be tempted to throw an error -- but if nothing else in the function throws, you end up emitting a bunch of unreachable error-handling code that the compiler can't remove.
It can also improve codegen, and it is a building block for an assume-like macro that conveys preconditions to the compiler, which can then optimize based on them. Another use is at the end of a function that is guaranteed never to reach that point, so that the compiler can know. It's a low-level tool and shouldn't be used without caution.
It can help both optimizations and static analysis. For example, in the switch case the compiler can in principle avoid doing bounds checks on the switch jump table. For static analysis, it can help flag paths that can actually reach it as erroneous.
Not C++ specific, but the example given - a default case that is not expected to ever be hit - is a "cover all bases" thing, to avoid undefined behaviour. Without the default case, what should happen? In the example given, it would just do nothing, but it would do so silently, which could lead to hours of developer time wasted trying to figure out why it doesn't do anything.
But I can imagine other use cases that are like "this is not supposed to happen" that can cause major issues like buffer overflows. Better to be more defensive and write code that basically says you are aware of code that shouldn't be reachable.
Why is std::unreachable a function, and why does it require including a header?
It sounds like the functionality of unreachable is to inform the compiler of something, which is what a core language keyword, like "if" or "for", does. Of course, then there could be name collisions with the new keyword, so being in std:: might be a solution for that. But that is inconsistent: sometimes they solve this by prepending/appending __ or _t instead, or by using names obscure enough that they probably don't clash, like "constexpr" or "nullptr".
The problem with the C++23 std::unreachable is that it invokes undefined behaviour. Calling abort (or panic, or whatever D's assert boils down to when the condition fails) would be a perfectly reasonable way to define unreachable. (That is, for example, basically how I define it in my own code:)
Why on earth are these in C++23 and not C++11? There are a ton of things that should’ve been standardized over a decade ago but only show up in C++20 or later. `std::span<T>` is a huge one. The mind boggles.
Trying to get everything in is why C++11 wasn't C++07. The committee spent several years trying to polish things instead of doing a new release. (The committee thought they weren't allowed to release anything before '06/'07; otherwise some of what was in C++11 could have been in C++01, or maybe a C++03 with more in it.) Eventually you need to say: stop right here, what is done is what we will release, and what isn't done will have to wait.
Backwards compatibility and a slow moving industry; adding more to a standardization process will only postpone the release.
More and more programming languages are moving to a more lightweight or scheduled release scheme, though, e.g. Java, which had been stuck in limbo for nearly a decade due to design-by-committee and backwards-compatibility concerns from major players.
> A typical use case for this function is switch statements on a variable that can take only a limited set of values from its domain. For instance, an integer that can only be between 0 and 9. Here is a simple example with a switch that checks a char value and executes operations. Only a limited number of commands are supported, but the argument is checked before invoking the function, so it shouldn't be possible to receive values other than those already handled in the switch.
[0]: https://stackoverflow.com/questions/60802864/emulating-gccs-...

C23 macro link (from flohofwoe's comment above): https://thephd.dev/ever-closer-c23-improvements#unreachable-...

yrro | 3 years ago: i.e., the functions documented in https://man7.org/linux/man-pages/man3/endian.3.html (why oh why are they not also documented in the GNU C Library Manual...)

hoseja | 3 years ago: Only in the parts specified by the protocol (headers etc). I encourage everyone sending data over network in a novel way to just use little-endian.

secondcoming | 3 years ago: By not doing that you still have to do `std::is_enum_v<T>` somewhere. For types that aren't enums it should do nothing (maybe `std::identity`).

oreliazz | 3 years ago: You can't just have some dude write a span.hpp and go "yep, that's going into my compiler v2" - not how standards work.

SeanLuke | 3 years ago: C++'s Biggest Problem: unbelievable, extraordinary, just mind-blowing levels of complexity. Solution: add more stuff!

agluszak | 3 years ago: This is a bad practice. Parse, don't validate - https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...