top | item 20402733

Catching use-after-move C++ bugs with Clang's consumed annotations

175 points| awesomekling | 6 years ago |awesomekling.github.io | reply

188 comments

order
[+] matthewbauer|6 years ago|reply
Perhaps the future of software isn't "rewrite everything in Rust", but instead we end up annotating existing C/C++ with borrow checking information. This would be similar to how there is a push in JavaScript to add TypeScript annotations everywhere. It gives you the flexibility of the existing language and ecosystem while still allowing you to use new techniques in programming languages.
[+] rubber_duck|6 years ago|reply
My problem with C++ isn't the lack of borrow checker - this is the feature I like the least in Rust (I know it's their core design goal but frankly the inconvenience and limitations it imposes don't seem worth it for my use case, and then theres the compile times).

C++ lack of modules and package management on the other hand is a huge PITA and I'm not optimistic either of those bolted on so late in to the language lifecycle will provide a useful solution.

It's a pity D took it too far in the other direction with GC and runtime - I really could use a C with classes and modules.

[+] wyldfire|6 years ago|reply
In fact, Sutter (et al) are working on lifetimes in CppCoreGuidelines [1]. I built their clang tree and tried it out without bothering to RTFM and tried out the warning on a pile of C++ code. I naively assumed that it might be a generally-useful warning ("-Wlifetime") that's not ready to be introduced upstream. That's not the case AFAICT. What I suppose I would've learned from RTFM is that the profile specified by the guidelines is sorta like an opt-in 'dialect' to annotate/specify lifetime information. Without it, there's lots of spurious findings. Either that, or the codebase I tried it on isn't as good as I thought it was.

Here's a couple of interesting examples of failure modes -Wlifetime can detect on godbolt[2][3].

I watched a video [5] a while back on the Guidelines Support Library (GSL) [4] and it seemed like a really interesting concept. I think it's a valuable idea and I'd love to see popular C++ projects leveraging it.

I'm a card-carrying RESF member† (but have a day job w/mostly C++). Don't RIIR for one thing, RIIR to get all the things. Cargo is the sleeper hit of Rust. Hygienic macros and more!

[1] https://github.com/isocpp/CppCoreGuidelines/blob/master/docs...

[2] https://godbolt.org/z/dymV_C

[3] https://godbolt.org/z/_midIP

[4] http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#...

[5] CppCon 2017: Kate Gregory “10 Core Guidelines You Need to Start Using Now” https://www.youtube.com/watch?v=XkDEzfpdcSg

† Rust Evangelism Strike Force -- maybe we really should have cards

[+] arka2147483647|6 years ago|reply
The value borrow checker in Rust comes from the systemic ability of to eliminate an entire class of bugs.

The value these kinds of smart Cpp/Compiler features comes from the ability to eliminate some instances of bugs.

Which is great, and all, but I don't see that all of the great masses of Cpp code and libraries would be rewritten or annotated to use these new features.

Which will sadly leave the impact of these developments particial.

[+] swsieber|6 years ago|reply
I think we'll c/c++ increasingly use those (in addition to the other wonderful tooling it has). I think Rust still has a strong future though - it has safety by default, and I really like it apart from the borrow checker.

That said, I like this push for more programmatic checks.

[+] akling|6 years ago|reply
I certainly hope that's where we're headed. The more we can tell the compiler about our code, the better. It might also be interesting to provide hints like "this int should only ever be in the range 1-30" :)
[+] billylindeman|6 years ago|reply
As someone who just re-wrote his project (from python) in rust I will say that it has been an incredibly rewarding and pleasant experience.

C++ is fine, but most projects are riddled with ugly macros and #defines for features / platform specific code. Rust solves this in a fairly elegant way. Also, not having a package ecosystem in C/C++ is frustrating as hell.

Cargo was the thing that pushed me over the hump to learn rust instead of just opting for c++, and it has paid off.

It's not quite as ergonomic as go, but overall it has won me over and is my new favorite language. I'm excited to see how the story for rust plays out over the next 5 years :)

[+] pcwalton|6 years ago|reply
C and C++ cannot add borrow checking soundly while remaining any semblance of compatibility with the ecosystem. Consider what you would do with global variables, just to name one of many, many issues.
[+] Ar-Curunir|6 years ago|reply
The plus points of Rust aren't just borrow checking, but also the removal of sons of cruft and the benefit of modern tooling and language design.
[+] arcticbull|6 years ago|reply
This is especially frustrating as the whole concept of moved values in C++ was introduced fairly recently in C++11. They did such a poor job of it that they introduced this whole new class of use-after-move bugs that should never have existed in the first place. Now we need annotations to make sure the new feature they half-assed a few years ago works the way it was supposed to? It appears the C++ working group is firing out new bugs faster than third-party teams can patch them.

IMO it's time to accept C++ is a failed state, and move on. Luckily there are compatibility options to help you use C/C++ libraries in Rust.

[+] _pmf_|6 years ago|reply
Evolutionary improvement of mature tools instead of jumping on bandwagons is not very street, yo. Greenfield development or GTFO.
[+] sudeepj|6 years ago|reply
With C++ even if your project follows this, there is no way to enforce this across your project's dependencies & wider eco-system.

With Rust, its entire ecosystem (however nascent) is subjected to the same strict rules. This is big plus when Rust eco-system matures functionality wise.

[+] coldtea|6 years ago|reply
On the other hand the Rust ecosystem is insignificant (size and adoption wise) compared to the C++ ecosystem, so there's that...
[+] tfha|6 years ago|reply
Sure you can. Just reject any dependencies that don't pass your own linter. By switching to rust, you are doing the same thing anyway.
[+] nonbirithm|6 years ago|reply
That's one of the benefits of designing the language such that using it normally forces people to share valuable information about programs without them knowing it.

A somewhat related example is how Emacs was built to support introspective programming, where nearly anything can be inspected to provide a docstring. This is both invaluable when extending the editor and only possible because Emacs established conventions early on about documenting public objects. Though it isn't obligatory to document things, the audience Emacs appeals to seemed to keep doing so through force of momentum over the years. I find this momentum incredibly important to have.

The problem is you can't just shoehorn this way of thinking onto any arbitrary language/programming environment, because the issue of dependencies following whatever code annotations they have arises. For C++ it's hopeless to imagine we can expect these annotation benefits to be universal because using the language didn't obligate adding them in some way, so the vast majority of people didn't. The author mentions there aren't even any real-world codebases using them. Every corner of the system/language has to be designed from the beginning with annotation in mind.

I've been pining over the same issue as I'm trying to design a "living system" that supports user extension. I'm still wondering if I'm missing anything that could help with introspection, is not detrimental to user experience and can only be added in the nascent stage the project is in. Once all the conventions are in place, everything will have to be built on top of them no matter how imperfect they are.

[+] haberman|6 years ago|reply
Why isn't a moved-from object always considered "consumed"?
[+] hedora|6 years ago|reply
std::move is lower level than that. It is a primitive that you can use to implement whatever lifecycle semantics you want.

To see what I mean, in the examples in the article, the author has to tell the compiler all sorts of things that are hard to infer. For example, the object comes into being in the “unconsumed” state (what if I instantiate it with a nullptr? Can I even do that?), and a prerequisite to dereferencing it is that it is “unconsumed”.

It is plausible that there would someday be a set(foo&&) method that had “unconsumed, consumed, unknown” as a prerequisite, and always brought the type into the “unconsumed” state.

In practice, you shouldn’t be spraying these annotations all over a code base. They should be in library methods that are reused frequently, so that you get a lot of static analysis benefit from a small number of manual annotations.

[+] jdsully|6 years ago|reply
Because its still a legal and valid object according to the standard.

A moved from vector might get reused to store new things for example.

[+] ori_b|6 years ago|reply
Because move is just a cast to an rvalue reference. It doesn't do anything. It just gives type level hooks for operator overloading.
[+] mannykannot|6 years ago|reply
Semantically, it should be, but, on account of the way C++ does things, at some point it is very likely to be destroyed, and whenever that happens, you want it to be in a state such that the running of its destructor has no consequences for the rest of the program.
[+] innot|6 years ago|reply
> Once you std::move an object and pass it to someone who takes an rvalue-reference, your local object may very well end up in an invalid state.

As far as I remember, move constructors/assignments must leave the moved-from object in a valid state - just that the standard doesn't say anything about what that state is for standard classes.

Also, I have seen code where some kind of "result" can be moved from an object and then re-generated from scratch with different inputs. In that case it was perfectly valid to use it after move. But that's nitpicking, anyways.

[+] hermitdev|6 years ago|reply
Yes, valid, but unspecified. Typically what I've seen is that a move operation is effectively a swap between the source and destination. I've also seen where a move leaves the source in effectively a default constructed state.
[+] foota|6 years ago|reply
I believe the standard says that move does "valid but unspecified" for standard library objects, but does not generally guarantee that moved from objects must be valid.
[+] olliej|6 years ago|reply
Valid for the purpose of calling the destructor. That’s the only requirement.

The actual semantic state of the object may not make sense.

[+] saagarjha|6 years ago|reply
Looks like a lightweight borrow checker, although I wonder how well it fares in places where lifetimes are difficult to track. Or is there a way to annotate methods with this information as well?
[+] cesarb|6 years ago|reply
To me, it looked more like Rust's move semantics: in Rust, when an object is moved it's "consumed" and cannot be used anymore. The borrow checker is for when the object is not "consumed", only temporarily borrowed by some other code.
[+] twic|6 years ago|reply
Or maybe a heavyweight one. I note that the clang annotations refer to "typestate", which was a mechanism that existed in primordial Rust, but which was dropped because it was unnecessarily complicated:

http://pcwalton.github.io/2012/12/26/typestate-is-dead.html

Although to be fair, clang's version is a highly limited particular case of typestate, with one axis of state with three values, rather than a general heavyweight typestate mechanism.

[+] rsp1984|6 years ago|reply
I've been writing C++ for 21 years now (started when I was 14). To be honest, I have never seen a solid case where move semantics provided added value (in terms of code readability and maintainability) over just passing object references as function parameters.

That big ugly object that would get copied on function return -- just create it before the function call and pass it in as a reference! No copy required.

[+] d1zzy|6 years ago|reply
Then it sounds like you haven't worked much in system development with many classes that have identity semantics (encapsulate system resources like processes, locks, mutexes, threads, etc) trying to write highly performant code while being typesafe (turn invariant violations into compile time errors). If you did you'd find out that using identity semantics objects is a PITA compared to value semantics ones.

For example, how do you cleanly create a factory function? A pretty simple thing. You'd return a pointer to a dynamically allocated object? But how do you guarantee that the caller doesn't just discard that return value or doesn't forget to delete it? Also this forces dynamic allocation for that object and adds an indirection to access, even when the caller might not want that.

Move semantics allow you to either make your resource wrapping objects movable (so you return them "by value" as value semantics objects but they get moved) or to use something like std::unique_ptr to wrap the returned dynamically allocated object. The former has the advantage that it gives the caller complete flexibility where to store the object (ex. it can store it locally on the stack or as a member) and avoids a pointer indirection.

Similar issues exist for producing copyable but expensive to copy objects (ex. containers). Move semantics allows for a typesafe/clean way to return them from factory functions while not having to worry about lifetime and performance.

[+] jcranmer|6 years ago|reply
C++ has a few major flaws with respect to move and copy semantics. The biggest one is that copy semantics are default and silent: it requires less work to copy something than it does to use it by reference, and there is no visual indication if the value is being copied or accessed via reference. This means that it is way too easy to accidentally copy large objects (such as std::vector) without realizing you're copying them.

Most newer languages have realized that implicit copy semantics is usually a bad thing, and duplicating objects requires explicitly saying that you're duplicating them (such as calling a .clone() method). Of course, some types are value types, where copy is cheap and better than reference, but such mechanisms are opt-in. C++ automatically generates these mechanisms, and gives you nasty error messages when you opt-out of default copy.

Move semantics are almost always better, but in C++, with its historical baggage of opt-out-of-implicit-copy-semantics, it means that constructing move-only types requires a lot of excessive calls to std::move. Compilers do a good job of telling you when you put one too many calls to std::move in, but the code is definitely verbose compared to C++, to the point that it tends to strongly weigh against actually using C++'s ability to annotate move-only methods. Furthermore, without something like the mechanism in the blogpost here, compilers don't give any indication of API misuse, so you can't leverage move-only types to construct safe-by-construction APIs.

This is something I've been tripping over a lot recently, as I have a type system where calling most methods makes the original object unusable.

[+] badamp|6 years ago|reply
> I've been writing C++ for 21 years now (started when I was 14).

So? As far as I know this could be 1 year+ 20 repeats.

In contrast to your experience, I missed move semantics since the early 2000s. It’s not just to avoid copies.

How about the ability to avoid pointers, aliasing, and allocation. Move semantics afford more than just avoiding a copy.

[+] kllrnohj|6 years ago|reply
std::vector<std::unique_ptr<SomeThing>>

That works thanks entirely to move semantics. Do you dispute the usefulness of such a container?

> That big ugly object that would get copied on function return -- just create it before the function call and pass it in as a reference! No copy required.

Or just return it without a std::move and it still won't be copied. In fact 'return std::move(any_local_value)' is not just silly it's bordering on wrong as it defeats some RVO optimizations (and returning a T&& of a local is actually wrong as it's returning a reference to a local). That's not what the purpose or use is of move semantics at all.

[+] DerDangDerDang|6 years ago|reply
References mean the optimizer cannot rule out aliasing, which is kind of a big deal.
[+] namirez|6 years ago|reply
Move semantics is the language support for shallow copies. If you never use shallow copies, fine! But a lot of code bases use shallow copies, and it's a valuable tool.
[+] ska|6 years ago|reply
You can structure your code so that you don't need it - sure. But that doesn't mean that there aren't use cases where this makes managing lifetimes much clearer.

RVO can often sort out your object return problem just fine. But lifetimes are more subtle. Sometime move semantics can significantly clean up your overall design.

[+] hermitdev|6 years ago|reply
You see the ROI when your big ugly object has a bunch of heap allocated data. You won't see any benefit when the object has everything stored by value.
[+] pjmlp|6 years ago|reply
That doesn't work for composable functions or fluent API designs.
[+] BubRoss|6 years ago|reply
Moving out of functions simplifies dependencies and arguments of calling a function. Ownership control simplifies architecture.

All of eigen was originally based on template expressions, which are very complex. That technique / hack is not necessary if you move data structures. You no longer have to do elaborate work around to avoid copying temporaries.