Mysterious Moving Pointers

userbinator|1 year ago

IMHO this is another case where C++'s hidden layers of complexity hides bugs that would've been obvious in plain C. In fact for this particular use-case I'd probably use indices instead of pointers.

limaoscarjuliet|1 year ago

I remember the C++ ca. 1994 year when I started my career. It was C with Objects back then. And it was great! C++ was better C, it was easy for any C dev to convince him/her to jump to it.

I recently had to work with C++ code and... it is not a happy story anymore:

- Lots of magic like described here

- F**ing templates - for those who like them, did you ever see a C++ core file? Or tried to understand a single symbol?

- Standarization that feels like pulling more Boost into the language, which means more templates. Which makes core files incomprehensible.

It used the be that average dev who knew C could read and work with simple C++. This is no longer true. C++ is no longer a better C.

P.S. Example from recent core file, one line in the stack trace: boost::asio::asio_handler_invoke<boost::asio::detail::binder2<core::AsyncSignalCatcher<2, 15, 17>::waitForSignal<XServer::m_accept_loop(tsr::Data&)::<lambda(spawn::yield_context)>::<lambda(int)> >(XServer::m_accept_loop(tsr::Data&)::<lambda(spawn::yield_context)>::<lambda(int)>&&)::<lambda(const boost::system::error_code&, int)>, boost::system::error_code, int> > (function=...)

quotemstr|1 year ago

In far more cases, zero-cost abstractions make obvious or impossible bugs that would be hard to spot in C programs, e.g. memory lifetime rule violations. And you could make a similar argument that C obscures bugs that would be obvious in assembly. High level languages are a blessing, and programmers who avoid them are only decreasing their productivity and those around them.

The problem the article highlights appears to be an implementation defect: in my libstdc++ test just now, we do, in fact, mark the list as nothrow move constructible. The standard should mandate that std::list be infallibly moveable.

Are we going to indict a whole programming model based on an isolated implementation bug? If so, well, isn't doing that from a "C is better" perspective the galactic black hole calling the kettle black?

    #include <cstdio>
    #include <list>
    #include <type_traits>
    #include <vector>

    struct Node;

    struct Connection {
        Node *from, *to;
    };

    struct Node {
        std::vector<Connection> connections;
    };

    int
    main()
    {
        printf("%d\n", std::is_nothrow_move_constructible_v<std::list<Node>>);
        return 0;
    }

mhh__|1 year ago

On the flip side C projects often end up with things like linked lists and pointers to [it depends] because that lack of abstraction, while obvious, encourages the programmer to take shortcuts (otherwise no time for actual logic)

TillE|1 year ago

Storing a pointer to memory that you did not explicitly allocate is always a red flag, I think. You really need to understand how everything works, and be very careful.

I would default to just using std::unique_ptr<Node> in a situation like this, especially since using std::list suggests performance isn't critical here, so the additional indirection probably doesn't matter.

kentonv|1 year ago

unique_ptr seems inappropriate here since the pointers aren't unique. shared_ptr doesn't even work because it looks like this data structure is representing a graph and would expect to have cycles. Perhaps you could use some sort of weak pointer that gets nulled out when the target object is destroyed, but that would not have fixed the bug here, just replaced segfaults with some more controlled exception or panic.

In fact, full-on garbage collection wouldn't have prevented the bug here, and could arguably have made it worse. The problem is that nodes were unexpectedly being copied and then the originals deleted. With GC, you'd still have the copy, but never delete the originals, so you end up with split brain where there are multiple copies of each node and various pointers point to the wrong ones. That'd be pretty painful to debug!

IMO the language-level problem, if there is one, is that C++ is too willing to copy in cases where you would expect it to move instead. This is, of course, for backwards compatibility with the before-times when there was no move and copying was the only logical thing to do. But I think life would be better these days if all non-trivial copies had to be requested explicitly.

im3w1l|1 year ago

I mean the article explains the original author did take explicit steps to keep the Nodes fixed in memory (std::list), because they knew it could be a problem if the Nodes moved.

They just got tripped up by an obscure language feature that made the Nodes move anyway.

jnwatson|1 year ago

This is a great reminder of the pox that was Microsoft of the early part of the millennium. Besides an allergy to investing in web standards, they were woefully behind in their language support. Their non-adoption of modern C++ standards held client security back for a decade, and arguable held language standards development back.

pjmlp|1 year ago

There is a certain irony complaining about Microsoft, while praising everyone else in regards to C and C++ compilers, as if outside the beloved GCC, in a age where clang did not exist, the other proprietary compilers were an example of perfection.

Apparently the folks didn't learn their lesson with Web standards, given the power they gave Google to transform the Web into ChromeOS.

chaboud|1 year ago

That hardly seems fair.

(Microsoft was doing this to C++ well before the early part of the millennium…)

Edit: and in non-joke fairness, Microsoft has really come a long way on this regard.

jinchengJL|1 year ago

Having only worked with gcc and clang, OP’s code looked completely fine to me and I was baffled why many comments think the code is at fault. Judging by this page [1], I agree this is entirely MSVC’s doing.

[1] http://howardhinnant.github.io/container_summary.html

fransje26|1 year ago

I must be misunderstanding something from this article. With:

struct Node {

    std::vector<Connection> connections;

};

struct Connection {

    Node* from, to;

};

Does this mean that to create the vector of connections, Nodes are created, and references are taken to store in the Connections? And then the Nodes are stored in the list, with std::move()?

I don't understand why you would want to go down that road. Intuitively, I would assume that you are not safe from an object copy somewhere down the line and your graph then comes crashing down like a house of cards. Wouldn't it make more sense to store the nodes as pointers? If you like to live dangerously, something like:

struct Graph {

    std::vector<std::list<Node\*>> nodes;

};

Or better:

struct Graph {

    std::vector<std::list<std::unique_ptr<Node>>> nodes;

};

The later will give you plenty of warnings if you do not copy Nodes around with std::move().

Or less performant, but maybe safer, std::shared_ptr<Node>, together with:

struct Connection {

    std::weak_ptr<Node> from, to;

};

so that you have some check guarantees before access?

camblomquist|1 year ago

I don't think you're misunderstanding, it's a strange choice to make even if the example here loses context to make it easier to make the point. It wasn't code that anyone currently here wrote and because it worked with the old compiler, nobody really touched it. I believe the nodes are pushed into the list before adding connections but I don't think that changes the point you're trying to make. That said, one's intuition is different from another's and I don't think it's unfair to assume that pushing into a vector of lists won't cause every existing list to be copied. For that to be the actual behavior is kind of disgusting. It may make more sense to create pointers here but that's a larger change to deal with and ensure correctness versus just swapping the vector out for another list. I don't claim to like that solution but it seems to me like legacy C++ code in general is fragile enough as it is, the less I have to change to fix bugs the better.

wakawaka28|1 year ago

This is a noob mistake, not a huge mystery. It's not always wrong to store raw pointers to STL container elements, but if you do then you must take care of reallocations.

If you find storing pointers to elements too perilous, you should probably just make a container of pointers instead.

unknown|1 year ago

[deleted]

D13Fd|1 year ago

> This is a noob mistake, not a huge mystery.

The interesting part is why it worked in VS2013 but failed in VS2022.

gsliepen|1 year ago

There is a lesser known cousin to std::vector that doesn't have to move nor copy its elements when adding new elements, and that is std::deque.

vardump|1 year ago

Right. But of course std::deque comes with a cost; iteration is slower and memory footprint is bigger.

olliej|1 year ago

I know C++ the language but not the STL (the overwhelming abundance of UB and total lack of safety make it an anathema), so my question is why the STL allows/requires non-move here copying here dependent on whether an object has a no throw move constructor?

Note I’m not asking about move constructor vs memmove/cpy but rather the use of copy constructor vs move depending on exception behavior? Is it something like prefer no throw copy constructor over “throwing” move?

chaboud|1 year ago

That’s a bit like saying you know C++ but not streams or templates, or C but not floating point operations. It’s probably worth learning STL.

Anyway, the reason to use move instead of copy is for performance. Move constructors are faster because they can leave the source object modified (e.g., take over control of a pointer to deep contents). This falls apart when the move constructor can throw, because the container might be part way through a resize when this happens, leaving the object before the exception modified and the code in an unrecoverable state.

Basically, unless we can be super duper 100% certain we’re going to make it through the operation without throwing an exception, we’re going to copy, leaving the objects in question in an unaltered state, and holding to the promises of the standard.

tialaramex|1 year ago

The overwhelming C++ priority beating both safety and performance, often to the consternation of the performance people, is backwards compatibility with dusty archaic code. If it was written by somebody whose funeral was last century, WG21 thinks it's important that it still compiles in your C++ 23 compiler whenever you get one of those. Not crucial. Not so that they actually defined the language sensibly to avoid compatibility problems, but important enough to trump mere performance or safety concerns.

Last century move didn't exist. The terrible C++ move (which is basically an actual "destructive move" plus a default create step) was invented for C++ 11 which was, as the name suggests, standardised only in 2011.

So back then everybody is using copy assignment semantics. Your compiler might be smart enough, especially for trivial cases, to spot the cheap way to deliver the required semantics, but it might not especially as things get tricky (e.g. a std::vector of std::list) and semantically it's definitely a copy, not a move.

As a result the "non-move" that you're astonished by is how all C++ code last century was written, the semantics you're just assuming as necessary didn't even exist in ISO C++ 98 and it is considered important that such code still works.

kentonv|1 year ago

I think the other replies may have misunderstood your question. I think you are asking:

Why does std::vector<T> require T's move constructor to be noexcept (or else it falls back to copying instead)?

The reason goes something like this:

When std::vector<T> grows, it needs to move or copy all of its elements into a new, larger-capacity array. It would prefer to move them, since that's a lot more efficient than copying (for non-trivial types). But what happens if it moves N elements, and then the move constructor for element N+1 throws an exception? Elements 0-N have been moved away already, so the vector is no longer valid as-is. Should it try to move those elements back to the original array? But what if one of those moves fails?

The C++ standards body decided to sidestep this whole problem by saying that std::vector<T> will refuse to use T's move constructor unless it is declared noexcept, so the above problem can't happen.

In my opinion, this was a huge mistake. Intuitively, everyone expects that when an std::vector<T> grows, it's going to move the elements, not making a ton of copies. Often, these copies result in hidden performance problems. Arguably the author of this post is lucky than in their case, the copies resulted in outright failure, thus revealing the problem.

There seem to be two other possibilities:

* std::vector<T> could simply refuse to compile if the move constructor was not `noexcept`. I think this could have been done in a way that wouldn't have broken existing code, if it had been introduced before move constructors existed in the wild -- unfortunately, that ship has now sailed and this cannot be done now without breaking people.

* std::vector<T> could always use move constructors, even if they are not declared `noexcept`, and simply crash (std::terminate()) in the case that one actually throws. IMO this would be fine and is the best solution. Move constructors almost never actually throw in practice, regardless of whether they are declared as such, because move constructors are almost always just "copy pointer, null out the original". You don't put complex logic in your move constructor. And anyway, C++ already has plenty of precedent for turning poorly-timed exceptions into terminations; why not add another case? But I think it's unlikely the standards committee would change this now.

inetknght|1 year ago

Throw specifications do not change function call binding behavior.

Move constructor and move operator will bind to an R-value reference if the move constructor or move operators are available. Conversely, if those functions which declare to not throw anything do end up throwing something then the result is std::terminate.

The only things that determine whether to use a move or copy is whether the reference is an R-value and whether the source or destination is const.

You can declare a {const R-value move operator (not a constructor) for the left-hand} and/or {const R-value move operator or constructor for the right-hand side} of the argument. But you won't be able to modify anything not marked mutable. You shouldn't do that though: that's a sure way to summon nasal demons. That said, I see it fairly often from less experience engineers, particularly when copy-pasting a copy operator intending to modify it for a move operator.

wffurr|1 year ago

What do you use instead of std::vector, map, unique_ptr, etc?

I have a hard time thinking of C++ and the STL as separate. Even our internal utilities and such tend to be STL-like although often with safer defaults.

quuxplusone|1 year ago

I have a blog post on the topic here: https://quuxplusone.github.io/blog/2022/08/26/vector-pessimi...

The TLDR is: Using `move_if_noexcept` instead of plain old `move` can help you provide the "strong exception guarantee." For what _that_ is, see cppreference: https://en.cppreference.com/w/cpp/language/exceptions#Except...

and/or the paper by Dave Abrahams that introduced the term, "Exception-Safety in Generic Components: Lessons Learned from Specifying Exception-Safety for the C++ Standard Library." https://www.boost.org/community/exception_safety.html

> Is it something like prefer no throw copy constructor over “throwing” move?

Almost. If move won't throw (or if copy isn't possible), we'll move. But given a choice between a throwing move and any kind of copy, we'll prefer copy, because copy is non-destructive of the original data: if something goes wrong, we can roll back to the original data. If the original data's been moved-from, we can't.

mgaunard|1 year ago

UB is a feature; people who keep on fighting it are such a pain.

Regarding your question, nothrow operations are essential to maintaining invariants. And maintaining invariants is how you make code correct in a world where UB exists.

SleepyMyroslav|1 year ago

This is another nice reminder to people who build systems where there is only one language implementation. Implementations always change.

beyondCritics|1 year ago

Don't assume! The basic mantra of software development.

66 comments