Want speed? Pass by value.

[+] edanm|16 years ago|reply

Seeing this article reminds me of all the reasons I don't use C++ any more.

Instead of focusing on the important problems, like code structure, design, or God forbid, the design of the actual product, most of my life as a C++ programmer revolved around learning the mechanics of the language. And then learning the exceptions. And then learning the optimizations. And then learning the intricacies of the STL.

And I'm still not an experienced C++ programmer.

[+] ramy_d|16 years ago|reply

Not sure if I understood, but here's my summary of what I got from this.

In fewer words, the argument is this:

  std::vector<std::string> get_names();
  std::vector<std::string> const names = get_names();

Passing by value causes a lot of under-the-hood moving and copying, which is slow. But we will learn later why this is actually the right thing to do.

  void get_names(std::vector<std::string>& out_param );
  std::vector<std::string> names;
  get_names( names );

Passing by reference means many extra lines of code throughout the code base: no more constants, mutating variables, and other crap no one ever told you about when learning about pointers.

The solution to both of these is to use "RValue expressions". RValues are expressions that create anonymous temporary objects.

When defining variables, using RValues allows transferring ownership of, in this case a dynamically allocated string array (vector), from the source vector to the target vector.

When using functions, returns are also anonymous temporary objects, so we transfer the resources from the return value to the target value in the same way as with variables.
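The ownership transfer described in the last two paragraphs can be sketched roughly like this. The helper name `buffer_was_transferred` is made up for illustration and is not from the article; it only checks that a move hands the existing heap buffer to the target vector instead of duplicating it.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the article): move `source` into a new
// vector and report whether the heap buffer was handed over, not duplicated.
bool buffer_was_transferred(std::vector<std::string> source) {
    const std::string* before = source.data();            // address of the element array
    std::vector<std::string> target = std::move(source);  // move construction
    return target.data() == before;                       // same buffer, new owner
}
```

Calling it with a non-empty vector, for example `buffer_was_transferred({"alice", "bob", "carol"})`, should report that the buffer changed hands, since move-constructing a `std::vector` keeps pointers to its elements valid.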

Oh wait, the compiler actually takes care of optimizing stuff for you, it's called Return Value Optimization (RVO) and it works like this:

  std::vector<std::string> names = get_names();

Oh shit, isn't this what we wrote as the first example, the one that's expensive and slow? Yeah well, apparently there's nothing to worry about. Use this.

Do pass a function call directly as the argument to another function, because then you're passing rvalues as parameters, which is unicorn-level magic:

  std::vector<std::string> sorted_names2 = sorted( get_names() );

RVO optimizations aren't required by any standard, but "recent versions of every compiler I’ve tested do perform these optimizations today."

Don't pass a variable by reference and then make an explicit copy of its values - that defeats the whole purpose of what we are trying to talk to you about

Guideline: Don’t copy your function arguments. Instead, pass them by value and let the compiler do the copying.

Lesson: don't take an argument by reference and then copy it explicitly; take it by value and let the compiler optimize the copy for you.
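A minimal sketch of the two styles side by side, assuming a `sorted` along the lines of the one in the article. The names `sorted_by_ref` and `sorted_by_value` and the stand-in body of `get_names` are mine, picked only to tell the variants apart.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Stand-in for the article's get_names(); the contents are invented.
std::vector<std::string> get_names() {
    return {"carol", "alice", "bob"};
}

// Discouraged form: take a reference, then copy by hand. The copy is made
// even when the caller passed a temporary that nobody else needs.
std::vector<std::string> sorted_by_ref(const std::vector<std::string>& names) {
    std::vector<std::string> r(names);
    std::sort(r.begin(), r.end());
    return r;
}

// Guideline form: take the argument by value. An lvalue argument is copied
// into `names`; an rvalue argument can be moved or constructed in place.
std::vector<std::string> sorted_by_value(std::vector<std::string> names) {
    std::sort(names.begin(), names.end());
    return names;  // a by-value parameter is moved out, not copied (C++11 and later)
}
```

Both return the same result. The difference is only in how many times the vector's contents get duplicated when the caller writes something like `sorted_by_value(get_names())`.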

[+] jheriko|16 years ago|reply

This is very wrong for so many reasons and misses probably the most important reasons to pass by copy...

The example of the sorted functions is striking - because the compiler should be able to produce the same quality of code from both cases - the fact that it doesn't suggests that the compiler is not that smart. The r-value, l-value distinction is, in this case, unimportant - because of const the l-value can not be modified, as long as you don't use const_cast or forget a volatile keyword etc. elsewhere the compiler can safely assume that it really will remain const. I'll have to play with this myself I guess cos I might be missing something here that looking at bytes would reveal... but it seems that your example should fail to show any difference on a good compiler.

Worse though is that the const-reference and explicit copy is done at all - what possible reason could there be? This is where you should use pass by copy /anyway/, before your argument about speed, because you quite explicitly need to work with a copy. If you chose a suitable example you would see the opposite... unless your compiler is smart enough to treat your redundant copy as if it was a const-reference, which, e.g. the MS C++ compiler does in fact do.

The important point I think that was missed though is to always pass small things by value - throwing heap pointers around to be dereferenced is typically much more expensive than using the stack - your compiler might optimise it away for you, but especially if you aren't using const references/pointers it can be difficult. Imagine your compiler is not smart - passing a 4/8-byte value as a reference to a 4/8-byte value is just silly. This is probably the most common case - ints, floats, doubles and even small structs will go on the stack, or better.

[+] grogers|16 years ago|reply

Yes, but in an article on copy elision, it doesn't really make sense to talk about small objects because it really doesn't matter if they are copied or not. Pass them by value anyways.

What are you referring to about the sorted function - that he says the compiler isn't smart enough to optimize the copy out of a function that returns its own argument? His argument makes sense to me here: the caller doesn't know anything about the internals of the function, so it has to allocate separate space for the return value and the function argument, leading to at least one copy. With inline functions or whole program analysis (or some type of link time optimization) it should be smart enough to do this.

[+] malkia|16 years ago|reply

So how many things do you now have to keep in your head when you write and read C++ code?

Also, this would probably not be optimized in Debug builds (-O0, -Od, etc.) which means bigger difference between debug/release, which would make debug useless, if more and more of such things were used in release.

Sorry, I work in games, and I've seen the horrors of really slow debug and fast release... To the point where one of the leads said - who needs a debug version - just use printfs, or decode what the debugger meant to say (in release).

[+] s3graham|16 years ago|reply

Oh yes. And then there's "release" and "release-final" (because we started putting a few asserts into release after dropping "debug").

And then "release", "release-final", and "release-final-final" because marketing needed some functionality in "release-final" that went on QA DVDs. :)

[+] jheriko|16 years ago|reply

Try building the performance-critical sections with release settings in debug builds, and have a separate "slow debug" version which doesn't. It's not a magic bullet that will make debug as fast as release, but it can usually bring debug framerates to acceptable levels whilst letting you use the debugger on whatever you are working on most of the time (unless you are trying to fix some core engine bug etc...)

[+] stingraycharles|16 years ago|reply

I'm not 100% certain whether I like this. Consider the following concurrent pseudo-code:

  class A {
  public:
  
    void add () {
     // <locking happens here>
     _v += "foo";
    }
  
    std::string const get () {
     // <locking again>
     return _v;
    }
  
  private:
   std::string _v;
  };

Now, if I launch multiple threads that call get () and add () at the same time, normally this would be thread-safe, if locking occurs. However, if I understood it correctly, get () can also return by reference, since it is const.

Wouldn't this create race conditions?
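For reference, here is a sketch of the same class with the locking filled in, assuming `std::mutex` and `std::lock_guard` (the pseudo-code above does not say which lock is meant, so the member `_m` is an assumption). With return by value, the returned string is initialized from `_v` before the guard is destroyed, so the caller gets its own copy rather than a reference into shared state.

```cpp
#include <mutex>
#include <string>

class A {
public:
    void add() {
        std::lock_guard<std::mutex> lock(_m);  // assumed lock type
        _v += "foo";
    }

    std::string get() const {
        std::lock_guard<std::mutex> lock(_m);
        return _v;  // the return object is built before `lock` is released
    }

private:
    mutable std::mutex _m;  // hypothetical member, not in the original
    std::string _v;
};
```

Copy elision only removes copies between temporaries and locals that the caller would own anyway; it does not turn this return into a reference to the member `_v`.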

[+] tbrownaw|16 years ago|reply

No.

If your function does "return <some local var>", as in

    Foo xyzzy() {
        Foo ret;
        /* do stuff to ret */
        return ret;
    }

then the compiler can rewrite it to be something like

    void xyzzy(void * _ptr) {
        Foo * _ret = new(_ptr)Foo();
        /* do stuff to *_ret */
    }

where the function-local variable that gets returned is actually just constructed in the place that it would eventually be returned/copied to and so doesn't have to be returned at all.
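One way to check this on your own compiler is to count copy constructions. This is only a sketch: the instrumented `Foo` and the helper `copies_made_by_call` are invented for the test. Since C++11 the count should come out as zero either way, because a compiler that skips the rewrite above has to fall back to a move, not a copy, when returning a local.

```cpp
#include <string>
#include <utility>

// Hypothetical instrumented type: counts how often it is copy-constructed.
struct Foo {
    static int copies;
    std::string payload;

    Foo() = default;
    Foo(const Foo& other) : payload(other.payload) { ++copies; }
    Foo(Foo&& other) noexcept : payload(std::move(other.payload)) {}
};
int Foo::copies = 0;

Foo xyzzy() {
    Foo ret;
    ret.payload = "do stuff to ret";
    return ret;  // candidate for construction directly in the caller's storage
}

// Runs one call and reports how many copy constructions it triggered.
int copies_made_by_call() {
    Foo::copies = 0;
    Foo result = xyzzy();
    (void)result;
    return Foo::copies;
}
```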

[+] unknown|16 years ago|reply

[deleted]

[+] zokier|16 years ago|reply

c++ is scary

[+] jheriko|16 years ago|reply

I think it's only the hordes of terrible C++ programmers that make it seem that way...

[+] jonsen|16 years ago|reply

Yes. It induces respect. That is not necessarily a bad thing.

Perhaps C++ is for those who bother to learn a language before using it.

[+] zandorg|16 years ago|reply

I wrote my own class which created the data (basically an array) once, passed it around, and automatically deleted the data once its parent function returned.

I did this by having a reference counter which goes up on a constructor, and down on a destructor, and when it gets to zero and destructs, only then does it delete the data.
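The counting scheme described above is roughly what `std::shared_ptr` does. Below is a sketch that swaps in `std::shared_ptr` for the hand-written counter, so it is not the class from this comment, and it says nothing about the speed difference. `SharedArray` and `owners_while_passed` are made-up names.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical handle: every copy shares one buffer, and the buffer is
// freed when the last handle referring to it is destroyed.
class SharedArray {
public:
    explicit SharedArray(std::size_t n)
        : data_(std::make_shared<std::vector<int>>(n)) {}

    int& operator[](std::size_t i) { return (*data_)[i]; }

    // Current value of the reference count.
    long owners() const { return data_.use_count(); }

private:
    std::shared_ptr<std::vector<int>> data_;
};

// Passing the handle by value copies the pointer and bumps the count;
// the array itself is not copied.
long owners_while_passed(SharedArray a) { return a.owners(); }
```

The count goes up for the duration of the call and drops back when the parameter is destroyed, which is the up-on-construct, down-on-destruct behaviour described above.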

This gave me about 10% extra speed in the program as a whole, over using the C++ classes in the STL.

[+] czhiddy|16 years ago|reply

http://en.wikipedia.org/wiki/Reference_counting ?

[+] koenigdavidmj|16 years ago|reply

Sounds like the Flyweight design pattern. Basically, if you have a lot of small objects that would take a lot of memory, then you just cache them all.

The canonical use case is a word processor that needs one instance of a letter class for each letter in a document. That's a lot of letters, but there is also a lot of repetition. However, if you have several hundred letters `e', all of the same font face and size, then why not just use the same reference to that identical object and save a lot of space? (You improve your program's spatial locality quite a bit as well.)
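A minimal sketch of that letter cache, assuming a glyph is identified by its letter, font face and size. `Glyph` and `GlyphFactory` are hypothetical names, not taken from any particular word processor.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <tuple>

// Hypothetical immutable glyph: the state shared between all occurrences.
struct Glyph {
    char letter;
    std::string font;
    int size;
};

class GlyphFactory {
public:
    // Returns the one shared instance for this (letter, font, size),
    // creating it only on the first request.
    std::shared_ptr<const Glyph> get(char letter, const std::string& font, int size) {
        auto key = std::make_tuple(letter, font, size);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            auto glyph = std::make_shared<const Glyph>(Glyph{letter, font, size});
            it = cache_.emplace(key, glyph).first;
        }
        return it->second;
    }

    // Number of distinct glyph objects actually allocated.
    std::size_t distinct() const { return cache_.size(); }

private:
    std::map<std::tuple<char, std::string, int>, std::shared_ptr<const Glyph>> cache_;
};
```

A document with several hundred `e`s in the same face and size would then hold several hundred pointers to a single `Glyph`.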

[+] d0m|16 years ago|reply

Well, "just" because the pass-by-value example is simpler and clearer, I would use that (screw the speed factor). And only if there's a provable speed penalty on large data would I consider trying other alternatives. Still, it's cool to know that the "better way" is in fact the "faster way" at the same time.

[+] bediger|16 years ago|reply

Isn't his caveat at the end of the article possibly very important? Knowing where copy constructors get called seems like a tricky, yet performance-important thing, given the costliness of cache-flushes and lack of locality of reference C++ objects might incur.

[+] karatchov|16 years ago|reply

Please, can anyone recommend a good tutorial for understanding C++ function arguments? I'm just starting to code in C++ and this article leaves me more confused.

[+] ori_b|16 years ago|reply

Just read the ABI specs. A good start is the SysV ABI (refspecs.freestandards.org/elf/IA64-SysV-psABI.pdf), although it doesn't directly cover C++.

Then, the C++ ABI is defined here - specifically, the Itanium ABI draft, which is in fact used on most GCC-supported systems, as far as I know. The "Itanium" name is a misnomer. http://www.codesourcery.com/public/cxx-abi/

[+] tbrownaw|16 years ago|reply

This kinda relies on your objects being cheap to copy/move. Try doing your pass-by-value (not return-by-move) with a non-COW container, and see how it works.

[+] okmjuhb|16 years ago|reply

This is totally wrong; the examples rely on compilers doing copy elision together with return value optimization so that the copy becomes unnecessary (even if a copy is expensive, it's fine - it never needs to happen). They do not use copy on write containers (indeed, they return modified versions of input variables).
