top | item 18323384

A fun optimization trick from rsync

186 points| luu | 7 years ago |blog.plover.com | reply

64 comments

order
[+] vortico|7 years ago|reply
I don't like it. You should put the logic where the logic should be, so it's clear to see what was intended.

    typedef int (*MethodFunc)();

    void doMethod() {
        static int methodId = 0;

        static const MethodFunc methodFuncs[] = {
    #ifdef METHOD_1_CAN_COMPILE
            method1,
    #endif
            method2,
            ...,
            NULL
        };

        while (methodFuncs[methodId]) {
            int success = (methodFuncs[methodId])();
            if (success)
                return 0;
            methodId++;
        }

        return -1;
    }
[+] blux|7 years ago|reply
In case performance is all you are after, this has much less potential for inlining than the rsync implementation has.
[+] stouset|7 years ago|reply
If all of the methodFuncs fail, this will attempt to jump to whatever address happens to be in memory after the last element of methodFuncs. You've forgotten to stop iterating.
[+] matthewaveryusa|7 years ago|reply
That's not clever, that's language abuse and horrific -- You could do this in so many other ways there is absolutely no point in abusing #includes mid-source-code (without mentioning the code smell around statics.)

Now, as a working curiosity, this is pretty damn cool.

[+] nwmcsween|7 years ago|reply
Not really its just a static var on a switch statement nothing really bad about it, in fact its a semi nice way to handle a simple state machine.
[+] imron|7 years ago|reply
> without mentioning the code smell around statics

Or the code smell around fall through switch statements.

There are legitimate uses for fall through switch statements and/or mid-file includes, and or static function vars, but combining all of them together is not clever, it's a recipe for bugs.

[+] ktm5j|7 years ago|reply
Im with you. This is great and clever, but I would have to maintain code like this.
[+] usefulcat|7 years ago|reply
An interesting trick, but it seems like there would be a simpler way, like having set_the_mtime be a function pointer. Initially it would point to an implementation that figures out which method works and sets the function pointer to point to a particular implementation.
[+] beefsack|7 years ago|reply
Sounds really clever, but also has the potential to be really opaque and confusing. On the other hand, the trick in the article is quite verbose.

Can be tricky balancing terseness with readability.

[+] ychen306|7 years ago|reply
It's more inliner-friendly this way though.
[+] corndoge|7 years ago|reply
no guarantee the signatures are the same on all functions, can't use a fp
[+] alecbenzer|7 years ago|reply
Awesome, that's horrifying.

One concrete issue with this technique: it doesn't "loop". Meaning, for example, if initially method #3 doesn't work but method #4 does, it'll settle on method #4. If later, method #3 starts working, it won't switch it. Worse, if method #4 stops working, it'll just always fail.

I imagine this isn't an issue for rsync's use but could be a concern if applying it more generally.

[+] lilyball|7 years ago|reply
According to the article, a step isn't skipped because it had a transient failure, it's skipped because it returns a special error code ENOSYS that means the syscall hasn't even been implemented, and therefore there's no point in ever trying it again.
[+] vortico|7 years ago|reply
Just remove the `static` keyword and there you go.

But it's an "optimization", and avoiding previously attempted methods is the whole point of the trick. Otherwise it would be an easy and uninteresting function.

[+] kelnage|7 years ago|reply
My understanding of the code from the article is that if method #4 stopped working, it would just carry on down the list. The static variable just means where to skip to in the case list.
[+] ecma|7 years ago|reply
Thanks, I hate it. I'm a bit bemused about this bizarro case_N.h trickery. I'm also almost positive __COUNTER__ would be adequate and entirely remove the need for __LINE__ as suggested in the addendum which is likely to be quite large in any non-trivial source unit. I mean, something gcc this way comes I guess...
[+] scatters|7 years ago|reply
__COUNTER__ is not standardized. Repeated inclusion is.
[+] netheril96|7 years ago|reply
So thread unsafe?
[+] romed|7 years ago|reply
Note that if this were C++ (11 or later) this could be done with the static variable in a thread-safe way, since function-scoped static variables initialized by functions are guaranteed to be initialized exactly once. So this code could be something like:

  int set_the_mtime(...) {
    static int starting_from = set_mtime_starting_from(0);
    set_mtime_starting_from(starting_from);
  }
  // Try various ways of setting mtime until it works.
  int set_mtime_starting_from(int starting_from) {
    // horrible switch statement
  }
[+] jbverschoor|7 years ago|reply
So without actually looking into anything. What id you have a directory structure which is built out of 2 mounts. One local fs and one network mounted. Then lets say method 5 works on one, and method 1 works on the other, both are the only ones that work. Then it would basically fail, depending on the order of directories enumerated, eventhough there is a supported method.
[+] de_watcher|7 years ago|reply
I haven't looked at the code, but the ENOSYS error is more about the capabilities of the kernel. So when we eventually start swapping entire kernels at runtime then it'll break.
[+] bacon_waffle|7 years ago|reply
Edit: d'oh - I misread, the addendum is doing exactly this...

I don't understand the need for the macro-fu to generate the case values. The use of switch_step++ in each previous case seems to drive that, but couldn't the same be accomplished by setting switch_step to a constant?

  static int switch_step = 0;
  switch(switch_step) { // Falls through between cases
    default:

    #if METHOD_0_AVAILABLE
      switch_step = 0;
    case 0:
      if (method_0_works(...))
        break;
    #endif METHOD_0_AVAILABLE

    #if METHOD_1_AVAILABLE
      switch_step = 1;
    case 1:
    ...
  }
[+] OskarS|7 years ago|reply
So, slightly off topic, but one thing from C that I really miss in other languages is subroutine-local static variables. It always annoys me in Java or C# that you can't really make a static variable that's scoped to a single method or function, you have to scope them to the entire class. It's common enough that a private static variable is only needed in a single method or function, so why not do it the way C does it?
[+] VMG|7 years ago|reply
In js you can use `this.x` or `this.methodname.x`
[+] Jedd|7 years ago|reply
rsync is an amazing tool, full of delightful secrets.

One thing it doesn't have natively (unless it's really well hidden) is the ability to look at two directory structures, and generate a delta of the two out to a separate location (say a USB stick).

A very useful feature for backups where you've got good enough (to run an rsync --dry-run) network connectivity, but not good enough to actually do the transfer.

[+] rstuart4133|7 years ago|reply
Unless you actually want whole files stored in a directory tree on your USB, --only-write-batch does exactly that. From the man page:

   you can feel free to write the batch directly to some portable media
[+] lazylizard|7 years ago|reply
can rsync --dry-run and | output to a file?
[+] waingake|7 years ago|reply
I'd be interested if someone could show me a FP friendly solution to this problem.
[+] toolslive|7 years ago|reply
Some compilers turn large switches into lookup tables. Using side effects like this however, makes the compiler shy away from this optimisation, and compile it literally. So it's probably not a good idea anymore.
[+] ygra|7 years ago|reply
I'm not sure switch performance is a relevant concern for setting the last modification time of a file.