item 13182726

Godbolt: Enter C, get Assembly

602 points| setra | 9 years ago |godbolt.org | reply

151 comments

[+] rzimmerman|9 years ago|reply
Try with gcc 6.2 options -Ofast -march=native:

  int square(int num) {
    int a = 0;
    for (int x = 0; x < num; x += 2) {
      if (!(x % 2)) {
        a += x;
      }
    }
    return a;
  }
All kinds of loop unrolling and vector instructions. Now remove the "!"
[+] harpocrates|9 years ago|reply
For anyone too lazy: https://godbolt.org/g/jvSKCD

I would be much more impressed if I hadn't taken a compilers course. I reckon (god alone knows exactly what GCC does) this is just linear induction variable substitution[1] (so `x` gets replaced with `i*2`), then associativity of integer multiplication, then some (probably built-in) rule that `2n % 2` is always 0. From there, it is pretty straightforward.

Don't get me wrong - the devil is in the details, and getting optimizations that are powerful, applied only when they are valid, and applied at the right time is difficult as hell. That said, I do expect compilers to be at least this smart.

[1]: https://en.wikipedia.org/wiki/Induction_variable#Induction_v...
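
Spelled out as a runnable sketch (function names are mine, and this is only a paraphrase of the idea, not GCC's actual passes): substituting `x = 2*i` makes the `% 2` test provably true, after which the branch folds away.

```c
#include <assert.h>

/* The loop from the parent comment: sums the even values below num. */
int square_orig(int num) {
    int a = 0;
    for (int x = 0; x < num; x += 2)
        if (!(x % 2))
            a += x;
    return a;
}

/* After induction-variable substitution (x = 2*i), the guard
 * !((2*i) % 2) is always true and folds away, leaving a plain
 * reduction that the vectorizer can unroll. */
int square_subst(int num) {
    int a = 0;
    for (int i = 0; 2 * i < num; i++)
        a += 2 * i;
    return a;
}
```

Both return 56 for an input of 16, matching the (misnamed) original.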

[+] bartl|9 years ago|reply
If that function is really supposed to return the square of a number, you took a wrong turn somewhere, because it says the square of 16 is 56.

Now here's a more efficient algorithm which does produce the correct result:

    int square(int num) {
        int a = 0;
        for (int x = 1, n = num; x <= num; x += x, n += n) {
            if (num & x) {
                a += n;
            }
        }
        return a;
    }

It would be nice if this site allowed us to run/step through the code, to see exactly what it is doing.
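
Since the site can't step through code, here is the same function as a self-contained snippet to run locally (logic unchanged from bartl's post): it decomposes num into its set bits, and for each set bit adds a correspondingly doubled copy of num.

```c
#include <assert.h>

/* Shift-and-add square: x walks the powers of two up to num,
 * n holds num doubled the same number of times, and each set
 * bit of num contributes its share of num * num. */
int square(int num) {
    int a = 0;
    for (int x = 1, n = num; x <= num; x += x, n += n) {
        if (num & x)
            a += n;
    }
    return a;
}
```

square(16) is 256 and square(7) is 49, so this one really does square.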
[+] 0x6c6f6c|9 years ago|reply
Alright, maybe I'm stupid, but I could have sworn that more than

    square(int):                             # @square(int)
        xor     eax, eax
        ret
is required to perform this. What's going on?
[+] Sean1708|9 years ago|reply

  > Now remove the "!"
To be fair, gcc 4.6 and clang also do that, and I suspect earlier versions would too, if I could be arsed to fix the compilation error.

Also as pointed out elsewhere clang turns the "!" version into a simple equation in terms of n, so I'm actually kind of disappointed in GCC here.

[+] OskarS|9 years ago|reply
Dude, modern compilers are smart.
[+] cestith|9 years ago|reply
Try the same thing again with -Os and see the difference.
[+] xroche|9 years ago|reply
This site is extremely valuable for producing good-quality GCC bug reports. For https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67283, I was able to track multiple regressions in a GCC optimization across the different versions of GCC within a few minutes. Doing the same manually would have been extremely tiresome. Kudos to the website authors!
[+] jawilson2|9 years ago|reply
We use this all the time at work. Also, Matt Godbolt sits about 3 rows away from me, which helps as well!
[+] JoshTriplett|9 years ago|reply
One optimization I've always found impressive:

    #include <stdint.h>

    uint32_t testFunction(uint32_t x)
    {
        return ((x & 0xff) << 24) | ((x & 0xff00) << 8) | ((x & 0xff0000) >> 8) | ((x & 0xff000000) >> 24);
    }
compiles into:

    testFunction(unsigned int):
        mov     eax, edi
        bswap   eax
        ret
Another fun one, that only works with clang:

    #include <stdint.h>

    int testFunction(uint64_t x)
    {
        int count;
        for (count = 0; x; count++)
            x &= x - 1;
        return count;
    }
compiles into:

    testFunction(unsigned long):
        popcnt  rax, rdi
        ret
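
Both snippets are easy to check locally; here is the first one, unchanged, with an assertion making the byte reversal explicit.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-mask byte swap: each byte of x is masked out and
 * moved to the mirrored position. Compilers collapse the whole
 * expression into a single bswap instruction. */
uint32_t swap32(uint32_t x)
{
    return ((x & 0xff) << 24) | ((x & 0xff00) << 8)
         | ((x & 0xff0000) >> 8) | ((x & 0xff000000) >> 24);
}
```

swap32(0x11223344) yields 0x44332211, i.e. the bytes in reverse order.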
[+] gratilup|9 years ago|reply
As a compiler engineer, those are some of the least impressive things done by modern compilers. Why? Because there isn't anything smart behind them, it's pure pattern matching.
[+] eridius|9 years ago|reply
Impressive? Those are probably specially-coded optimizations rather than the result of generic optimizations.
[+] dangerbird2|9 years ago|reply
The second example makes sense: rdi stores the first subroutine argument in the amd64 calling convention, and rax stores the return value. I imagine it's easy for the compiler to infer that the loop is just counting 1 bits, which popcnt does. I suspect that if you called functions or initialized local variables in the loop, the output would be much messier.
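
The "counting 1 bits" inference rests on one identity: `x & (x - 1)` clears the lowest set bit. Traced on a small value (my own worked example, same loop as above):

```c
#include <assert.h>
#include <stdint.h>

/* x & (x - 1) clears the lowest set bit:
 *   x           = 0b1100 (0xC)
 *   x - 1       = 0b1011 (0xB)
 *   x & (x - 1) = 0b1000 (0x8)
 * So the loop body runs exactly once per set bit, which is
 * precisely what popcnt computes. */
int count_bits(uint64_t x) {
    int count;
    for (count = 0; x; count++)
        x &= x - 1;
    return count;
}
```

count_bits(0xC) is 2, one iteration per set bit.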
[+] AstralStorm|9 years ago|reply
The latter is generally not faster on most modern chips. It results in similar micro-ops to the non-vectorized assembly, but you need to move the data into the SSE registers, which is not free.

A smart compiler would not use the new SSE version. (Perhaps you didn't set the flags right?)

This is why: http://0x80.pl/articles/sse-popcount.html

[+] smitherfield|9 years ago|reply
I like the highlighting. I finally understand why people say compilers generate faster code than a human could write. I pasted in a pretty straightforward and already fairly optimized function I'd written and, of course, got back something also pretty straightforward; then I put in "-Ofast -march=native" and, wow. It completely rearranged everything in a totally non-linear way and generated all sorts of bizarre (to me) instructions. I think I'd need to study it for months if not years to understand what the compiler did.
[+] jdub|9 years ago|reply
Cool! Certainly quicker than loading up cross compilers (even when they're so easy to get on Debian/Ubuntu), building a binary, and running the right version of objdump.

The Rust Playground at https://play.rust-lang.org/ has a similar function, letting you check ASM, LLVM IR, and MIR (Rust's mid-level intermediate representation) output for current versions of the Rust compiler.

[+] jerven|9 years ago|reply
Really cool; it surprised me to see even trivial code give very different results in gcc, icc, and clang.

  int retNum(int num, int num2) {
      return (num * num2)/num;
  }
gives this in clang

  retNum(int, int):     # @retNum(int, int)
        mov     eax, esi
        ret
While icc and gcc give

  retNum(int, int):
        mov     eax, esi
        imul    eax, edi
        cdq
        idiv    edi
        ret

  retNum(int, int):
        imul      esi, edi
        mov       eax, esi
        cdq
        idiv      edi
        ret
The clang version at first sight seems right. But then, thinking about it, this is integer math:

  4 / 3 := 1
  1 * 3 := 3

which leads to 3 != 4. I believe gcc and icc returning 3 there would be correct, while clang returning 4 would not. Maybe someone more C/int versed can tell us which results are acceptable (knowing C, both might be OK).
[+] comex|9 years ago|reply
The product of 'num' and 'num2' is always going to be a multiple of 'num', so there won't be any error introduced by flooring when dividing by 'num' again.

One thing that can happen is an integer overflow: if you pass (0x10000, 0x10000), icc's and gcc's versions will calculate 0x10000 * 0x10000 = 0, 0 / 0x10000 = 0, while clang will return 0x10000. But clang's not wrong: signed integer overflow is undefined behavior in C, so compilers are allowed to just assume it never happens when making optimizations.
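
The same wraparound can be reproduced deterministically with unsigned arithmetic, where overflow is defined to wrap (a sketch; the function name is mine):

```c
#include <assert.h>
#include <stdint.h>

/* With uint32_t, 0x10000 * 0x10000 is defined to wrap mod 2^32,
 * giving 0; dividing by 0x10000 then yields 0, the value gcc and
 * icc compute for the signed version. clang's reduction of the
 * signed version to num2 would instead give 0x10000. */
uint32_t ret_num_wrapped(uint32_t num, uint32_t num2) {
    uint32_t prod = num * num2;  /* wraps: 2^32 mod 2^32 == 0 */
    return prod / num;
}
```

For inputs with no wraparound, e.g. (3, 7), the result is just num2.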

[+] chrisseaton|9 years ago|reply
(num * num2)/num = num2

The division cancels out the multiplication, just like if you were doing arithmetic on paper in school. There's nothing more to it than that, is there? What inputs do you think clang gets wrong?

Overflow is of course undefined.

[+] geofft|9 years ago|reply
Note that "GodBolt" isn't some clever Zeus-inspired service name; "Godbolt" is just this fellow's last name.
[+] AndrewOMartin|9 years ago|reply
But the fellow himself may have some divine inspiration, as his work is often next to godliness.
[+] wibr|9 years ago|reply
I know C but no Assembler, so this looks like a good way to get more familiar with what's going on under the hood. It would be really neat if you could click on each instruction/register to get a short summary, though.
[+] cfv|9 years ago|reply
Can anyone please explain why this naive pow() implementation is so freaking huge? I lack the chops to figure this out properly https://godbolt.org/g/YFvNWa
[+] beardog|9 years ago|reply
Is this just a toy or does it have any place at all in real assembly projects? (I don't know assembly besides having tinkered with it a bit)

Cool project regardless.

[+] jdub|9 years ago|reply
It's really useful to see what key pieces of code look like in assembly (though easy enough to do locally), and reaaaaaaaallllly useful to see what they look like across compilers and architectures.

Instead of finding (or worse, building) a compiler or cross compiler locally and doing all the boring steps to compile properly (harder than it sounds) and disassemble the code, you can just splat some code into this and take a peek.

Less about "assembly projects", more about breadth of info available to a developer writing C/C++.

[+] pkaye|9 years ago|reply
When I used to develop firmware, I would write up the critical inner loops code there and look at the assembly output and tweak the functions to result in better assembly code. Sometimes you can express things a little differently to get around a weakness of the compiler.
[+] harpocrates|9 years ago|reply
Useful way to see how different compilers interpret the C standard. Just code up a minimal example and see what assembly they produce.
[+] karyon|9 years ago|reply
i think the primary purpose is to find out what your compiler will do out of your C/C++ program, to see e.g. what optimizations are performed. but who knows, maybe it's also used for actually writing assembly :)
[+] jahnu|9 years ago|reply
I use it to check things like copy elision etc in my C++.
[+] nowne|9 years ago|reply
This is an amazing tool, I use it almost daily. Whenever I want to test an idea, or see what is going on in the assembly, I go straight to godbolt.
[+] source99|9 years ago|reply
What kind of work do you do?
[+] syphilis2|9 years ago|reply
I can't edit the code samples using a mobile Firefox browser. Attempting to delete text and then type new text results in the deleted text reappearing appended to whatever new text I typed.
[+] mattgodbolt|9 years ago|reply
Sadly the mobile support is pretty bad. It's on my list to fix at some point but requires a bunch of changes in the underlying window library (golden-layout.com), plus some work to reconfigure the layout for mobile.
[+] Roboprog|9 years ago|reply
Cool. I missed the target option list on the right the first time I looked at the page.

The closest I've ever come to this back in the day was running (Borland) Turbo Debugger's assembly view after building with Turbo C (w/ or w/out -O...), as either 8086 or 80286 output - "x16". Yeah, that was a while back :-)

[+] flukus|9 years ago|reply
That's really cool. Is it actually compiling in the background? Is this a tool you wrote?
[+] znpy|9 years ago|reply
It uses gcc behind the scenes.

I had a chance to read part of the sources and prepare a patched version with a set of toolchains (m68k, amd64, mips). It's nicely written; it's easy to add toolchains as well as to set default compilation options.

[+] pkaye|9 years ago|reply
It is kind of what you get when you run gcc with the -S option.
[+] ndesaulniers|9 years ago|reply
Folks interested in doing this locally should play around with objdump.