I would be much more impressed if I hadn't taken a compilers course. I reckon (god alone knows exactly what GCC does) this is just linear induction variable substitution[1] (so `x` gets replaced with `i*2`), then associativity of integer multiplication, then some (probably builtin) rule that `n%n` is always 0. From there, it is pretty straightforward.
Don't get me wrong - the devil is in the details, and getting optimizations that are both powerful and applied only when they are valid, and at the right time, is difficult as hell. That said, I do expect compilers to be at least this smart.
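For readers who have not met induction variable substitution, here is a minimal sketch of the first step only. The snippet under discussion is not quoted in the thread, so both functions below are hypothetical illustrations rather than the original code or GCC's actual transformation. The loop keeps `x` equal to `2 * i`, so a compiler is allowed to replace the loop with a closed form.

```c
#include <stdint.h>

/* Hypothetical loop: x is a linear induction variable, always equal to 2 * i. */
uint64_t sum_with_induction_variable(uint32_t n) {
    uint64_t sum = 0;
    uint64_t x = 0;
    for (uint32_t i = 0; i < n; i++) {
        sum += x;   /* at this point x == 2 * i */
        x += 2;     /* x advances in lockstep with i */
    }
    return sum;
}

/* After substituting x = 2 * i, the sum is 2 * (0 + 1 + ... + (n - 1)),
   which simplifies to n * (n - 1). */
uint64_t sum_closed_form(uint32_t n) {
    if (n == 0) {
        return 0;
    }
    return (uint64_t)n * (n - 1);
}
```

Both functions return the same value for every `n`; whether a given compiler actually performs the rewrite depends on the version and the optimization flags.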
This site is extremely valuable for producing good-quality reports of GCC bugs. For https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67283, I was able to track multiple regressions in a GCC optimization across the different versions of GCC within a few minutes. Doing the same manually would have been extremely tiresome. Kudos to the website authors!
As a compiler engineer, those are some of the least impressive things done by modern compilers. Why? Because there isn't anything smart behind them, it's pure pattern matching.
The second example makes sense: rdi stores the first subroutine argument in amd64 calling conventions and rax stores the return value. I imagine it's easy for the compiler to infer that the loop is just counting 1 bits, which popcnt does. I imagine if you called functions or initialized local variables in the loop, it would be much messier.
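The exact source of that example is not quoted here, so the following is only a sketch of the kind of bit-counting loop being described; the function name is made up. Each iteration clears the lowest set bit, so the loop runs once per 1 bit. Compilers such as GCC and Clang may collapse a loop like this into a single popcnt when the target supports it (for example with `-O2 -mpopcnt`), though that depends on the compiler version.

```c
#include <stdint.h>

/* Counts the 1 bits in x by clearing the lowest set bit on each iteration. */
int count_set_bits(uint64_t x) {
    int count = 0;
    while (x != 0) {
        x &= x - 1;  /* clears the lowest set bit */
        count++;
    }
    return count;
}
```

With the amd64 calling convention mentioned above, `x` arrives in rdi and the count is returned in eax, the low half of rax.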
The latter is generally not faster on most modern chips. It results in microcode similar to the non-vectorized assembly, but you need to put data in the SSE registers, which is not free.
A smart compiler would not use the new SSE version. (perhaps you didn't set the flags right?)
I like the highlighting – I finally understand why people say compilers generate faster code than a human could write. Pasted in a pretty straightforward and already fairly optimized function I'd written and of course got back something also pretty straightforward, then I put in "-Ofast -march=native" and, wow. Completely rearranged everything in a totally non-linear way and generated all sorts of bizarre (to me) instructions. I think I'd need to study it for months if not years to understand what the compiler did.
Cool! Certainly quicker than loading up cross compilers (even when they're so easy to get on Debian/Ubuntu), building a binary, and running the right version of objdump.
The Rust Playground at https://play.rust-lang.org/ has a similar function, letting you check ASM, LLVM IR, and MIR (Rust's mid-level intermediate representation) output for current versions of the Rust compiler.
Really cool, it surprised me to see even trivial code give very different results in gcc, icc, and clang.

    int retNum(int num, int num2) {
        return (num * num2)/num;
    }

gives this in clang:

    retNum(int, int): # @retNum(int, int)
        mov eax, esi
        ret

while icc and gcc give:

    retNum(int, int):
        mov eax, esi
        imul eax, edi
        cdq
        idiv edi
        ret

    retNum(int, int):
        imul esi, edi
        mov eax, esi
        cdq
        idiv edi
        ret

At first sight the clang version seems right. But thinking about it, this is integer math:

    4/3 := 1
    1 * 3 := 3

which leads to

    3 != 4

I believe gcc and icc returning 3 there is correct, while clang returning 4 is not. Maybe someone better versed in C integer semantics can tell us which is acceptable (knowing C, both might be OK).
The product of 'num' and 'num2' is always going to be a multiple of 'num', so there won't be any error introduced by flooring when dividing by 'num' again.
One thing that can happen is an integer overflow: if you pass (0x10000, 0x10000), icc's and gcc's versions will calculate 0x10000 * 0x10000 = 0, 0 / 0x10000 = 0, while clang will return 0x10000. But clang's not wrong: signed integer overflow is undefined behavior in C, so compilers are allowed to just assume it never happens when making optimizations.
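To make the two cases concrete without invoking undefined behavior, here is a small sketch that uses unsigned 32-bit arithmetic, where wraparound is well defined. The function names are made up for illustration, and this mirrors what the icc and gcc output computes rather than reproducing it; `num` is assumed to be nonzero.

```c
#include <stdint.h>

/* Multiply then divide in 32 bits: the product wraps modulo 2^32,
   which is well defined for unsigned types. Assumes num != 0. */
uint32_t mul_then_div_wrapping(uint32_t num, uint32_t num2) {
    return (uint32_t)(num * num2) / num;
}

/* Same computation with a 64-bit intermediate product, so nothing wraps
   and the division cancels the multiplication exactly. Assumes num != 0. */
uint32_t mul_then_div_exact(uint32_t num, uint32_t num2) {
    return (uint32_t)(((uint64_t)num * num2) / num);
}
```

For (3, 4) both functions return 4, since 12 is a multiple of 3 and no rounding occurs. For (0x10000, 0x10000) the wrapping version returns 0 and the exact version returns 0x10000, which matches the difference described above.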
The division cancels out the multiplication, just as if you were doing arithmetic on paper as you did in school. There's nothing more to it than that, is there? For which inputs do you think clang's version is incorrect?
I know C but no Assembler, so this looks like a good way to get more familiar with what's going on under the hood. It would be really neat if you could click on each instruction/register to get a short summary, though.
Can anyone please explain why this naive pow() implementation is so freaking huge? I lack the chops to figure this out properly https://godbolt.org/g/YFvNWa
It's really useful to see what key pieces of code look like in assembly (though easy enough to do locally), and reaaaaaaaallllly useful to see what they look like across compilers and architectures.
Instead of finding (or worse, building) a compiler or cross compiler locally and doing all the boring steps to compile properly (harder than it sounds) and disassemble the code, you can just splat some code into this and take a peek.
Less about "assembly projects", more about breadth of info available to a developer writing C/C++.
When I used to develop firmware, I would write up the critical inner-loop code there, look at the assembly output, and tweak the functions to get better assembly code. Sometimes you can express things a little differently to get around a weakness of the compiler.
I think the primary purpose is to find out what your compiler will make of your C/C++ program, e.g. to see what optimizations are performed. But who knows, maybe it's also used for actually writing assembly :)
I can't edit the code samples using a mobile Firefox browser. Attempting to delete text and then type new text results in the deleted text reappearing appended to whatever new text I typed.
Sadly the mobile support is pretty bad. It's on my list to fix at some point but requires a bunch of changes in the underlying window library (golden-layout.com), plus some work to reconfigure the layout for mobile.
Cool. I missed the target option list on the right the first time I looked at the page.
The closest I've ever come to this back in the day was running (Borland) Turbo Debugger's assembly view after building with Turbo C (w/ or w/out -O...), as either 8086 or 80286 output - "x16". Yeah, that was a while back :-)
I had a chance to read part of the sources and prepare a patched version with a set of toolchains (m68k, amd64, mips). It's nicely written, and it's easy to add toolchains as well as to set default compilation options.
[+] [-] rzimmerman|9 years ago|reply
[+] [-] harpocrates|9 years ago|reply
[1]: https://en.wikipedia.org/wiki/Induction_variable#Induction_v...
[+] [-] bartl|9 years ago|reply
Now here's a more efficient algorithm which does produce the correct result:
It would be nice if this site allowed us to run/step through the code, to see exactly what it is doing.
[+] [-] im3w1l|9 years ago|reply
https://godbolt.org/g/HCifdP
[+] [-] 0x6c6f6c|9 years ago|reply
[+] [-] Sean1708|9 years ago|reply
Also as pointed out elsewhere clang turns the "!" version into a simple equation in terms of n, so I'm actually kind of disappointed in GCC here.
[+] [-] OskarS|9 years ago|reply
[+] [-] joelthelion|9 years ago|reply
https://godbolt.org/g/eXLhxH
[+] [-] cestith|9 years ago|reply
[+] [-] xroche|9 years ago|reply
[+] [-] jawilson2|9 years ago|reply
[+] [-] JoshTriplett|9 years ago|reply
[+] [-] gratilup|9 years ago|reply
[+] [-] eridius|9 years ago|reply
[+] [-] dangerbird2|9 years ago|reply
[+] [-] AstralStorm|9 years ago|reply
A smart compiler would not use the new SSE version. (perhaps you didn't set the flags right?)
This is why: http://0x80.pl/articles/sse-popcount.html
[+] [-] samlittlewood|9 years ago|reply
http://xania.org/201609/how-compiler-explorer-runs-on-amazon
as a pragmatic example (incl. all tools & configs) of how to build an auto scaling & deploying site, without overdosing on kool-aid.
[+] [-] smitherfield|9 years ago|reply
[+] [-] jdub|9 years ago|reply
[+] [-] jerven|9 years ago|reply
[+] [-] comex|9 years ago|reply
[+] [-] chrisseaton|9 years ago|reply
Overflow is of course undefined.
[+] [-] geofft|9 years ago|reply
[+] [-] AndrewOMartin|9 years ago|reply
[+] [-] wibr|9 years ago|reply
[+] [-] pjmlp|9 years ago|reply
Also this is actually a repost from one year ago,
https://news.ycombinator.com/item?id=11671730
[+] [-] suprjami|9 years ago|reply
[+] [-] it|9 years ago|reply
[+] [-] asymmetric|9 years ago|reply
[+] [-] 0x4a42|9 years ago|reply
[+] [-] cfv|9 years ago|reply
[+] [-] zitterbewegung|9 years ago|reply
https://en.wikipedia.org/wiki/Duff's_device
[+] [-] beardog|9 years ago|reply
Cool project regardless.
[+] [-] jdub|9 years ago|reply
[+] [-] pkaye|9 years ago|reply
[+] [-] harpocrates|9 years ago|reply
[+] [-] karyon|9 years ago|reply
[+] [-] jahnu|9 years ago|reply
[+] [-] nowne|9 years ago|reply
[+] [-] source99|9 years ago|reply
[+] [-] syphilis2|9 years ago|reply
[+] [-] mattgodbolt|9 years ago|reply
[+] [-] Roboprog|9 years ago|reply
[+] [-] flukus|9 years ago|reply
[+] [-] pvg|9 years ago|reply
https://github.com/mattgodbolt/gcc-explorer
[+] [-] DvdGiessen|9 years ago|reply
http://xania.org/201609/how-compiler-explorer-runs-on-amazon
[+] [-] znpy|9 years ago|reply
[+] [-] pkaye|9 years ago|reply
[+] [-] ndesaulniers|9 years ago|reply