Today I was at the Istanbul courthouse for the third time this year. I attended a trial defending myself before a judge. Then I gave my testimony to a prosecutor about a different case. Both cases were about the free-speech platform I own in Turkey. Meanwhile, out in the world, one of my older posts on Stack Overflow became #1 on Hacker News and #1 on reddit/programming. I wish it was Turkey that made me feel better about myself, not the rest of the world.
The poor compilers do the best with what they have. And by "what they have", I mean the things they can't make assumptions about, which turns out to be a crapload. Until compilers can make those assumptions, very highly tuned assembly will continue to outperform the best of them.
Of course, a more typical scenario is hand-tuning a few loops where 99% of the clock cycles occur and letting the compiler take care of the rest.
(Also, I was attempting to send a Morse code message by flashing the vote counter between up and down, but I don't think anyone got it :( )
I used to work on a compiler for an embedded system. One fun trick we used while writing the run-time libraries was to write them in C, then optimise the generated assembly (possibly by rewriting it from scratch), then fix the compiler so it generated that assembly :). Sometimes that did require using (and occasionally adding) new intrinsics.
Most of all, the compiler can't change the layout of your data. Many of the use cases for hand-crafted assembly involve SIMD instructions, which perform several operations in parallel. These vector instructions can be incredibly fast, but their limitations often mean that you need to carefully design the data structures of the whole program around the optimization of one single critical loop.
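As a sketch of what that data-structure redesign looks like in practice, here is the classic "array of structs" versus "struct of arrays" trade-off; the `Particle` types and `move_x` function are hypothetical, made up for illustration:

```c
#include <stddef.h>

/* Array of structs: each particle's fields are interleaved in memory,
   so a loop over x alone strides past y and z on every iteration. */
struct ParticleAoS { float x, y, z; };

/* Struct of arrays: all x values are contiguous, which is exactly what
   vector loads and stores want. */
struct ParticlesSoA {
    float x[1024];
    float y[1024];
    float z[1024];
};

/* With the SoA layout this loop walks one contiguous array and is
   trivially vectorizable; the AoS version would force gathers or
   shuffles to collect the x values. */
void move_x(struct ParticlesSoA *p, float dx, size_t n) {
    for (size_t i = 0; i < n; i++)
        p->x[i] += dx;
}
```

The point is that this choice ripples outward: every piece of code touching the particles must agree on the layout, which is why one hot loop can dictate the shape of the whole program's data.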
There are some magic keywords, like const, which definitely help the compiler. Unfortunately these kinds of optimisations are far too numerous and soon become tedious. After a certain point, more structured languages could allow much more thorough optimisation. These days JIT seems to have taken over, though, which isn't really a bad thing.
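One concrete example of such a hint is C99's `restrict` qualifier, a promise the compiler is allowed to act on. This is a minimal sketch; the `scale` function is made up for illustration:

```c
#include <stddef.h>

/* Without restrict, the compiler must assume dst and src may overlap,
   so it has to reload src[i] after every store to dst[i]. With
   restrict, the programmer promises the arrays don't alias, and the
   compiler is free to keep values in registers and vectorize the loop. */
void scale(float *restrict dst, const float *restrict src,
           float k, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = k * src[i];
}
```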
Cute. Although I think my compiler has different things to say.
Hello, I'm a C compiler that still can't handle C99. I hope you're wearing waterproof clothing because I'm gonna throw up on you. Also, I've been drinking heavily so your C++ code is going to take a while to compile and when it's done, it's gonna smell funny.
I noticed you couldn't optimize my code to use SIMD so I went ahead and used inline assembly. It will probably take another 30 years before you can actually think like a human and perform optimizations like this.
It takes a really good developer with vast knowledge to do optimizations such as using inline assembly for stuff like SIMD. And even though there are good developers able to do this, the mother of all problems in developing software is managing complexity.
Yes, you can take a subroutine and apply local optimizations on it. Building complex software in assembly that on the whole is better optimized than what a compiler can do is next to impossible.
Speaking of SIMD and stuff like it, there are already optimizations that LLVM is doing, but such optimizations are hard to apply ahead of time because (1) if you want to distribute those binaries easily, then you need to compile for the common denominator (which is less of an issue with LLVM) and (2) your programming language sucks. It's not the compiler's fault, but rather your own fault that you're using a programming language so confusing that inferring intent from your code is next to impossible. How can the compiler know that you're sorting freaking numbers if you're specifying exactly how bits move around in memory while doing so?
If you're speaking about virtual machines though, there are projects out there for .NET or Scala, for instance, that can recompile/retarget code at runtime to run with SIMD instructions or on your GPUs if you have any. All you need is a virtual machine that runs bytecode and a programming language (slash compiler) that lets you access, at runtime, the syntax trees of the routines you want to optimize and that lets you generate new bytecode. So you can easily shove these kinds of optimizations into libraries for special-purpose and descriptive DSLs (e.g. LINQ).
Of course, it gets tricky and doing stuff like this at runtime has overhead, but it's better than what 99.99% of developers can do, not to mention that good developers first and foremost ship.
Hello, this is the compiler calling back at you. I am very sorry that I cannot write efficient SIMD code but neither can you. If we can work together, we will outperform our individual selves as a team.
So you go ahead and write clever SIMD code, but please use the SIMD intrinsics I can understand, not inline assembler, which I cannot do anything with. You are very good at expressing algorithms in a SIMD-friendly way and do a decent job at instruction selection.
You, however, are not very good at instruction scheduling and register allocation, so let me handle those, and we can achieve a result that keeps the CPU pipelines busy. When you make the tiniest change to the program, I can redo instruction scheduling and register allocation in an instant, whereas it would take you hours to rewrite the whole algorithm to use different registers.
My point: SIMD intrinsics plus a smart C compiler produce much better code than a programmer writing assembly. Clang with vector extensions in particular is very good at this.
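For example, here is a sketch of what those intrinsics look like in C, using SSE via `xmmintrin.h` (the `add4` helper is made up for illustration; the programmer picks the operations, the compiler still does the scheduling and register allocation):

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Add two float arrays four lanes at a time. Unaligned loads/stores
   are used so the caller need not guarantee 16-byte alignment; n is
   assumed to be a multiple of 4 for simplicity. */
void add4(float *out, const float *a, const float *b, unsigned n) {
    for (unsigned i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
}
```

Note that the intrinsics name instructions but not registers, which is exactly the division of labor the comment above describes.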
I have picked up your clever optimized SIMD and decided it was better to change the execution order, because another unit was bored with nothing to do.
Unfortunately the L2 cache seems to have some issues getting all required data for your instructions due to the way all your threads are manipulating the required addresses from multiple cores.
30 years? ICC already does automatic vectorization pretty well. LLVM and GCC have implementations that need more tuning. I bet they'll be solid within 5 years.
You can still beat them with inline assembly, but you probably won't accelerate the code by the full vector width anymore.
> It will probably take another 30 years before you can actually think like a human and perform optimizations like this.
ITYM "It will probably take another 30 years before I use a language that tells you what the required semantics of the code are, meanwhile I'll use my requirements interpretation privileges to relax the semantics in a few places"
Ignoring for a moment that compilers are getting better at automatic vectorization, using a compiler allows you to write highly optimized inline assembly for the (usually) tiny parts of the code where it really matters, and gives you the convenience of a high level language everywhere else.
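A sketch of what that looks like with GCC-style extended inline assembly, confined to one tiny operation while the compiler handles everything around it (the `bswap32` helper is illustrative):

```c
#include <stdint.h>

/* Byte-swap a 32-bit value. On x86 we pin down the single instruction
   we care about with inline assembly; everywhere else we fall back to
   the compiler's builtin, keeping the convenience of the high-level
   language for the rest of the program. */
static inline uint32_t bswap32(uint32_t x) {
#if defined(__x86_64__) || defined(__i386__)
    __asm__("bswap %0" : "+r"(x));  /* "+r": read-write register operand */
    return x;
#else
    return __builtin_bswap32(x);
#endif
}
```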
I noticed that it takes forever to understand inline assembly. I don't foresee myself ever thinking like a computer and reading assembly as well as a high-level language.
This answer perpetuates a myth that compilers are magical black boxes, the sum of millions of hours of intense academic research that "you will never understand".
Replace "compiler" with "computer". Doesn't that make you angry? Answers like these do nothing but prevent people from learning about them.
If you are interested in compilers, the University of Edinburgh's notes for the course "Compiling Techniques" (http://www.inf.ed.ac.uk/teaching/courses/ct/) are probably a good place to start. Don't let internet tough guys stop you from learning.
And on the Fourth Day, God proclaimed "Thou shalt have the ability to use inline assembly in thy C/C++ code for performance-critical tasks".
I can think of absolutely zero reason to write an entire program in x86 assembly, let alone any other kind of assembly (GCC spits out some pretty optimized code for my little Atmel MCU)... It's a lot nicer to write everything in a high-level language, and then write any performance-critical specifics in inline assembly.
The really cool thing to see is how other newer languages have adopted this scheme (e.g. PyASM for Python, or the ability to edit the instructions for interpreted languages that run in their own VM). And as always, great power comes with great responsibility ;)
Nowadays, compiler writers see zero reason to write inline assembly.
First of all, it throws a wrench into the high-level optimizer's analysis. Plus, compilers and processors are getting smarter day by day, while inline assembly in shipping code becomes more and more untouchable every day.
Fascinating, while I was reading that, it got 5 upvotes! Mindblowing.
This is a really awesome description too. The only thing I know about compilers is that I implemented one for class (without fancy optimisations) and I am surprised any software works, ever. Compilers are just ... mindbogglingly complex things. Almost as much voodoo dark magic as engineering.
One of the reasons software still works is that compilers are (mostly) very easy to test. You just need an input source file and an expected outcome (either a compiler error, or the result of running the code). No intermediate state to consider, no concurrency, no test database to set up, etc.
Oh, and many serious compilers can compile themselves, so bootstrapping is a pretty good test too.
Another reason is that compilers simply must* work for software development to continue, so people have spent the necessary amount of energy to get implementations and tests "right".
(*) There are always exceptions, like when a compiler auto-parallelizes code and introduces a race condition that is rarely triggered. But those cases are blessedly rare.
What did you write your compiler in? Writing compilers in, say, raw C is indeed a complex endeavour. But using, say, OCaml or Haskell (which is secretly a DSL for compiler writing) should make it much easier to not fail silently.
Anytime I read about the topic of assembly language, I can't help but think of Michael Abrash. For example, check out Chapter 22 [1] from his Graphics Programming Black Book entitled Zenning and the Flexible Mind for a pleasant stroll down Optimization Lane.
You might also enjoy his book entitled The Zen of Assembly Language which features the Zen Timer (a timing tool for performance measuring).
The best comment:
"Thank you compiler, but perhaps if you weren't commenting on StackOverflow, you could get me a drink and play some nice music while you're working?"
Hey, my name is ICC, and I'm one of the most respected compilers in the industry. I also sabotage your code so that it works poorly on AMD CPUs, while making sure that Intel CPUs run my code at full speed. After all, Intel likes to establish market dominance.
Blind trust in the compiler is bad, people (see http://www.agner.org/optimize/blog/read.php?i=49#49). Good luck discussing this issue without any assembly programmers who can fully understand what is going on here.
I have little idea what modern-day compilers are doing, or what the CPU, or the operating system is doing for that matter. Often, way too often, compilers fail, hardware fails, operating systems fail, lots of things fail. I am not going to read the millions of lines of code written by other programmers (in f-ing emacs no less) in any number of differing complex beasts, the compilers. It seems crazy-making to me that other programmers would create compilers that would use millions of possibilities for optimizing a single line of mine, using hundreds of different optimization techniques based on a vast amount of academic research that I won't be spending years getting at. I do feel embarrassed, yes very icky, that I have little to no idea what a three-line loop will be compiled as, but bloat would be my guess. There is risk in going to great lengths of optimization or doing the dirtiest tricks. And if I don't want the compiler to do this, I have no idea how to stop this behavior, nor do I want to invest in the specific knowledge of the nuances of any particular compiler. The compiler does not allow me easy access because the compiler itself is an overly complex piece of software written by other programmers. I couldn't care less how a compiler would make my code look in assembly, on different processor architectures and different operating systems and in different assembly conventions. Transformation comes from how we as programmers write code, not from compiler-fu.
P.S. Oh, and by the way if I really wasn't using half of the code I wrote, I would throw the source code away.
You seem to be saying that you're at the same time completely clueless about how programs get built and executed, and yet you know better than the compiler what needs to be done.
I've found that in the general case, Compilers Know Better. It might not be true for very simple and limited architectures such as small microcontrollers, but modern CPUs are so complex and "quirky" that most of the time the compiler will beat you.
These days I only use assembly for very low level stuff where I need complete control of the execution flow (dealing with cache invalidation, MMU etc...) or some very specific and aggressive optimisation (like implementing <string.h> in ASM).
But hey, if you want to ditch C and write everything in ASM, be my guest (as long as we don't work together).
> I have little to no idea what a three-line loop will be compiled as, but bloat would be my guess.
This is unfortunately too often true. Some compilers are tuned too much for looking good on artificial benchmarks, in which turning 3-line loops into thousands of instructions sometimes helps, even if it hurts on most real-world code.
> And if I don't want the compiler to do this, I have no idea how to stop this behavior, nor do I want to invest in the specific knowledge of the nuances of any particular compiler.
The -O0 option, or its equivalent, is pretty easy to find in many compilers. If you're happy with the performance of your code without all those fancy techniques being applied, feel free to use it. Most people aren't ;-).
> P.S. Oh, and by the way if I really wasn't using half of the code I wrote, I would throw the source code away.
Only if you were aware of it ;-). I wish that compilers would focus a little more on helping me make my code better, rather than so much on magically making things better under the covers.
Hello. I'm an assembly programmer. I used a compiler to generate the majority of code, and can hand-craft any assembly that comes out of it. I understand how compilers auto-generate SIMD instructions can be more easily compiler-generated if I make a "struct of arrays" instead of "an array of structs".
TLDR: Real performance programmers need to understand the assembly a compiler generates if they hope to tune the compiler into generating optimal assembly. Also, GCC -O3 is prone to removing too much code and reordering it, causing memory-barrier issues and the like. All multi-threaded programmers need to understand how the compiler generates assembly (i.e., by reordering your code) and how it can introduce new bugs if you don't use the right compiler flags.
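As a sketch of the kind of barrier-awareness this implies, here is a common GCC/Clang compiler-barrier idiom. The `publish` function and variable names are made up for illustration, and note this stops only compiler reordering, not hardware reordering, for which C11 atomics are the real tool:

```c
/* An empty asm with a "memory" clobber tells the optimizer that memory
   may have changed, so it must not cache values across this point or
   move loads/stores past it. It emits no instructions; it only
   constrains the compiler. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

int data;
int ready;

void publish(int value) {
    data = value;
    COMPILER_BARRIER();  /* keep the store to data before the flag set */
    ready = 1;
}
```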
Only questions get closed or locked, I think, and only when they aren't very good questions (ones that will just lead to debate or opinions) but should stay around out of historical interest. This is a question that can be answered reasonably but happens to have a single whimsical answer, so there's no need for closing or locking here.
And even though the tone of that answer is humorous, it still is a good answer, explaining why we don't all write assembly instead of HLLs.
I never noticed that the Stack Overflow JS pulls updates for vote tallies in real time. Browsing this answer while HN is sending lots of traffic there is almost like watching a car odometer.
https://www.facebook.com/sedatk/posts/10151240841812644
Sedat Kapanoglu · 2,372 followers
3 hours ago near Maslak, Istanbul

I can think of 2 reasons: study and fun.

Looks like IE is holding it back as usual:
https://developer.mozilla.org/en-US/docs/WebSockets#Browser_...

His work on SpinRite is legendary, for those born before IDE hard drives were invented.

[1] http://downloads.gamedev.net/pdf/gpbb/gpbb22.pdf

You can easily go creative on this