
John Carmack on Inlined Code (2014)

395 points | rinesh | 9 years ago | number-none.com

199 comments

[+] dzdt|9 years ago|reply
I have had the pleasure to work with lots of other people's code of varying styles and quality. Nothing is harder to read and understand than code which is deeply nested calls from one little helper (or wrapper) function to another. Nothing is easier to read and understand than code which just flows straight through from top to bottom of a big function.

There are other tradeoffs around code reuse, speed, worst-case speed, and the probability of introducing bugs. If you haven't read the article, do; it's worth it.

I love that Carmack tries to measure which styles introduce more bugs! Who else does that? Seriously, I would love to see more of that.

[+] lj3|9 years ago|reply
Interestingly, Casey Muratori accidentally demonstrates during one of his Handmade Hero sessions that the compiler won't always be able to optimize certain bits of code that are put in a function as opposed to being inline.

In the video, he inlines a very simple function and his game gets twice as fast for no apparent reason. It's instructive to watch him dive into the generated assembly to figure out why.

https://www.youtube.com/watch?v=B2BFbs0DJzw

[+] imtringued|9 years ago|reply
The compiler probably turned

  for(int I = 0; I < 4; ++I) {
        real32 PixelPx = (real32)(XI + I);
        real32 PixelPy = (real32)Y;
        real32 dx = PixelPx - Origin.x;
        real32 dy = PixelPy - Origin.y;
  
        real32 U = dx*nXAxis.x + dy*nXAxis.y;
        real32 V = dx*nYAxis.x + dy*nYAxis.y;
  
        //rest of the loop
  }
into something like

  real32 PixelPy = (real32)Y;
  real32 dy = PixelPy - Origin.y;
  
  real32 PixelPx = (real32)(XI);
  real32 dx = PixelPx - Origin.x;
  
  real32 U = dx*nXAxis.x + dy*nXAxis.y;
  real32 V = dx*nYAxis.x + dy*nYAxis.y;
  
  for(int I = 0; I < 4; ++I) {
      //rest of the loop
      
      U += nXAxis.x;
      V += nYAxis.x;
  }
PixelPy and dy are not affected by the loop counter, which means they can safely be moved outside the loop.

This also results in the subexpressions dy*nXAxis.y and dy*nYAxis.y being lifted outside the loop.

Now we've moved half of the code outside the loop but we aren't done yet.

The same can be done with PixelPx and dx; the trick is then to replace dx*nXAxis.x with

  (dx + I)*nXAxis.x
Expanding

  (dx + I)*nXAxis.x
yields

  dx*nXAxis.x + I*nXAxis.x
We can now lift the subexpression

  dx*nXAxis.x
out of the loop.

The only thing that is now done in the loop is

  I*nXAxis.x
which can be further simplified to

  U += nXAxis.x
The same happens with nYAxis.x.

EDIT: Sorry for the bad formatting. The markdown parser ate my asterisks so I put things into code blocks which requires a new line each time.

[+] sitkack|9 years ago|reply
TIL, another reason to perform 'defactor inline-method' (with directed feedback of course).
[+] jackmott|9 years ago|reply
Another example to set aside for the next "So you think you are smarter than the compiler" person.
[+] bshimmin|9 years ago|reply
Imagine how fantastic it must be working with someone like Carmack. Sure, the first few code reviews would be fairly traumatic - as you quickly realise just how much faster and generally better he is than you - but I think after a little while you could just relax and try to absorb as much as possible.

I love how everything in these emails is delivered as a calm series of reflections, chronicling with great honesty his own changing opinions over time - nothing is a diktat.

I also found it rather heartening that he makes the same copy/paste mistakes that the rest of us do - how many times have you duplicated a line and put "x" or "width" on both lines..? Seemingly Carmack can actually tell you how many times he's done that!

[+] smegel|9 years ago|reply
> the first few code reviews would be fairly traumatic

Hopefully because he is saying "did you really think vain attempts at premature optimization were going to impress me?".

[+] bluecmd|9 years ago|reply
Honestly, this is why working for someone like Google is so great. All the smart people you can learn from.
[+] bluetomcat|9 years ago|reply
Another perspective in defense of long functions is that they enable you to spot common expressions/statements within the body, for example:

    void long_func(void) {
        ...
        if (player.alive && player.health == 100) {
            ....
        }
        ...
        if (some_other_condition && player.alive && player.health == 100) {
        }
    }
Conventional wisdom says that you should write a function `is_player_untouched` and substitute the composite expressions with function calls, but the code in question can be refactored in a much more straightforward way:

    void long_func(void) {
        ...
        const bool player_untouched = is_player_untouched();

        if (player_untouched) {
            ....
        }
        ...
        if (some_other_condition && player_untouched) {
        }
    }
Had the function body been split into more functions for "clarity", you would be making duplicate calls to `is_player_untouched()` that go unnoticed, because they would be buried deep in the call graph.
[+] rubber_duck|9 years ago|reply
This leads to brittle code. It's easy to determine whether the is_player_untouched state changes between those two conditions when you're writing the code, but when someone else later edits it, it's just as easy to introduce something that breaks the assumption; the monolithic/big function makes it hard to keep track of assumptions like this, and big functions attract a lot of code changes as well. Even worse, these kinds of bugs usually end up being hard to catch if you're not covered by unit tests (and games rarely are), and big functions go against testing practices.
[+] Jaruzel|9 years ago|reply
I have nothing to add to this, other than... "OK this guy has clearly worked on the Quake source code..." :)
[+] icebraining|9 years ago|reply
Hopefully, that should be something the compiler could do on its own; assuming the language has appropriate semantics, it could identify the common expression, prove that its value can't have changed in between, and then do the substitution on its own.

Is there any language where this is implemented, or is the effort too great for the gains?

[+] logfromblammo|9 years ago|reply
If you occasionally inline all the functions and unroll all the loops, you can find optimizations that even the compiler won't be able to make.

For example, in quaternion-based rotation math, there exists a "sandwich product" where you take the (non-commutative) product of the transform and the input, followed by the product of that result and the conjugate of the transform.

It turns out that several of the embedded multiplication terms cancel out in that double operation, and if you avoid calculating the canceled terms in the first place, you can do a "sandwich product" in about 60% of the total floating-point operations of two consecutive product operations.

In our application, which used spatial transforms and rotations, the optimized quaternion functions were faster than the 4x4 matrix implementation, whereas the non-optimized quaternion functions were slightly slower. That change alone (adding an optimized sandwich-product function) cut maybe 30 minutes off of our longest bulk data processing times.

You would never be able to figure that out from this.

  out = ( rotation * in ) * ( ~rotation );
You have to inline all the operations to find the terms that cancel (or collapse into a scalar multiplication).
[+] apeace|9 years ago|reply
I think there's a big point that's being missed here. Carmack is conflating inlining code with writing functional code. These are different things.

I'd agree that if the majority of your code is mutating state, it makes sense to mash all that together in one place. You want to keep an eye on the dirty stuff.

But on the other hand, inlining pure functions that don't use or mutate any global state doesn't make sense to me. Why is making it "not possible to call the function from other places" a benefit?

How about calling that code from a unit test!

[+] phkahler|9 years ago|reply
>> Why is making it "not possible to call the function from other places" a benefit?

When it's a pure function that's not a problem. When it changes state then you lose track of ordering and such. That's his point, state changes need to be kept in the one big function so you can keep track of them easily.

And of course, almost all interesting software has mutable state. Otherwise you're just doing a computation and looking for a single output.

[+] MyNameIsFred|9 years ago|reply
Carmack addressed this, at least as I read it. One of his concluding guidelines was that purely functional methods are a virtue and that code should be made purely functional wherever possible (he phrases it in terms of "const"). He did also say, however, that in his domain non-trivial purely functional methods rarely apply.
[+] jon-wood|9 years ago|reply
The thing that struck me was Carmack's relentless pursuit of perfection. I can't think of many people who'd describe a single frame of input latency as a cold sweat moment!
[+] dfan|9 years ago|reply
When you are making videogames, a (video) frame of latency is a big deal.

When I worked on Guitar Hero and Rock Band, we worried about sub-frame latency (timing is more important when you're hitting a drum than when you're firing a gun).

[+] bluetomcat|9 years ago|reply
I particularly liked the analogy with hardware where you do an operation unconditionally and eventually inhibit the result. Thinking about this, I would compare software written in that fashion with a well-oiled, smooth engine.
[+] theandrewbailey|9 years ago|reply
John Carmack's version of hell must be extremely laggy for him, but no one else notices.
[+] ctrlrsf|9 years ago|reply
Long functions might read easier but you lose some testing precision. I think the recent focus on testing has led to shorter functions with as little responsibility as possible. When short functions fail a test you have a smaller surface area to look for the cause.
[+] leojfc|9 years ago|reply
Doesn’t this imply the need for a new language feature? So that well-defined sections of inline code can be pulled out, initial conditions set in a testing environment, and then executed independently.

I guess this could trip up if the compiler optimisations available when considering all the code at once mean that the out-of-context code actually does something different in testing...

[+] virtualized|9 years ago|reply
Game developers don't test, they debug. That's why he draws wrong conclusions like "inline everything manually".
[+] typedef_struct|9 years ago|reply
This is a pet peeve of mine. If you made a block of code into a separate function, I'm assuming it's called from multiple places. Or maybe it used to be. Or will be soon. But still.
[+] sickbeard|9 years ago|reply
It's hard to unit test large functions.
[+] nickm12|9 years ago|reply
I'm surprised at the enthusiasm for really long functions here. I find my experience is just the opposite—I find it much more difficult to read code written that way than when the different sections are broken up into smaller functions.

It is of course essential that the smaller functions be well-named and manage side-effects carefully. That is, they should either be pure functions, or the side effects should be "what the function does", so that readers of the main function don't generally need to read the function's code to understand its side effects.

[+] mark-r|9 years ago|reply
Yes, a long monolithic block is hard to read. But he's suggesting using comments and braces to visually separate it into blocks, so the end result is a happy halfway point between monolithic and broken-down functions.

The intro suggests that he agrees that pure functions are an even better solution.

[+] qwertyuiop924|9 years ago|reply
Well, yes, but in games, you have a lot of global state. You wind up manipulating so much state that you may indeed want this kind of large function so that you can see everything.
[+] vfaronov|9 years ago|reply
A side benefit to having separate, clearly named functions: they often play better with tools.

You can jump to them by name from a completely different part of the code (with some editor support). You see them in the headers of your Git hunks so you don't lose context.

In languages that are heavy on type inference, like Haskell or Python+MyPy, they make for a convenient boundary at which you can assert your types, to help with type errors.

[+] Practicality|9 years ago|reply
I wonder how much of this change is because he can no longer keep track of so much state in his head (or just doesn't want to).

I only say this because I've gone through a similar transition of valuing my mental computation time in the last 20 years of coding :).

The efficiency of inlining is compelling when you code the whole thing at once, in one session. Once you decide to break the work up over multiple sessions, it's too much to keep in your head over multiple days (or weeks).

[+] qwertyuiop924|9 years ago|reply
Although I am a fan of the LISP school of program design (minimal global state, build small functions and macros, make sure they work, and then build more functions and macros on top of that, until you have an abstraction that you can build your app on), Carmack raises some interesting points: If you're handling a lot of global, mutable state, you may want to abstract minimally, so that you can see where that state is mutating, which makes bugs easier to spot.

Not a bad idea.

[+] mgregory22|9 years ago|reply
Maybe the key idea is to put all the global state mutations as closely together as possible, so they can be compared and contrasted as easily as possible.
[+] p0nce|9 years ago|reply
About style A vs B vs C:

Robert C. Martin encourages style B because it reads top-down and replaces comments with names.

[+] anotherhacker|9 years ago|reply
Don't we write code for other programmers first--then for the system?

It seems counter-intuitive but in the long run this mentality best serves the business.

[+] softawre|9 years ago|reply
Depends if you're working on some LOB application with average programmers or if you're writing a game that pushes the limits of modern technology.
[+] sophacles|9 years ago|reply
The argument in the article is actually a "for programmers" argument. Mainly that sometimes it's easier to reason about inline code than deeply chained functions, because you don't necessarily know all the implications of calling that function. Or you don't know the potential starting states when writing the function.
[+] schlipity|9 years ago|reply
>I now strongly encourage explicit loops for everything, and hope the compiler unrolls it properly.

I get why this is a thing. Sometimes an unrolled loop is faster. But if this is really an issue, why isn't there a [UnRoll] modifier or a preprocessor or something that handles that for you?

Something like this:

  for (int i = 0; i < n; i++) {
    dothing(x[i]);
  }
versus:

  unroll for (int i = 0; i < n; i++) {
    dothing(x[i]);
  }
Only the compiler / preprocessor would unroll the second one. You have the best of both worlds with a reduced chance of subtle errors.
[+] hellofunk|9 years ago|reply
One of his definitions of a pure function is one that only has by-value parameters and doesn't change state. Am I correct in thinking that, in C++, the advent of C++11 lambdas lets you be explicit about this? Writing lambdas (named, if necessary, like a normal function) with empty capture lists ("[]") prevents the compiler from letting you accidentally use variables from outside the scope of the function's parameters, which would force you to work in a more pure style. In C++, what other methods might help you enforce purity?
[+] pbsd|9 years ago|reply
You can still freely access global variables inside lambdas, with or without captures.
[+] TickleSteve|9 years ago|reply
Correct me if I'm wrong (I only skimmed it), but this is less about not liking inlining than about having deterministic/time-bounded performance.

These are two separate/orthogonal issues, I doubt he would turn his nose up at the processor doing less work iff it was also deterministic and had predictable worst-case timing.

[+] bluetomcat|9 years ago|reply
I would sum it up as "reducing variability in code paths", i.e. not causing degraded performance when certain conditions change.
[+] Animats|9 years ago|reply
Both games and low-level real time systems have one big loop executed at a fixed rate. That leads to the architecture Carmack describes.

It's not particularly helpful to a server that's fielding vast numbers of requests of various types.

[+] saynsedit|9 years ago|reply
I think Carmack is conflating FP with good abstractions.

Haskell abstractions are often good because they flow from category theory and there are usually well established mathematical laws associated with them. I'm thinking of the "monad laws" and the "monoid laws."

Mathematicians tend to create abstractions if the abstraction satisfies coherent and provable properties. Programmers tend to be less rigorous about what and how they abstract.

There is nothing about C++ that prevents making good abstractions. It's just the culture of the language. Industry programmers are taught to not duplicate code and to keep functions short but they are not taught the fundamentals of what makes a good abstraction.