Should array length be stored into a local variable in C#?

94 points| atomlib | 6 years ago |habr.com | reply

68 comments

[+] OskarS|6 years ago|reply
This is good to know, but fairly unsurprising. The interesting case is List<T>. The .Count property is actually a function call, and the value could change during the loop. If you don’t mutate the list, is it smart enough to both inline the function call and hoist the value out as an invariant?
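For reference, the two loop shapes this question compares might look like the sketch below. The class and method names are hypothetical, and the sketch makes no claim about what the JIT does with either form; answering that would need a benchmark or a look at the generated assembly.

```csharp
using System.Collections.Generic;

// Hypothetical names, for illustration only.
public static class ListSum
{
    // Reads list.Count on every iteration. Count is a property getter on
    // List<T>, so this is the form whose inlining/hoisting is in question.
    public static int SumWithCountProperty(List<int> list)
    {
        int sum = 0;
        for (int i = 0; i < list.Count; i++)
        {
            sum += list[i];
        }
        return sum;
    }

    // Reads list.Count once. Only equivalent to the version above
    // if the list is not resized inside the loop.
    public static int SumWithCachedCount(List<int> list)
    {
        int sum = 0;
        int count = list.Count;
        for (int i = 0; i < count; i++)
        {
            sum += list[i];
        }
        return sum;
    }
}
```

Both methods return the same sum as long as the loop body leaves the list alone; the only difference is how often Count is read.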
[+] legulere|6 years ago|reply
It doesn't even need to be a function call: accessing a class field repeatedly in a tight loop is slower than using a local variable.
[+] zamalek|6 years ago|reply
There's no mathematical way for it to know that you won't modify (especially remove). The developer would have to express that by, say, iterating over the internal array directly.
[+] 60654|6 years ago|reply
Nice benchmarking on Microsoft .Net CLR. Looks like the JIT compiler is smart enough to recognize that array.Length is an invariant and hoist it out of the loop, which is awesome for the common use cases!

One nitpick about the title: C# runs on more runtimes than Microsoft .Net CLR, and those may behave very differently. For example: Mono CLR, or Unity's IL2CPP which is an ahead-of-time compiler.

Specifically, I'd expect IL2CPP would not hoist length out, because it would not recognize it as an invariant. (Some great examples of IL2CPP cross compilation are here: https://jacksondunstan.com/articles/4749 )

TLDR: the Microsoft JIT compiler makes the local variable unnecessary, but this is a property of the JIT, not of C#. Developers on non-MS platforms shouldn't assume this.

[+] chrisseaton|6 years ago|reply
> those may behave very differently

They shouldn’t behave any differently should they? There’s a single language spec.

[+] ducttape12|6 years ago|reply
Always accessing array.Length is defensive coding. In the event your array is mutated, always accessing array.Length ensures you won't run into an index-out-of-bounds exception.

Even better is to just avoid accessing the array's length. I almost always use foreach or Linq.

[+] cr0sh|6 years ago|reply
I'm not a C# developer, but this kind of thing seems to permeate almost all languages in one form or another.

Maybe it is just a style and readability thing; or maybe (as suggested elsewhere as well) it is meant to be reused elsewhere in the system, so it is cached in a variable for later use.

Or, it's possible that at one time - maybe in the early days of .NET - doing it this way was more optimized, and the habit stuck with developers (perhaps they all read the same article in the knowledge base about it?). If that's the case, it's a bit of "premature optimization", but one that apparently doesn't harm anything.

What I do wonder is if certain other changes could change the speed?

At least it might be interesting to see in these trivial cases; I admit that in more complex loops it might not be advisable.

But - for instance, what if rather than iterating thru the array from the 0th element to the length of the array, you instead started from the last element and iterated backwards, until you hit zero? That way, you wouldn't be checking the length of the array, but rather for zero?

The code for such a test might look like:

    public int WithoutVariable() {
        int sum = 0;
        for (int i = array.Length - 1; i > -1; i--) {
            sum += array[i];
        }
        return sum;
    }
I'm not sure that a "with variable" version would make much difference (or sense), but here it is for completeness sake:

    public int WithVariable() {
        int sum = 0;
        int length = array.Length - 1;
        for (int i = length; i > -1; i--) {
            sum += array[i];
        }
        return sum;
    }
Again - I'm not a C# developer - maybe my code is wrong above, but hopefully it gets the idea across.

Would this work better? Would it be faster? What would the JIT compiler create? Maybe it wouldn't be any faster or better than the ForEach examples?

I honestly don't know - but if anybody wants to give it a shot, I'd be curious as to the results...

EDIT: I noticed that I said "checking for zero" - but I modified my code to check for -1 as the boundary; I suppose the check in the loops could be modified to be "i == 0;" instead. I'm not sure which of "i >= 0;" vs "i == 0;" vs "i > -1;" is faster - another thing to check, I suppose...

[+] Someone|6 years ago|reply
"what if rather than iterating thru the array from the 0th element to the length of the array, you instead started from the last element and iterated backwards, until you hit zero?"

That used to be common in assembly, as it leads to smaller and faster code on many systems.

See https://stackoverflow.com/questions/2823043/is-it-faster-to-..., which also shows how times have changed, with many answers calling this premature optimization.

[+] duncanawoods|6 years ago|reply
I wouldn't naturally think about summing lists in reverse so it becomes more cognitive effort to understand the code. I've seen plenty of termination condition bugs on reverse iterations that might back that up.

A related thought is how modern code clean-up tools are doing things like reducing if-nesting e.g. turning this:

    if (open) 
    { 
        stuff();
        close();
    }
into this, with early returns:

    if (!open) return;

    stuff();

    close();
My feeling is that the first more naturally represents the idea I have of the behaviour and the second, like your reverse iteration, is an encoded version of that idea. I feel I have to make an extra cognitive step to decode and reassemble it to create an idea of the behaviour.

I suspect the further you stray from the natural idea, the harder the code is to read and validate by eye, and the more likely errors are to creep in. I don't know how subjective this is. I generally don't have a strong feeling about early returns; it's just that I have been noticing the slightly greater cognitive effort they cause me compared to the logical chunking that nested ifs provide.

[+] davemp|6 years ago|reply
`i > -1` isn't quite right. I'm not sure about how C# does implicit casting, but arrays should be indexed by `usize` and comparing signed w/ unsigned is a bug waiting to happen or UB in other languages.
[+] mtVessel|6 years ago|reply
The difference is that the article is investigating using a property in the boundary check. The boundary check is evaluated with every iteration. Your example is only using it in the initialization expression which, by definition, is evaluated only once. This holds true for virtually any language with a for construct.
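The difference between the initializer and the condition can be made visible with a small sketch. CountingArray is a hypothetical wrapper (not a real .NET type) whose Length getter counts how often it is read.

```csharp
// Hypothetical wrapper, for illustration only: counts reads of Length.
public sealed class CountingArray
{
    private readonly int[] items;

    public CountingArray(int[] items) { this.items = items; }

    // Number of times Length has been read so far.
    public int LengthReads { get; private set; }

    public int Length
    {
        get { LengthReads++; return items.Length; }
    }

    public int this[int index] => items[index];
}

public static class LoopShapes
{
    // Length is in the condition: it is read before every iteration,
    // plus once more for the final check that ends the loop.
    public static int ForwardSum(CountingArray a)
    {
        int sum = 0;
        for (int i = 0; i < a.Length; i++) sum += a[i];
        return sum;
    }

    // Length is only in the initializer: it is read exactly once.
    public static int ReverseSum(CountingArray a)
    {
        int sum = 0;
        for (int i = a.Length - 1; i >= 0; i--) sum += a[i];
        return sum;
    }
}
```

For a three-element array, ForwardSum reads Length four times and ReverseSum reads it once. This only shows the language-level evaluation rule; it says nothing about what a JIT does with a plain array's Length.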
[+] laurent123456|6 years ago|reply
Saving the array length to a variable is one of those things that inexperienced programmers love to do, thinking it will optimise something.
[+] ajnin|6 years ago|reply
Avoiding doing unnecessary work is something all developers should strive for, experienced or not. Maybe in that case it won't make any difference. But, say, a method call in a loop should often be done outside the loop. Doing it here but not there leads to inconsistent style. Also, depending on the language and compiler, it might actually make a difference sometimes. I'm not going to write code that assumes a certain interpreter/compiler behavior.
[+] vips7L|6 years ago|reply
It honestly depends on the compiler.
[+] patsplat|6 years ago|reply
Use Linq and forget about array length.

It's an interesting analysis and all, but why bother when the language has such an elegant collections API.

[+] gameswithgo|6 years ago|reply
Linq is orders of magnitude slower and allocates. It is not always an appropriate choice.
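A sketch of the two styles being weighed here, with hypothetical class and method names. It shows only that they compute the same result for ordinary input; the relative cost depends on the runtime version and would have to be measured.

```csharp
using System.Linq;

// Hypothetical names, for illustration only.
public static class SumStyles
{
    // LINQ style: Enumerable.Sum from System.Linq.
    public static int SumWithLinq(int[] values)
    {
        return values.Sum();
    }

    // Hand-written loop over the same array.
    public static int SumWithLoop(int[] values)
    {
        int sum = 0;
        for (int i = 0; i < values.Length; i++)
        {
            sum += values[i];
        }
        return sum;
    }
}
```

One behavioural difference worth knowing: Enumerable.Sum on int throws OverflowException when the total exceeds Int32.MaxValue, while the plain loop wraps around silently unless it is compiled in a checked context.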
[+] coinerone|6 years ago|reply
At the university, every time I put the array length into a local variable for a loop, I got a 2-point deduction on my homework.
[+] jay_kyburz|6 years ago|reply
Ahh, nice to see some C# without a new line before the braces.
[+] thrower123|6 years ago|reply
That is probably my least favorite thing about C# coding. The preeminent style wastes so much vertical space. K&R braces for me.
[+] cutler|6 years ago|reply
Now if they would only drop Pascal case so that I can distinguish a method from a class I might give it a shot.
[+] germanlee|6 years ago|reply
If I remember correctly, the runtime keeps the size of the array in the header of the object along with the sync block, etc. If you have VS, you can view the object in memory to see the sync block value, array size value, etc.
[+] suff|6 years ago|reply
It is very possible the reason is not speed, but readability. If you simply named it 'length', then sure, there is no point. If it is given a better, more descriptive name, and then gets used in an equation elsewhere in the code, then it may be very useful because it is easier to read.
[+] JustSomeNobody|6 years ago|reply
something like:

    var widgetLength = widget.Length;

?
[+] l-|6 years ago|reply
TL;DR: Yes & use foreach rather than for to skip array bound checks.
[+] gameswithgo|6 years ago|reply
It will skip the array bounds check as long as myArray.Length is the terminal condition in the for loop, rather than a local variable with the length stored.
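The two loop shapes being contrasted might look like this sketch (hypothetical class and method names). Whether the bounds check is actually elided for either shape depends on the JIT and its version, so treat the comments as a restatement of the claim above rather than a guarantee.

```csharp
// Hypothetical names, for illustration only.
public static class BoundsCheckShapes
{
    // Shape 1: myArray.Length itself is the loop's terminal condition.
    // This is the form the comment above says lets the bounds check go away.
    public static int SumLengthInCondition(int[] myArray)
    {
        int sum = 0;
        for (int i = 0; i < myArray.Length; i++)
        {
            sum += myArray[i];
        }
        return sum;
    }

    // Shape 2: the length is copied into a local first, so the loop bound
    // is no longer written in terms of the array being indexed.
    public static int SumLengthInLocal(int[] myArray)
    {
        int length = myArray.Length;
        int sum = 0;
        for (int i = 0; i < length; i++)
        {
            sum += myArray[i];
        }
        return sum;
    }
}
```

Both methods produce the same result; any difference would show up only in the generated machine code.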
[+] yc12340|6 years ago|reply
It is odd that C# needs a foreach loop for bounds check elimination.

Java supports this optimization for a wide range of loop types... since Java 7, I think. Normally I would argue that Java is just ahead of the curve, but Android also gained support for bounds check elimination in 2014-2015.

Either the article does not tell us the whole truth or the Microsoft JIT is subpar by modern standards.

[+] vardump|6 years ago|reply
Yeah. A slight oops from CLR optimizer, but nothing serious. It seemed to miss a hoisting optimization opportunity in the foreach case.