top | item 40887334

smnc | 1 year ago

> As an optimization, it is helpful to use a local variable for the reference to the first pointer. Doing so improves the performance substantially: C# is not happy when we repeatedly modify a reference. Thus, at the start of the function, you may set byte* mystart = start, use mystart throughout, and then, just before a return, you set start = mystart.

Did not expect this.

colejohnson66|1 year ago

As any method can throw (AccessViolationException in this case), every write through a ref parameter must be propagated back to the caller immediately. By copying to a local variable, you avoid the requirement that changes are written back to the caller each time. As the sibling comment suggests, this allows the JIT to keep the value in a register instead of on the stack. Theoretically, the JIT could scan the lifetime of the reference and see if it escapes a try/catch block, then elide the premature writes, but that’s a lot of plumbing work.

In addition, a reference to a pointer is doubly indirect, so you could possibly be increasing latency on reads, depending on how fast the CPU forwards memory accesses.
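To make the two shapes concrete, here is a minimal sketch. The names (RefPointerDemo, SkipSpacesViaRef, SkipSpacesViaLocal, CountSkipped) and the skip-leading-spaces task are made up for illustration and are not the article's code. It needs AllowUnsafeBlocks enabled. Whether the JIT actually emits a store per iteration for the first version depends on the runtime version, so check the disassembly before relying on it.

```csharp
using System;
using System.Text;

public static unsafe class RefPointerDemo
{
    // Hypothetical helper: advances `start` past leading spaces,
    // writing through the ref parameter on every step.
    public static void SkipSpacesViaRef(ref byte* start, byte* end)
    {
        while (start != end && *start == (byte)' ')
        {
            start++; // a store through the reference on each iteration
        }
    }

    // Same logic, but the cursor lives in a local and is written back once.
    public static void SkipSpacesViaLocal(ref byte* start, byte* end)
    {
        byte* mystart = start;   // copy in
        while (mystart != end && *mystart == (byte)' ')
        {
            mystart++;           // local only, a candidate for a register
        }
        start = mystart;         // copy out just before returning
    }

    // Small driver so the sketch can be exercised without raw pointers at the call site.
    public static int CountSkipped(string text, bool useLocal)
    {
        byte[] data = Encoding.ASCII.GetBytes(text);
        fixed (byte* p = data)
        {
            byte* cursor = p;
            byte* end = p + data.Length;
            if (useLocal) SkipSpacesViaLocal(ref cursor, end);
            else SkipSpacesViaRef(ref cursor, end);
            return (int)(cursor - p);
        }
    }
}
```

Both versions return the same result; the only difference is when the caller's pointer gets updated.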

Lastly, this isn’t idiomatic C#. Sure, vectors are used, but proper C# would be using spans to ensure no buffer overruns, not this “start and end pointer” thing you see in C. I’m also curious how this compares to SearchValues from .NET 8.
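For comparison, a rough sketch of what the span-based shape could look like for a toy task (skipping leading spaces). SpanDemo and SkipSpaces are hypothetical names, and this is scalar code, not a vectorized equivalent of the article's routine:

```csharp
using System;

public static class SpanDemo
{
    // Returns how many leading space bytes `data` starts with.
    // Indexing a span is bounds-checked, so an off-by-one throws
    // instead of reading past the buffer.
    public static int SkipSpaces(ReadOnlySpan<byte> data)
    {
        int i = 0;
        while (i < data.Length && data[i] == (byte)' ')
        {
            i++;
        }
        return i;
    }
}
```

The caller would then continue with data.Slice(i) instead of carrying a start/end pointer pair, and no pinning or unsafe context is needed.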

neonsunset|1 year ago

There is nothing particularly unidiomatic here - there are multiple ways to approach vectorization in .NET. Such low-level C# flavour is by definition something most developers would never write (even though it has gotten way more convenient and easier to do).

CoreLib heavily leans on "byref arithmetic" over pointer arithmetic, because this allows you to avoid object pinning and may have marginally better codegen at callsites, but it is by no means a requirement.
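A small sketch of what that style looks like, using MemoryMarshal.GetReference and Unsafe.Add. ByrefDemo and SkipSpaces are hypothetical names for illustration, not CoreLib code:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class ByrefDemo
{
    // Returns how many leading space bytes `data` starts with,
    // walking a managed reference instead of a pinned pointer.
    public static int SkipSpaces(ReadOnlySpan<byte> data)
    {
        // Managed reference to the first element; the GC keeps tracking it,
        // so no `fixed` statement and no `unsafe` context are required.
        ref byte first = ref MemoryMarshal.GetReference(data);

        int i = 0;
        // Unsafe.Add performs no bounds check, so the length test
        // in the loop condition is the only thing guarding the read.
        while (i < data.Length && Unsafe.Add(ref first, i) == (byte)' ')
        {
            i++;
        }
        return i;
    }
}
```

The trade-off is that the bounds checking is now your responsibility, same as with raw pointers.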

The code is fine, and it's not dissimilar to how SearchValues' internal text search algorithms are implemented. It could be improved by factoring out platform-specific shuffle calls into a helper method and having the core use only portable method calls, but for the sake of demonstrating the point in the article it's absolutely fine.

utensil4778|1 year ago

I would suspect that the JIT treats a reference as a call to somewhere else in memory, which can have considerable overhead in extremely tight loops. By copying a pointer into a local variable, it may hint to the JIT that you want the pointer in a CPU register with faster access.

Or it might be something much more subtle. I know C++ can behave this way, but I don't have in-depth experience with the C# JIT or x86 assembly.