olliej | 1 year ago
v0:
1. if (input not a number)
fall back to C++; else
2. return tagged 0; // Just making sure the numeric check was optimal
v1:
1. As above
2. If integer
convert to float
3. return tagged 0
v2:
1-2. as above
3. If negative
return tagged nan
4. Return tagged 0
v3:
1-3. as above
4. use the sqrt instruction
5. return tagged 0
v4:
1-4. as above
5. move <4> back to an integer register
6. return tagged 0
v5:
1-5. as above
6. tag the result of sqrt
7. return tagged 0
v6:
1-6. as above
7. Actually return/store the result of <6>
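To make the steps concrete, here's a rough JavaScript model of what the v6 fast path computes. The 64-bit value encoding (`INT_TAG`, `DOUBLE_OFFSET`, the canonical NaN) and the `SLOW_PATH` marker are hypothetical, loosely in the style of NaN-boxing and not the actual encoding the engine used. The real thing was emitted machine code; here the "move back to an integer register" step only shows up as reading the double's raw bits.

```javascript
// Hypothetical 64-bit value encoding, loosely in the style of NaN-boxing.
// Illustrative assumption only, not any specific engine's layout:
//   int32:   INT_TAG | (value as uint32)      -> top 16 bits 0xFFFF
//   double:  IEEE-754 bits + DOUBLE_OFFSET     -> top 16 bits 0x0001..0xFFF1
//   pointer: anything else (top 16 bits zero)
const INT_TAG = 0xffff000000000000n;
const DOUBLE_OFFSET = 1n << 48n;
const CANONICAL_NAN_BITS = 0x7ff8000000000000n;

// Hypothetical marker meaning "fall back to the C++ implementation".
const SLOW_PATH = Symbol("slowPath");

const scratch = new DataView(new ArrayBuffer(8));

function doubleToBits(d) {
  scratch.setFloat64(0, d);
  return scratch.getBigUint64(0);
}

function bitsToDouble(bits) {
  scratch.setBigUint64(0, bits);
  return scratch.getFloat64(0);
}

function boxInt(i) {
  return INT_TAG | BigInt(i >>> 0);
}

function boxDouble(d) {
  // Purify NaNs so no encoded double can land in the int tag space.
  const bits = Number.isNaN(d) ? CANONICAL_NAN_BITS : doubleToBits(d);
  return bits + DOUBLE_OFFSET;
}

const isNumber = (v) => (v & INT_TAG) !== 0n;
const isInt = (v) => (v & INT_TAG) === INT_TAG;
const unboxInt = (v) => Number(BigInt.asIntN(32, v)); // low 32 bits, signed
const unboxDouble = (v) => bitsToDouble(v - DOUBLE_OFFSET);

// Model of the v6 fast path; comments map to the numbered steps.
function sqrtFastPath(v) {
  // 1. Not a number: fall back to C++.
  if (!isNumber(v)) return SLOW_PATH;
  // 2. Integers are converted to double.
  const x = isInt(v) ? unboxInt(v) : unboxDouble(v);
  // 3. Negative input: return a tagged NaN.
  if (x < 0) return boxDouble(NaN);
  // 4. The sqrt itself (a single instruction in the JIT's output).
  const root = Math.sqrt(x);
  // 5-6. Move the result's bits to an integer register and tag them.
  const tagged = boxDouble(root);
  // 7. Actually return the result.
  return tagged;
}

console.log(unboxDouble(sqrtFastPath(boxInt(16)))); // 4
```

Each vN above is just this function with everything after some step replaced by "return tagged 0", which is why the early versions looked so cheap.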
Alas, I can't recall whether at this point return values went into the heap-allocated VM call stack or were returned via rax, but that's not the part that was eye-opening to me.

I had a benchmark that was something like:
  for (var i = 0; i < largeNumber; i++)
      Math.sqrt(i);
Note that while I was working on this there was no meaningful control-flow analysis, inlining, etc., so this was an "effective" benchmark for perf work at the time; it would not be today.

The performance remained "really good" (read: fake) until the `v6` version that actually stored/returned the result of the work. It was incredibly eye-opening to see just how much code could be "executed" before the CPU actually ended up doing any work, and it significantly changed how I approached codegen from then on.
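For comparison, a loop shaped like the benchmark above is exactly what a modern optimizing compiler is free to throw away, since `Math.sqrt(i)` has no observable effect. A common general-purpose mitigation (not something described above) is to feed every result into a value that is still used after the loop. A minimal sketch; `benchSqrt` and `sink` are hypothetical names:

```javascript
// Hypothetical benchmark helper: every Math.sqrt result is consumed, so the
// loop body cannot be treated as dead code.
function benchSqrt(n) {
  let sink = 0; // accumulates results; returned so it stays observable
  const start = performance.now();
  for (let i = 0; i < n; i++) {
    sink += Math.sqrt(i);
  }
  const elapsedMs = performance.now() - start;
  return { sink, elapsedMs };
}

const { sink, elapsedMs } = benchSqrt(1e7);
console.log(`sum=${sink} time=${elapsedMs.toFixed(1)}ms`);
```

The trade-off is that the accumulation adds a dependent add per iteration, so the measured time includes a little work that isn't the sqrt itself.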
My perspective at the time was "I know there's a significant marshaling cost to calling host code, and I know the hardware sqrt is _very_ fast", so a 5-10x perf improvement seemed plausible to me (because marshaling was legitimately very expensive). I can't recall where in the 5-10x range the measured improvement was, but once the final store/return was added, the speedup dropped to only 2x. That was still a big win, but seeing just how much work the CPU could simply avoid doing while I was building out the code was a significant learning experience.