That's an incredible find and once I saw the assembly I was right along with them on the debug path. Interestingly it doesn't need to be assembly for this to work, it's just that that's where the split was. The IR could've done it, it just doesn't for very good reasons. So another win for being able to read arm assembly.
Unsure if this would be another way to do it but to save an instruction at the cost of a memory access you could push then pop the stack size maybe? Since presumably you're doing that pair of moves on function entry and exit. I'm not really sure what the garbage collector is looking for so maybe that doesn't work, but I'd be interested to hear some takes on it
You would normally use the “LDR Rd, =expr” pseudo-instruction form [1]. For immediates not directly constructible, it puts a copy of the immediate value in a PC-relative memory location, then does a PC-relative load into register.
So that would turn the whole sequence of “add constant to SP” into 2 executable instructions, 1 for constructing immediate and 1 for adding for a total of 8 bytes, and a 4 byte data area for the 17-bit immediate for a total of 12 bytes of binary which is 3 executable instructions worth.
I'm a little surprised that this bug wasn't fixed in the assembler as a special case for immediate adds to RSP. If the patch was to the compiler only, other instances of the bug could be lurking out there in aarch64 assembly code.
I think the right fix is that the compiler should, e.g. load the constant into a register using two moves and then emit a single add. It's one more instruction, but then the adjustment is atomic (i.e. a single instruction). Another option is to do the arithmetic in a temp register and then move it back.
One thing I worry about, probably unnecessarily, is anything with a sense of urgency.
HEY GUYS WE JUST FOUND A GOLANG COMPILER BUG AND FATAL PANICS!
Everyone is like “Hmm. I need to fix this now.”
So, 99% probability it’s what it is. 1% it’s some secret defensive thing because there was a bad stupid zero day someone would get fired over or that could leave the world in shambles if uncovered, or maybe something else needed to be swept under the rug, or maybe someone wants to distract while they introduce a new vulnerability.
I don’t think this with CVEs, but when someone’s like “install this patch everybody!” the dim red light flickers on.
One thing that often gets missed is how hard it is to even suspect the compiler as the root cause. Most engineers waste hours chasing bugs in their own code because we’re trained to trust our tools. This mindset alone can make these rare compiler bugs much trickier to find.
In the early PC days we suspected them a lot given how manually writting Assembly was still much better, in many cases.
I found out a bug on Turbo Pascal 6, where if you declare a variable with the same name as the function name, then the result was random garbage.
For those that don't know Pascal, the function name has to be assigned for the result value, so if a local variable with the same name is possible, then you cannot set the return value.
(* In Turbo Pascal 6 this would compile *)
function Square(num: Integer): Integer;
var
Square: Integer;
begin
Square := num * num; (* Here the local variable gets used instead *)
end;
In the past it was more common to suspect the compiler, as others mention here.
On a minicomputer I worked with in the late eighties, early nineties, I occasionally found errors in the compiler output. This was a Pascal compiler and because of that it didn't take too long to figure out that the code was actually correct and something else must be going on. Then firing up the debugger/tracer and scrutinizing and analyzing what happens in the disassembly.. when the problem was found, send a fax (yes!) to the head designer of the compiler, get a fixed test compiler back on a set of floppies.. went through this several times. I still have a printout somewhere with my pen marks pointing out a bug in the generated code.
Great technical blog. Good pathway for narrative, tight examples, description so clear it makes me feel smarter than I am because so easy to follow though the last time I even read assembly seriously was x86 years ago.
Also, fulfills the marketing objective because I cannot help but think that this team is a bunch of hotshots who have the skill to do this on demand and the quality discipline to chase down rare issues.
I assume these are Ampere Altra? I was considering some of those for web servers to fill out my rack (more space than power) but ended up just going higher on power and using Epyc.
This problem strikes me more as a debuginfo generation bug than a "compiler" bug.
> After this change, stacks larger than 1<<12 will build the offset in a temporary register and then add that to rsp in a single, indivisible opcode. A goroutine can be preempted before or after the stack pointer modification, but never during. This means that the stack pointer is always valid and there is no race condition.
Seems silly to pessimize the runtime, even slightly, to account for the partial register construction. DWARF bytecode ought to be powerful enough to express the calculations needed for restoring the true stack pointer if we're between immediate adjustments.
> This problem strikes me more as a debuginfo generation bug than a "compiler" bug.
But isn't that the same thing here? The bug occurred in their production workflows, not in some specific debug builds, so with that seems pretty reasonable to call it a compiler bug?
What ARM64 machines are you using and what are they used for? Last year you were announcing Gen 12 servers on AMD EPYC (https://blog.cloudflare.com/gen-12-servers/), but IIRC there weren’t any mentions of ARM64. But now it seems you’re running ARM64 in full production.
I wonder if Go had a mode where you make it single step every instruction and trigger a GC interrupt on every opcode. That would make it easier to find these kinds of bugs.
As an aside, this is the type of a problem that I think model checkers can't help with. You can write perfect and complicated TLA+/Lean/FizzBee models and even if somehow these models can generate code for you from your correct models you can still run into bugs like these due to platform/compiler/language issues. But, thankfully, such bugs are rare.
I doubt any company is mono language at that scale. Using ARM usually makes sense for s lot of horizontal scaling workloads so it's also not that surprising.
I guess those that wrote the preemption were on X86 where this doesn't happen thanks to variable length instructions being able to hold the constant and thus relied on the code-gen to do it atomically, then the ARM port had an automatic "split" from a higher level to make things "easy" thus giving us this bug.
Excellent article as always from the cloudflare blog - engineering without magic infrastructure and ml. One day I will apply !
Compiler bugs are actually quite common ( I used to find several a year in gcc ), but as the author says, some of them only appear when you work at a very large scale, and most people never dive that far.
I see something like this and I wonder "what testing methodology would have found this?" It has to be general, not something that would involve knowing what the bug was ahead of time.
When your scale is large enough, you move to "what monitoring methodology will find this?"
When you're doing enough transactions you start to see a noise floor of e.g. bit flips from cosmic rays, and looking for issues involves correlating/categorizing possible software failures and distinguishing them from the misbehavior of hardware.
An x86-64 Windows 11 machine trying to access a previously available Website now always produces a Cloudflare "obsolete protocol" error on the ordinary attempt. Al browsers get the same error. Did your fix break something?
I always appreciate articles like this, where you can clearly see the engineer’s way of thinking.
I was just puzzled by the middle part of the article, where they start investigating their code but seem to overlook the fact that it only happens on ARM64.
Still, I understand that it’s professional to proceed step by step logically.
Linus's law [1]? When it comes to compilers for mainstream languages, the userbases are so large that they will explore a surprisingly large portion of the compiler's state space.
But definitely, better engineering and QA practices must also help here.
Seemed like a red herring. They were able to reproduce it without any libraries. Might have just been net link forcing the stacks to a certain size and that made the bug visible.
[+] [-] Neywiny|5 months ago|reply
Unsure if this would be another way to do it but to save an instruction at the cost of a memory access you could push then pop the stack size maybe? Since presumably you're doing that pair of moves on function entry and exit. I'm not really sure what the garbage collector is looking for so maybe that doesn't work, but I'd be interested to hear some takes on it
[+] [-] Veserv|5 months ago|reply
So that would turn the whole sequence of “add constant to SP” into 2 executable instructions, 1 for constructing immediate and 1 for adding for a total of 8 bytes, and a 4 byte data area for the 17-bit immediate for a total of 12 bytes of binary which is 3 executable instructions worth.
[1] https://developer.arm.com/documentation/dui0801/l/A64-Data-T...
[+] [-] pklausler|5 months ago|reply
[+] [-] bloak|5 months ago|reply
Yes, though that weird stuff with dollars in it is not normal AArch64 assembly!
The article could have mentioned the "stack moves once" rule.
[+] [-] pjmlp|5 months ago|reply
[+] [-] titzer|5 months ago|reply
[+] [-] huflungdung|5 months ago|reply
[deleted]
[+] [-] pengaru|5 months ago|reply
[+] [-] cmckn|5 months ago|reply
Does the Go team have a natural language bot or is this just comment.contains(“backport”) type stuff?
[+] [-] chavi2|5 months ago|reply
HEY GUYS WE JUST FOUND A GOLANG COMPILER BUG AND FATAL PANICS!
Everyone is like “Hmm. I need to fix this now.”
So, 99% probability it’s what it is. 1% it’s some secret defensive thing because there was a bad stupid zero day someone would get fired over or that could leave the world in shambles if uncovered, or maybe something else needed to be swept under the rug, or maybe someone wants to distract while they introduce a new vulnerability.
I don’t think this with CVEs, but when someone’s like “install this patch everybody!” the dim red light flickers on.
[+] [-] Vipsy|5 months ago|reply
[+] [-] pjmlp|5 months ago|reply
I found out a bug on Turbo Pascal 6, where if you declare a variable with the same name as the function name, then the result was random garbage.
For those that don't know Pascal, the function name has to be assigned for the result value, so if a local variable with the same name is possible, then you cannot set the return value.
Something like this https://godbolt.org/z/s6srhTW66
[+] [-] Tor3|5 months ago|reply
[+] [-] SuperQue|5 months ago|reply
The reporter actually spent the effort to track it down, turns out it _was_ a Go compiler bug. (https://github.com/golang/go/issues/20427)
[+] [-] unknown|5 months ago|reply
[deleted]
[+] [-] kmarc|5 months ago|reply
In the HFT sphere i haven't talked to a company that hasn't reported (bragged about finding) a super weird gcc/clang bug.
Well, also, at my last job we used a snapshot version of the compiler, bc... Any nanoseconds matters.
[+] [-] renewiltord|5 months ago|reply
Also, fulfills the marketing objective because I cannot help but think that this team is a bunch of hotshots who have the skill to do this on demand and the quality discipline to chase down rare issues.
I assume these are Ampere Altra? I was considering some of those for web servers to fill out my rack (more space than power) but ended up just going higher on power and using Epyc.
[+] [-] quotemstr|5 months ago|reply
> After this change, stacks larger than 1<<12 will build the offset in a temporary register and then add that to rsp in a single, indivisible opcode. A goroutine can be preempted before or after the stack pointer modification, but never during. This means that the stack pointer is always valid and there is no race condition.
Seems silly to pessimize the runtime, even slightly, to account for the partial register construction. DWARF bytecode ought to be powerful enough to express the calculations needed for restoring the true stack pointer if we're between immediate adjustments.
[+] [-] sauercrowd|5 months ago|reply
But isn't that the same thing here? The bug occurred in their production workflows, not in some specific debug builds, so with that seems pretty reasonable to call it a compiler bug?
[+] [-] riobard|5 months ago|reply
[+] [-] zamadatix|5 months ago|reply
[+] [-] EE84M3i|5 months ago|reply
[+] [-] MarkSweep|5 months ago|reply
[+] [-] defleopold|5 months ago|reply
[deleted]
[+] [-] wy1981|5 months ago|reply
As an aside, this is the type of a problem that I think model checkers can't help with. You can write perfect and complicated TLA+/Lean/FizzBee models and even if somehow these models can generate code for you from your correct models you can still run into bugs like these due to platform/compiler/language issues. But, thankfully, such bugs are rare.
[+] [-] alberth|5 months ago|reply
Interesting to hear Go & ARM in use.
[+] [-] surajrmal|5 months ago|reply
[+] [-] steveklabnik|5 months ago|reply
And yeah, a lot of Rust but also a lot of Go.
[+] [-] dreamcompiler|5 months ago|reply
[+] [-] whizzter|5 months ago|reply
Nobodys fault really, but bad results ensued.
[+] [-] drob518|5 months ago|reply
[+] [-] brcmthrowaway|5 months ago|reply
[+] [-] adgjlsfhk1|5 months ago|reply
[+] [-] purplesyringa|5 months ago|reply
[+] [-] Agingcoder|5 months ago|reply
Compiler bugs are actually quite common ( I used to find several a year in gcc ), but as the author says, some of them only appear when you work at a very large scale, and most people never dive that far.
[+] [-] pfdietz|5 months ago|reply
[+] [-] syncsynchalt|5 months ago|reply
When you're doing enough transactions you start to see a noise floor of e.g. bit flips from cosmic rays, and looking for issues involves correlating/categorizing possible software failures and distinguishing them from the misbehavior of hardware.
[+] [-] javierhonduco|5 months ago|reply
[+] [-] maguro_01|5 months ago|reply
[+] [-] Bengalilol|5 months ago|reply
I was just puzzled by the middle part of the article, where they start investigating their code but seem to overlook the fact that it only happens on ARM64.
Still, I understand that it’s professional to proceed step by step logically.
Great article, it was a pleasure reading it!
[+] [-] bradley13|5 months ago|reply
Even Java, as widespread as it is, I have made half-a-dozen reports. None in the last several years, though.
Better testing? The sheer scale of software being produced?
[+] [-] lou1306|5 months ago|reply
But definitely, better engineering and QA practices must also help here.
[1] https://en.wikipedia.org/wiki/Linus%27s_law
[+] [-] wat10000|5 months ago|reply
[+] [-] mperham|5 months ago|reply
[+] [-] mperham|5 months ago|reply
[+] [-] Sesse__|5 months ago|reply
[+] [-] drob518|5 months ago|reply
[+] [-] syncsynchalt|5 months ago|reply
Their repro case required a stack adjustment larger than 1<<12 (4kiB).
[+] [-] yalok|5 months ago|reply
Used to have a lot of fun with those 3 decades ago.