(no title)
MertsA | 2 years ago
That's not a technical error, they mean SRAM in the CPU itself. You're right about gcj but that's kind of a moot point when investigating some reproducible CPU bug like this. The paper mentions all the acrobatics they went through when trying to find the root cause but if gcj would have been practical then it also would have been immediately clear if the gcj output reproduced the error or not. If it didn't reproduce, no big deal, try another approach. You might be right about it being easier to root cause with gdb directly but I'm not so sure. Starting out, you have no idea which instructions under what state are triggering the issue so you'd be looking for a needle in a haystack. A crashdump or gdb doesn't let you bisect that haystack so good luck finding your needle.
SomeoneFromCA|2 years ago
It all depends how good you are with x64 assembly. If you are good enough, you can easily deduce what the instructions at the location do, and can potentially simply copy-paste into an asm file, compile it and check result. Would be much faster to me.
Bluntly speaking, people who are not familiar with low-level debugging make an honest and succesful attempt to investigate a low-level issue. A seasoned kernel developer or reverse engineer would have just used gdb straight away.
MertsA|2 years ago
I think you should take another look at the author list. Chris Mason counts as a seasoned kernel developer in my book. Either way I think you're missing the point. Yes gcj would be different, but there's a decent chance it could hand you a binary that reproduces the issue that you can bisect to the root cause from there. It's one thing to run it through gcj and see if it reproduces, rewriting it in C is a ton of work compared to gcj for something that might not pan out.