top | item 25206527


I didn't really understand the TSO explanation given in this article and found it a bit hand-wavy. The article says that to emulate the x86 TSO consistency model on an ARM machine, which is weakly ordered, you have to add a bunch of instructions, which would make the emulation slow. I followed that much, but after that it doesn't really explain how they get around the extra instructions needed to guarantee the ordering. It just says "oh, it's a hardware toggle"; toggle of what exactly?
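To make the "extra instructions" part concrete, here is a minimal sketch of the classic message-passing pattern in C++. It is only an illustration, not anything taken from the article or from Rosetta 2, and the function name `message_passing_once` is made up for the example. In native code the programmer marks the one store and one load that need ordering; on x86 those typically compile to plain moves because TSO already keeps them in order, while on ARM they typically become store-release/load-acquire instructions or barriers. An emulator translating an x86 binary can't see which accesses matter, so without hardware help it would have to treat every load and store this conservatively.

```cpp
#include <atomic>
#include <thread>

// Message-passing litmus test: one thread writes a payload and then sets a
// flag; the other waits for the flag and then reads the payload.
// Returns the payload value the reader observed.
int message_passing_once() {
    int data = 0;                     // plain, non-atomic payload
    std::atomic<bool> ready{false};   // flag that publishes the payload
    int observed = -1;

    std::thread writer([&] {
        data = 42;                                     // 1. write payload
        ready.store(true, std::memory_order_release);  // 2. publish; may not be reordered before step 1
    });

    std::thread reader([&] {
        // 3. wait for the flag; the acquire load keeps step 4 from moving above it
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        observed = data;                               // 4. read payload
    });

    writer.join();
    reader.join();
    return observed;
}
```

With the release/acquire pair the reader is guaranteed to see 42. If both accesses to `ready` were relaxed, a weakly ordered machine would be allowed to make the flag visible before the payload (formally it would also be a data race on `data`), which is exactly the reordering x86 code never has to worry about.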

I could see them simply not following TSO for single-core stuff, for example when running emulated code for single-core performance benchmarks, since technically you don't care about ordering for single-core operation or correctness. That would speed up their single-core results, but then what about multi-core?


Veedrac|5 years ago

> It just says "oh, it's a hardware toggle"; toggle of what exactly?

A toggle that makes the chip treat all loads and stores from that thread as TSO.

dirtypersian|5 years ago

So you're saying Rosetta 2 is somehow looking at an x86 binary, figuring out exactly which portions of the program rely on TSO ordering for correctness, and then dynamically switching to weak ordering for the parts that might be able to do without it?

I don't really know much about the internals of macOS, but that seems difficult: you would have to figure out when, for example, applications running on two different cores need to access the same memory (since TSO is only really needed for multi-core use cases), and then apply TSO on the fly. If that is what Rosetta 2 is actually doing, that is impressive.