dinartem | 1 year ago

We still used the lwarx/stwcx pair to implement atomic operations, but to avoid the hardware bug a strict rule needed to be followed.

Rule: On a given hardware thread (there are two hardware threads per processor on the Xbox 360), every lwarx reservation of an address must be paired with a stwcx conditional store to that same address before a reservation is made to a different address. So a sequence like lwarx A / lwarx B / stwcx B / stwcx A is forbidden. But lwarx A / stwcx A / lwarx B / stwcx B is fine.
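To make the rule concrete, here is a rough sketch in C of a checker that walks the reservation ops issued on one hardware thread and reports whether the sequence obeys it. All of the names (`op`, `obeys_pairing_rule`) are made up for illustration and are not from the actual compiler. It also reads the rule a bit strictly, which is an assumption on my part: a stwcx with no matching open reservation, or a reservation left open at the end, is treated as a violation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the reservation ops on ONE hardware thread. */
typedef enum { OP_LWARX, OP_STWCX } op_kind;

typedef struct {
    op_kind kind;
    char    addr;   /* symbolic address label, e.g. 'A' or 'B' */
} op;

/* Returns true if every lwarx is closed by a stwcx to the same address
 * before a reservation is made on a different address. */
bool obeys_pairing_rule(const op *seq, size_t n)
{
    bool open = false;      /* is a reservation currently outstanding? */
    char open_addr = 0;     /* address of the outstanding reservation */

    for (size_t i = 0; i < n; i++) {
        if (seq[i].kind == OP_LWARX) {
            /* Reserving a different address while one is open is forbidden.
             * Re-reserving the same address is allowed by the rule as stated. */
            if (open && seq[i].addr != open_addr)
                return false;
            open = true;
            open_addr = seq[i].addr;
        } else {
            /* Assumption: a stwcx must close the currently open reservation. */
            if (!open || seq[i].addr != open_addr)
                return false;
            open = false;
        }
    }
    return !open;   /* assumption: nothing may be left unpaired at the end */
}
```

With this model, lwarx A / lwarx B / stwcx B / stwcx A is rejected at the second op, and lwarx A / stwcx A / lwarx B / stwcx B passes.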

So I changed the compiler to emit atomic intrinsics that obeyed this rule.

But there was still the issue of logical thread scheduling. Imagine there are two logical threads running: one has a sequence of lwarx A / stwcx A and the other has lwarx B / stwcx B. The first thread is running on a hardware thread and just after executing lwarx A, the timer interrupt fires and the kernel decides to switch to the second logical thread. That thread executes lwarx B on the same hardware thread while the reservation on A is still unpaired, and thus violates the rule.

To make sure that never happens, the compiler also emits disable-interrupts / lwarx A / stwcx A / enable-interrupts. That prevents the scheduler from switching threads in the middle of the atomic sequence.
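The shape of that emitted sequence can be modeled in portable C, purely as an illustration. This is not the real code generation: the C11 load and compare-exchange only stand in for lwarx and stwcx, `disable_interrupts` / `enable_interrupts` are stub functions that just count nesting, and re-enabling interrupts between retries is my assumption (the description above does not say where the retry loop sits).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stub model of interrupt masking: counts how deeply interrupts are masked.
 * A real implementation would change processor state instead. */
static int irq_mask_depth;
static void disable_interrupts(void) { irq_mask_depth++; }
static void enable_interrupts(void)  { irq_mask_depth--; }

/* Hypothetical atomic increment that brackets each reservation attempt
 * with interrupt masking, so no thread switch can land between the
 * "lwarx" and the "stwcx". Returns the new value. */
int atomic_increment_model(atomic_int *p)
{
    int old, desired;
    bool stored;

    do {
        disable_interrupts();
        /* Stand-in for lwarx: read the current value. */
        old = atomic_load_explicit(p, memory_order_relaxed);
        desired = old + 1;
        /* Stand-in for stwcx: store only if nobody changed *p meanwhile.
         * The weak form may fail spuriously, like a lost reservation. */
        stored = atomic_compare_exchange_weak(p, &old, desired);
        enable_interrupts();
    } while (!stored);   /* retry with a fresh reservation on failure */

    return desired;
}
```

The point of the sketch is only the bracketing: every attempt opens and closes its reservation on the same address with interrupts masked, so the scheduler never sees a half-finished pair.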

But there was still one more problem. It is possible for a page-fault to occur in the middle of the sequence if the sequence spans the end of one page and the beginning of another, and the second page is not in the TLB. So the thread is running along and executes disable-interrupts / lwarx A, then when trying to fetch the next instruction it faults to the hypervisor because the page containing that instruction isn't yet mapped by the TLB. The hypervisor executes a bunch of code to add the mapping of the new page to the TLB and then returns to the faulting thread to complete the stwcx A / enable-interrupts sequence.

The problem is that the TLB is a shared resource between the two hardware threads of a processor, so the two hardware threads need a way to atomically update the TLB, and the obvious way to do that is to use a spin-lock that is naturally implemented by a lwarx B / stwcx B pair of instructions. But the hypervisor TLB handler can't use those instructions because the code causing the TLB fault might be in the middle of using them and thus would cause the hardware bug to manifest.

The solution was to use non-reservation load/store instructions to implement a simple spin-lock. If the two hardware threads were both trying to update the TLB then hardware thread 2 would simply wait for hardware thread 1 to clear its lock before proceeding.
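For readers who have not seen a lock built without reservation instructions, one classic two-party scheme is Peterson's algorithm, sketched below in C11. This is an assumption for illustration only: the description above does not say which algorithm the hypervisor actually used, and `tlb_lock` and its functions are hypothetical names. The sketch relies on sequentially consistent atomic loads and stores, which need memory barriers but no load-reserve / store-conditional pair.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical two-party lock using only loads and stores
 * (Peterson's algorithm). Valid for exactly two contenders,
 * identified as hardware thread 0 and hardware thread 1. */
typedef struct {
    atomic_bool wants[2];   /* wants[i]: hardware thread i wants the lock */
    atomic_int  turn;       /* which thread gets priority on contention */
} tlb_lock;

void tlb_lock_init(tlb_lock *l)
{
    atomic_init(&l->wants[0], false);
    atomic_init(&l->wants[1], false);
    atomic_init(&l->turn, 0);
}

void tlb_lock_acquire(tlb_lock *l, int self)
{
    int other = 1 - self;

    /* Announce interest, then hand priority to the other thread.
     * The default seq_cst ordering is required for correctness. */
    atomic_store(&l->wants[self], true);
    atomic_store(&l->turn, other);

    /* Spin while the other thread wants the lock and has priority. */
    while (atomic_load(&l->wants[other]) && atomic_load(&l->turn) == other) {
        /* busy-wait */
    }
}

void tlb_lock_release(tlb_lock *l, int self)
{
    atomic_store(&l->wants[self], false);
}
```

A handler on hardware thread 0 would call `tlb_lock_acquire(&lock, 0)`, update the TLB, then call `tlb_lock_release(&lock, 0)`. Because there are only ever two contenders per processor, a two-party scheme like this is enough.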

markus_zhang|1 year ago

Thanks so much for the input! I vaguely know a little bit about everything you talked about--the threads, TLB and such, but I have never worked with them in practice. This is so interesting.