xoranth | 1 year ago

> That allows things like individual threads to take locks, which is a pretty big leap.

Does anyone know how those get translated into SIMD instructions? For example, how do you do a CAS loop for each lane when each lane can individually succeed or fail? What happens if the lanes point to the same location?

raphlinus|1 year ago

There's a bit more information at [1], but I think the details are not public. The hardware is tracking a separate program counter (and call stack) for each thread. So in the CAS example, one thread wins and continues making progress, while the other threads loop.
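To make the CAS case concrete, here is a minimal sketch of a per-thread spinlock in CUDA. The names (`lock`, `unlock`, `locked_increment`, `run_locked_counter`) are made up for illustration, and it assumes a Volta-or-newer GPU with the code built for `sm_70` or later (e.g. `nvcc -arch=sm_70`). Every thread, including threads in the same warp, does a CAS on the same lock word. The atomics to that address are serialized, so exactly one CAS sees 0 and wins, and the rest retry. On older hardware, or when built for an older architecture, threads of one warp spinning like this can hang, so treat it as a demonstration rather than a recommended pattern.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Spin until this thread's CAS flips the lock word from 0 (free) to 1 (held).
// atomicCAS returns the old value, so a return of 0 means this thread won.
__device__ void lock(int* mutex) {
    while (atomicCAS(mutex, 0, 1) != 0) {
        // Lost the race; another thread (possibly in the same warp) holds the lock.
    }
    __threadfence();  // keep critical-section accesses after the acquire
}

__device__ void unlock(int* mutex) {
    __threadfence();       // make critical-section writes visible before release
    atomicExch(mutex, 0);  // mark the lock as free
}

// Each thread takes the lock, does a plain (non-atomic) increment, and releases.
__global__ void locked_increment(int* mutex, int* counter) {
    volatile int* c = counter;  // volatile so the load is not served from a stale cache
    lock(mutex);
    *c = *c + 1;
    unlock(mutex);
}

// Hypothetical host helper: launches the kernel and returns the final counter,
// or -1 if any CUDA call fails.
int run_locked_counter(int blocks, int threads_per_block) {
    int* mutex = nullptr;
    int* counter = nullptr;
    int result = -1;
    if (cudaMalloc(&mutex, sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&counter, sizeof(int)) != cudaSuccess) {
        cudaFree(mutex);
        return -1;
    }
    cudaMemset(mutex, 0, sizeof(int));
    cudaMemset(counter, 0, sizeof(int));

    locked_increment<<<blocks, threads_per_block>>>(mutex, counter);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    if (cudaMemcpy(&result, counter, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) {
        result = -1;
    }
    cudaFree(mutex);
    cudaFree(counter);
    return result;
}

int main() {
    const int blocks = 4, threads = 64;
    int total = run_locked_counter(blocks, threads);
    printf("counter = %d (launched %d threads)\n", total, blocks * threads);
    return total == blocks * threads ? 0 : 1;
}
```

If the lock works, the counter ends up equal to the number of threads launched, even though the increment itself is not atomic. This only shows the programming-model side; it says nothing about how the hardware schedules the diverged lanes internally.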

There seems to be some more detail in a bachelor's thesis by Phillip Grote [2], with lots of measurements of different synchronization primitives, but it doesn't go too deep into the hardware.

[1]: https://arxiv.org/abs/2205.11659

[2]: https://www.clemenslutz.com/pdfs/bsc_thesis_phillip_grote.pd...