The big problem I have with locks is under high use, they slow parallel code down MORE than if you ran it serially. This is because of "false sharing" when one core grabs the lock, it clears the cache line of every other core that will grab the lock, and they have to fetch from the main memory to read the lock. Fetching from main memory is very slow. I use compare-and-swap in those cases, and almost every lock I've used I've converted to compare-and-swap. I'd like to explore "restartable sequences" for parallel data structures (https://tinyurl.com/vzh3523y) but AFAIK this is only available on linux and we develop on MacOS and Linux.
zznzz|3 years ago
Compare and swap still typically requires the cache line to bounce around between cores, so if that’s the primary cost of locks for you it seems like compare and swap doesn’t really fix the problem of frequent synchronization between cores.
drmeister|3 years ago
sapiogram|3 years ago
hinkley|3 years ago
charleslmunger|3 years ago