top | item 34849054

(no title)

zznzz | 3 years ago

What you describe (multiple cores sharing the same lock) sounds like true sharing, not false sharing. False sharing means cache line contention is when cores are using separate structures (i.e. not actually shared) which happen to be on the same cache line.

Compare and swap still typically requires the cache line to bounce around between cores, so if that’s the primary cost of locks for you it seems like compare and swap doesn’t really fix the problem of frequent synchronization between cores.

discuss

drmeister|3 years ago

Ah - thank you. "True sharing" makes way more sense to describe the situation. The general problem with locks I've seen again and again - and maybe CAS doesn't improve it, but I thought it did. It is challenging to profile this stuff carefully. I've been disappointed many times by my attempts at multithreaded programming when locks are involved. I've implemented a multithreaded Common Lisp programming environment (https://github.com/clasp-developers/clasp.git) that interoperates with C++. Two locks that cause me massive headaches are (1) in the unwinder when we do a lot of stack unwinding in multiple threads and (2) in the memory manager when we do a lot of memory allocation in multiple threads. Parallel code runs multiple times slower than serial code. In another situation where we do parallel non-linear optimization without allocating memory, we can get 20 CPU working in parallel with no problems.