
On mutex performance and WTF::Lock

208 points| jsnell | 9 years ago |blog.mozilla.org

62 comments

[+] pizlonator|9 years ago|reply
WTF locks now use a policy we call eventual fairness. You can barge, but the amount of time anyone loses to the barging race is bounded. https://trac.webkit.org/changeset/203350/webkit
[+] obstinate|9 years ago|reply
Hard fairness guarantees for locking are very expensive and only situationally needful, so it's hard to justify paying a high cost to have them always on. It seems like the best thing to do is to use the fastest locking primitive you can find, and implement fairness on top of that in the rare case that you need it.
[+] ajross|9 years ago|reply
So the takeaway here is apparently that POSIX mutexes on macOS/iOS are really slow due to an oddball design decision (fairness), and the browser vendors are racing to see who can reimplement them best. Windows and Linux do this just fine, AFAICT.

Seems like it would be more effective for the WebKit folks to just work on their Darwin compatriots to fix the C library locks, no?

[+] rbehrends|9 years ago|reply
It's not as simple as that. What makes locks expensive (no matter their policy regarding fairness) is blocking. Blocking leads to context switches, which are extremely expensive. Linux not using fairness does not eliminate that problem; it mitigates it (which is a good thing, just not a panacea).

I have a simple benchmark that loses performance the more you parallelize it beyond 4-6 cores, until eventually it becomes slower than the single-threaded case [1]. And contention on that is not actually that high. That's still better than the macOS case, where performance goes to hell for even two threads, of course.

Here's the output of time for 32 threads (running on a 48-core system):

  ./a.out -p 32  7.72s user 75.04s system 2603% cpu 3.179 total
as opposed to the sequential case:

  ./a.out -p 0  1.94s user 0.00s system 99% cpu 1.942 total
As you can see, the program spends the vast majority of its time in the kernel due to constant blocking. (Note that other factors add to the overhead, too.)

In short, you do not want a high degree of contention, anyway. The macOS locks just make contention hurt much more.

[1] https://gist.github.com/rbehrends/27a1ca27ddec7016cd42428b9c...
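The shape of such a benchmark (this is not the linked gist; the structure and counts here are made up for illustration) is just many threads serializing on one mutex:

```cpp
// Illustrative sketch of a contention microbenchmark: every thread
// increments one shared counter under a single mutex, so at high
// thread counts most wall time goes to blocking and context switches.
#include <mutex>
#include <thread>
#include <vector>

long run(int nthreads, long iters) {
    long counter = 0;
    std::mutex m;
    std::vector<std::thread> ts;
    for (int i = 0; i < nthreads; ++i)
        ts.emplace_back([&] {
            for (long j = 0; j < iters; ++j) {
                std::lock_guard<std::mutex> g(m);  // all threads contend here
                ++counter;
            }
        });
    for (auto& t : ts) t.join();
    return counter;
}
```

Timing `run()` at increasing thread counts is how you see the effect the parent describes: wall time goes up with parallelism, not down, because the critical section serializes everything and the blocking itself is the dominant cost.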

[+] Jweb_Guru|9 years ago|reply
Fairness isn't an oddball design decision, it's an important assumption that many realtime systems rely on.
[+] quinnftw|9 years ago|reply
I'd be interested in seeing the rationale behind the fairshare policy. I suppose theoretically the firstfit policy could result in starvation via one thread constantly preempting another during lock contention, but I don't imagine that would happen very often in practice.
[+] alain94040|9 years ago|reply
Starvation is a serious problem. Imagine two cores both trying to get a lock, with the cache line holding the lock sitting in one of them. If both cores try to get the lock very often, the core further away may very well never get it.

Synthetic benchmark results don't show that, because the benchmarks behave nicely: each thread waits politely between attempts to get the lock. Real code doesn't always do that.

[+] hyperpape|9 years ago|reply
He doesn't give a lot of details, but somewhere in the first 20 minutes of this talk (probably about 15 minutes in), Cliff Click says that HotSpot had to implement fair locks in user space because the kernel-provided locks weren't adequate.

https://youtu.be/Hqw57GJSrac?t=341

[+] dom0|9 years ago|reply
Starvation and similar problems have plagued software using locks for a long time. A prominent, well-documented example is the set of problems with the GIL implementation under certain workloads (the easiest to reproduce is one I/O-bound thread plus one CPU-bound thread, which is also a typical real-world scenario - ouch!)

[It should be noted that people commonly deny that the GIL is a scheduler, acts like a scheduler, or should be a proper scheduler; all of these denials are mistaken. Many locks are used as schedulers, which is especially true of a lock that controls the execution of any and all code.]

[+] rogerbinns|9 years ago|reply
Do remember that current CPUs can dynamically run cores/hyper-threads at different clock speeds at the same time. Not to mention significantly mismatched performance ones (eg ARM big.LITTLE).

If you didn't have fairness, and the developer hadn't carefully assigned threads to CPUs based on the performance/priority of each thread's work (and locked the CPU clock speeds), then the higher-performing cores would starve out the others.

Fairness at least ensures that developers writing simple, straightforward code get simple, straightforward behaviour.

[+] qb45|9 years ago|reply
It's probably a very old decision, from times when most desktop computers had one CPU and threads were mostly used to keep GUI responsive while the application was churning data in the background, so there was little interaction between them.
[+] ricardobeat|9 years ago|reply
> I don’t know how firstfit policy locks compare to something like WTF::Lock on my Mac mini

As a layperson, I would have liked to see at least some simple benchmarks substantiating what is being said, given that the post spends several paragraphs explaining why WTF::Lock is not a good candidate, and that it's in the title.

[+] leksak|9 years ago|reply
> But fairness is not guaranteed for all OS mutexes; in fact, fairness isn’t even guaranteed in the pthreads standard

Is this true? I do not want to have to pay to read the standard

[+] gpderetta|9 years ago|reply
Good thing it is available for free then!