I would zoom in on the actual fix: shrinking the size of the critical section.
When you use defer to release the lock, the entirety of your function becomes the critical section. If you add another bit of code that needs the same lock, your program goes boom.
I've hit this same issue a few times when I write a new, naive HTTP handler function that needs concurrent map access, then throw apachebench at it before I commit (or k6 when it's in QA/staging).
All of my deadlocks have come from deferring the release of the lock and then calling another bit of code that acquires a second lock, rather than explicitly releasing the first lock as soon as I no longer need its protection.
I may still defer the release of a lock in very small functions, but now when I see it in code it immediately sets off alarm bells to look a bit closer.
I think deferring unlocks should be more openly described as an anti-pattern. I appreciate the convenience of defer generally speaking, but it has two problems that specifically hinder effective use of mutexes:
1. it lets you roll additional code that doesn't need the mutex into the critical section, keeping other threads waiting longer for an unlock
2. benchmarks have shown[1][2] that using defer is actually more expensive than simply having the unlock sit just before your return call(s). I get that defer produces more readable and safer code, but when you have a locking function you are concerned with unlocking quickly, so defer directly causes complications here that, in my opinion, outweigh any benefits you get with regards to readability.
Sadly a lot of mutex examples specifically use defer without addressing any of the risks of doing so, such as Go by Example[3].
[1] https://gist.github.com/janisz/ce19a7fa94cbc99e2835a4421ccfc...
[2] https://medium.com/i0exception/runtime-overhead-of-using-def...
[3] https://gobyexample.com/mutexes
Exactly. Mutexes should be held for the shortest time possible (and you should avoid calling any function whose implementation details you don't know while holding one), yet RW mutexes only make sense if the critical section is large enough. Ergo RW mutexes very rarely make sense.
I’ll say that using a worker pool is, IMO, a bit trickier than it sounds. The whole point of making goroutine creation cheap in the first place is that you can just spawn a ton of them, and then use blocking calls anywhere you want. If you have a pool of goroutines, then you open yourself to the condition that all the goroutines in the pool block, waiting for code to execute that can’t schedule because there are no available goroutines.
Any time you want to limit concurrency (e.g. by using a pool), my experience is that you have to think hard about the correct way to do that. This despite the fact that the underlying idea of a worker pool seems very innocent and easy to use. Read a few papers on the topic and you’ll see a variety of approaches. For example, if work blocks, do you give up the slot to other work? No easy answer!
I don't think using a worker pool is so bad, and I usually use one in preference to spawning lots of goroutines. I tend to follow this pattern https://play.golang.org/p/KtCPYaDiVFl, in which synchronization between the workers and the master relies on the language's built-in features and does not use explicit mutexes.
> For example, if work blocks, do you give up the slot to other work? No easy answer!
That's true in languages where threads are expensive, but goroutines are so cheap that you can probably avoid the problem simply by running a very large pool of workers.
People talk about the lack of generics et al. as being Go's downfall, but personally what I've found to be the biggest annoyance is the very thing Go boasts of being good at: writing bug- and race-free multi-threaded code.
Channels are vastly oversold given their actual suitability and, worse still, add their own classes of bugs (deadlocks), so you often fall back to using mutexes for those, hopefully rare, occasions when you need to share state. And once you're reliant on mutexes you're really no better off than in any other language where threading is an afterthought.
The only real benefits Go brings are the ease of spinning up new goroutines and the green-thread model that makes them lightweight, but the former can be a double-edged sword if you don't understand how to manage goroutines in the first place. And even this isn't exactly unique to Go.
I feel like this is one of Go's biggest weaknesses, yet it seldom gets the same discussion as generics and error handling (both of which are annoyances in specific edge cases, but neither of which causes the same level of show-stopper bugs).
Yes, I fully agree. The basics and primitives are very simple and easy, but actually using those to make useful packages is far harder than it might seem at first glance.
Goroutines and channels are not bad at all, but I feel there's some part missing to use them effectively without too many headaches. I've often found sync.WaitGroup and mutexes far more useful than channels.
Actually, I wrote an entire article about this, discussed here: https://news.ycombinator.com/item?id=26220693
I concur. Although I perfectly understand why they were chosen, Go's concurrency synchronization primitives are not a good design choice. The developers of Go wanted primitives out of which better abstractions could be built while leaving freedom to the programmer; that makes sense and is consistent with Go's philosophy. In my experience, however, there is more buggy concurrent code out there than code that works correctly, and this could have been avoided by some higher-level abstraction like the actor model or Ada's tasking with rendezvous (where the latter, to be fair, is not so different from channels).
It's a real problem because of 3rd-party libraries. Too many 3rd-party libraries produce deadlocks in unusual scenarios, and it can take almost as much time to write the library on your own as to find out whether their use of goroutines is correct. Deadlocks are really everywhere. The same goes for sync/atomic, by the way: the use of those primitives is rarely correct, and people keep using them despite the warnings in the docs. I'm not claiming to be smarter; on the contrary, the problem really is that Go's choice of concurrency primitives is only suitable for concurrency experts, and most programmers (including myself) aren't.
Generics could, for instance, have helped implement a mutex that disallows access to the memory that needs synchronizing unless the lock is held. This of course wouldn't have helped here, as the root cause was holding on to the mutex for too long.
I was under the impression that Go had first party tooling for deadlock checking, something called threadanalyzer if I recall correctly.
The write priority of RW locks is a classical pitfall. Even knowing about this, I ended up falling into the exact same bug described in the article at least a couple of times.
If anyone is wondering why RW mutexes block new readers if there is a waiting writer, the reason is to prevent writer starvation: if, say, two threads continuously acquire and release read locks, there may never be a point in time in which no thread is holding a lock, and the writer is stuck waiting forever. Some mutexes can be configured to use read priority, but that is something that should only be done if it can be proven that starvation cannot happen.
This is a reminder that read locks do not absolve you from being careful about what happens in the critical section; it should be made as small as possible no matter whether the lock is exclusive or shared.
The goroutine-inspect tool they used looks really useful.
Does anyone know if there's a similar tool for Rust? I've had a few times in the past where I'm trying to figure out why some async application using tokio is hanging and having a similar tool would be very helpful.
I recently researched this after speccing out which language to choose for some data-heavy async apps at work, and the answer right now is, unfortunately, no.
Rust is in a bit of a weird place here, since as a language it only provides async primitives and relies on frameworks bringing their own async runtimes - so the ball is in the Tokio team's court to provide this.
It is nice to use channels and tasks in Tokio, and the stalwart Rust compiler catches pitfalls with memory safety and some deadlocks that you could fall into with Go. However, observing what's happening inside the Tokio runtime is a far cry from where Go and its profiling tools are.
https://github.com/tokio-rs/console - here's an attempt to make a TUI where you can watch everything in motion, which is about as close as you can get to an easy UI into the system.
There is Loom[1] (part of the Tokio project) for exhaustively testing multithreaded code, though as far as I can tell it is designed for debugging threads, not async tasks.
[1] https://github.com/tokio-rs/loom
One way I often debug this kind of problem is to use go's built-in pprof tooling. You can insert a profiling server into your program in four lines: https://pkg.go.dev/net/http/pprof. Then, do something like
go tool pprof 'http://127.0.0.1:6060/debug/pprof/goroutine?debug=1'
to get a goroutine profile of your running program, and use the web command to display it in a browser as a DAG. The stuck goroutine can usually be seen pretty easily on the graph.