(no title)
Matthias247 | 4 months ago
I fully agree that this and the cancellation issues discussed before can lead to surprising issues even to seasoned Rust experts. But I’m not sure what really can be improved under the main operating model of async rust (every future can be dropped).
But compared to working with callbacks the amount of surprising things is still rather low :)
mycoliza|4 months ago
On the other hand, it also wasn't our coworker who had written the code where we found the bug who was to blame, either. It wasn't a case of sloppy programming; he had done everything correctly and put the pieces together the way you were supposed to. All the pieces worked as they were supposed to, and his code seemed to be using them correctly, but the interaction of these pieces resulted in a deadlock that it would have been very difficult for him to anticipate.
So, our conclusion was, wow, this just kind of sucks. Not an indictment of async Rust as a whole, but an unfortunate emergent behavior arising from an interaction of individually well-designed pieces. Just something you gotta watch out for, I guess. And that's pretty sad to have to admit.
[1] https://news.ycombinator.com/item?id=45776868
kibwen|4 months ago
But it still suggests that `tokio::select` is too powerful. You don't need to get rid of `tokio::select`, you just need to consider creating a less powerful mechanism that doesn't risk exhibiting this problem. Then you could use that less powerful mechanism in the places where you don't need the full power of `tokio::select`, thereby reducing the possible places where this bug could arise. You don't need to get rid of the fully powerful mechanism, you just need to make it optional.
imtringued|4 months ago
There is no real difference between a deadlock caused by a single thread acquiring the same non reentrant lock twice and a single thread with two virtual threads where the the first thread calls the code of the second thread inside the critical section. They are the same type of deadlock caused by the same fundamental problem.
>Remember too that the Mutex could be buried beneath several layers of function calls in different modules or packages. It could require looking across many layers of the stack at once to be able to see the problem.
That is a fundamental property of mutexes. Whenever you have a critical section, you must be 100% aware of every single line of code inside that critical section.
>There’s no one abstraction, construct, or programming pattern we can point to here and say "never do this". Still, we can provide some guidelines.
The programming pattern you're looking for is guaranteeing forward progress inside critical sections. Only synchronous code is allowed to be executed inside a critical section. The critical section must be as small as possible. It must never be interrupted, ever.
Sounds like a pain in the ass, right? That's right, locks are a pain in the ass.
octoberfranklin|4 months ago
No, just have select!() on a bunch of owned Futures return the futures that weren't selected instead of dropping them. Then you don't lose state. Yes, this is awkward, but it's the only logically coherent way. There is probably some macro voodoo that makes it ergonomic. But even this doesn't fix the root cause because dropping an owned Future isn't guaranteed to cancel it cleanly.
For the real root cause: https://news.ycombinator.com/item?id=45777234
mycoliza|4 months ago
How does that prevent this kind of deadlock? If the owned future has acquired a mutex, and you return that future from the select so that it might be polled again, and the user assigns it to a variable, then the future that has acquired the mutex but has not completed is still not dropped. This is basically the same as polling an `&mut future`, but with more steps.