Is async/await a good idea for an OS kernel, even a toy one? Cooperative multitasking tends to break down at scale, because the probability that all the "threads" you're cooperating with are playing nice goes to zero as the number of threads increases. An OS tends to concentrate the pathological cases, since it deals with hardware and all of the other hardest timing and concurrency problems.
It's a viable option for user-space programs because you can characterize them far more tightly and program them from top to bottom to work nicely with that paradigm. Embedding islands of cooperative multitasking in a sea of pre-emptive multitasking seems to make a lot more sense than the other way around.
However, this post is a question, not a statement. If, for example, a Linux kernel developer replies "nah, it's no biggie, the Linux kernel is effectively structured the same way", I would not quibble.
Async/await is a programming paradigm that moves the state management of an asynchronous program into the compiler rather than being explicitly managed by the user. The paradigm is almost universally a win: if you don't want to write asynchronous code, just don't use it; it doesn't impact your code at all.
A second question is how the language actually drives the execution of asynchronous tasks. Most of the languages that have added async/await have incorporated it into their execution model, so you are forced to use the particular semantics of the language. There are subtle, but very important, differences here: for example, when you actually provide the answer that resolves the await, do you execute the waiting code immediately, or schedule it for a future iteration? This is what can cause the most problems with async/await, but Rust does not define any executors [1], so it's not an issue here.
The final question is whether asynchronous code makes sense for an OS kernel at all. I'd argue that the answer is very much yes: communication with devices is almost invariably asynchronous (you set some memory lines to tell the device to do something, and get an interrupt telling you it's done). Particularly for the filesystem, where there are both synchronous and asynchronous system calls for the same code, a good executor for asynchronous code could well be beneficial for simplifying code paths. (Note that filesystems also generally have some degree of task-dependency requirements--you need to ensure that the various requests happen in a particular partial order.)
Now do note that this requires a far more complex executor model than most async/await toolkits (including this tutorial) provide.
[1] Logically, it would make sense to provide a utility function that synchronously runs an async function, but this isn't implemented yet.
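For the curious, the footnote's "utility function that synchronously runs an async function" takes only a couple dozen lines to sketch by hand. This is a hypothetical, busy-polling `block_on` built from a no-op `Waker` — all names here are invented for illustration, not taken from `std` or any crate:

```rust
use std::future::Future;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A Waker that does nothing: we simply poll again in a loop.
fn noop_raw_waker() -> RawWaker {
    fn no_op(_: *const ()) {}
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

/// Synchronously drive a future to completion by busy-polling it.
fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = Box::pin(future);
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => std::hint::spin_loop(),
        }
    }
}

async fn add(a: u32, b: u32) -> u32 {
    a + b
}

fn main() {
    assert_eq!(block_on(add(2, 3)), 5);
    println!("ok");
}
```

A real executor would park until woken instead of spinning, but the shape — poll in a loop, driven by a `Waker` — is the same.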
> Is async/await a good idea for an OS kernel, even a toy one? Cooperative multitasking tends to break down at scale,
> Embedding islands of cooperative multitasking in a sea of pre-emptive multitasking seems to make a lot more sense than the other way around.
In my reading of the post, the proposal is to do cooperative multitasking within the kernel and preemptive multitasking between the kernel and userspace. I think this is tenable; within the kernel, you pretty much need to play nice with the rest of the kernel anyway. Most kernels don't have effective protection against one part of the kernel causing problems for the rest, although some do have limited protection against drivers that crash or loop.
You might want to look at Joe Duffy's posts about Microsoft's Midori OS. In particular, the post "Asynchronous Everything":
http://joeduffyblog.com/2015/11/03/blogging-about-midori/
Their findings sound pretty compelling to me. Personally, I'm convinced that we'll eventually see much more system-level use of async/await-style mechanisms. Perhaps with Rust, perhaps with C++ once coroutines land in C++20.
An OS has many aspects; scheduling tasks is only one of them. Async/await is just the interface/mechanism for dealing with the tasks, in this case cooperative tasks.
Interacting with hardware, dealing with interrupts, memory mapping and management, task isolation, etc. are all aspects of an OS that are separate from task scheduling but still needed.
Cooperative multitasking works fine as long as the users/developers understand its limitations.
All I/O is asynchronous. Or rather, synchronous I/O is a subset of asynchronous. Often, in kernel I/O systems, you are communicating with, say, a disk controller that will take some time to process your request. You will want to do other stuff while the device finishes and process the results only when the device signals success or failure.
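That "start the request, do other work, handle the completion signal" shape maps directly onto a hand-written future. The sketch below is purely illustrative — `IrqFlag` and friends are invented names, and a real driver would store the task's `Waker` for the interrupt handler to call instead of busy re-polling:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical completion flag; a real driver would set it from the
// device's interrupt handler and store the task's Waker for wakeup.
struct IrqFlag(AtomicBool);

impl IrqFlag {
    const fn new() -> Self {
        IrqFlag(AtomicBool::new(false))
    }
    // Called from the "interrupt handler" when the device finishes.
    fn raise(&self) {
        self.0.store(true, Ordering::Release);
    }
    fn wait(&self) -> IrqWait<'_> {
        IrqWait { flag: self }
    }
}

struct IrqWait<'a> {
    flag: &'a IrqFlag,
}

impl Future for IrqWait<'_> {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.flag.0.load(Ordering::Acquire) {
            Poll::Ready(())
        } else {
            // A real driver would register cx.waker() with the interrupt
            // handler; here we just ask the executor to re-poll us.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

fn main() {
    // Minimal no-op waker so we can poll the future by hand.
    fn raw() -> RawWaker {
        fn no_op(_: *const ()) {}
        fn clone(_: *const ()) -> RawWaker {
            raw()
        }
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);

    let flag = IrqFlag::new();
    let mut wait = flag.wait();
    // Before the "interrupt": the task is suspended, free to do other work.
    assert!(Pin::new(&mut wait).poll(&mut cx).is_pending());
    flag.raise(); // simulate the device signaling completion
    assert!(Pin::new(&mut wait).poll(&mut cx).is_ready());
    println!("ok");
}
```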
> Embedding islands of cooperative multitasking in a sea of pre-emptive multitasking seems to make a lot more sense than the other way around.
Pretty much my opinion. I think a preemptive threaded environment where some threads might run multiple async tasks (in order to boost efficiency) can make a lot of sense.
Pure cooperative environments seem too error-prone for bigger systems, because they lack the necessary isolation between tasks. Any bad task can degrade the performance of all other tasks by blocking the execution thread.
Cooperatively scheduling across your other isolation boundaries is a pain point, but it's a great architecture within a single context (i.e. within the kernel, or within a single process).
Linux has a lot of cruft and isn't as async as some would like, but NT is pretty async underneath it all (although they still allow you to do synchronous work as well).
I'm not a Linux kernel developer, but it does have a few similar mechanisms AFAIU. If a driver needs to perform more work in response to an interrupt than would fit into the actual handler, it can use cooperatively scheduled "tasklets" or "softirqs."
Probably not at all related to the OP, but I found the thought entertaining, so I'm offering it as such.
One way to address your concern would be to guarantee that tasks always terminate within a bounded number of instructions.
One way to do that would be to offer a limited buffer in which the program instructions for the task can be expressed, and a language to express them in that is strongly normalizing (think something like Morte).
>Cooperative multitasking tends to break down at scale, because the probability that all the "threads" you're cooperating with are playing nice goes to zero as the number of threads increases.
That's completely wrong. Either cooperative multitasking works or it doesn't; there is no probability involved. What often happens is that foreign code that is not under your control does not yield. It only takes a single non-cooperating task to block an entire core. If you have properly written code, the probability of a malfunction is 0%, and it doesn't grow with the number of threads.
You can have a kernel thread pool dedicated to one kernel feature. That way, only tasks created by that feature will run on the pool and compete with each other.
This is also just a fantastic introduction to async/await in Rust, regardless of the OS bits. Another amazing article by Phil.
> The only requirement is that we use at least nightly 2020-03-25 of Rust because async/await was not no_std compatible before.
There's some fun stuff here that's omitted (which makes sense, of course). It was always a design constraint of async/await in Rust that you could use it without the standard library. However, the initial implementation required thread local storage. The future trait's poll method took a &mut Context, and generators didn't support passing context in to them when resuming. This meant that the context would be placed in TLS, and would pull it back out when needed. Generators are unstable, partially for this reason. However, this was fixed, and that means TLS is no longer a requirement here! See https://github.com/rust-lang/rust/pull/68524 for more details.
Is it possible to context switch between userspace processes directly, without going through the kernel, i.e. a kind of fast, inter-process cooperative multitasking? I know earlier operating systems used inter-process cooperative multitasking, but I'm guessing they still went through the scheduler?
I was trying to figure out if QNX does this with `MsgSend`, as QNX is renowned for being a fast microkernel, but it wasn't clear to me. According to Animats, "There's no scheduling delay; the control transfer happens immediately, almost like a coroutine. There's no CPU switch, so the data that's being sent is in the cache the service process will need." [1] According to Wikipedia, "If the receiving process is waiting for the message, control of the CPU is transferred at the same time, without a pass through the CPU scheduler." [2]
It seems like `MsgSend` circumvents the scheduler, but does it circumvent context switching to the kernel entirely too?
1: https://news.ycombinator.com/item?id=9872640
2: https://en.wikipedia.org/wiki/QNX#Technology
It's not possible without specific hardware support.
Context-switching on most current platforms with virtual memory requires fiddling with things like page mappings, and that simply must be done from supervisor mode.
The ability to switch tasks in hardware has existed on some historical machines. For example the GEC 4000 series basically implemented a microkernel, including the scheduler, in hardware. Switching to another process was done with a single hardware instruction.
I don't think anything current has such features besides the long-obsolete and unused task-switching feature on x86 processors.
No, it doesn't circumvent context switching to the kernel. With memory protection and separate address spaces, you generally need privileged execution to change page tables. However, the raw CPU cost of a syscall exception to the kernel followed by an exception return to userspace isn't actually all that high compared to all of the work that an OS usually does on top of it.
MsgSend() in QNX Neutrino does not circumvent the kernel, just the scheduler. If you're clever and lucky you can get an entire round trip to and from the service in a single timeslice. On the other hand, MsgSend() always blocks, so the call could be preempted depending on service priority or if there is an I/O wait or whatever, and that would mean the scheduler comes to play.
I've shied away from async/await because I haven't seen a good writeup on how to make it deterministic. Come to think of it, all of the times I've encountered it, there was no way to analyze the codepaths and prove that every exceptional situation and failure mode was handled. Maybe I missed something along the way?
So my feeling about it is that it may turn out to be an evolutionary dead end, or an anti-pattern at the least. My gut feeling is that async/await is functionally equivalent to the synchronous/blocking/coroutine/channel system of other languages like Go. Could we write a thin wrapper that converts async/await to that, or vice versa?
This is the primary reason why I've stuck to synchronous/blocking PHP with all of its flaws instead of Node.js. I think this is a fundamental thing that shouldn't be glossed over and accepted so readily into other languages.
> Come to think of it, all of the times I've encountered it, there was no way to analyze the codepaths and prove that every exceptional situation and failure mode was handled.
There is nothing special about async/await with regards to this in Rust, at least if I'm understanding you correctly. Async functions can return Results like any other function for recoverable errors, and can panic like any other function for un-recoverable errors.
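To illustrate (a minimal sketch; `parse_doubled` is an invented name): an `async fn` returning a `Result` uses `?` exactly like a synchronous one, and the error comes back through the returned future's output rather than vanishing down some hidden codepath.

```rust
use std::future::Future;
use std::num::ParseIntError;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// An async fn returns a Result like any other function, and `?`
// works inside it for recoverable errors.
async fn parse_doubled(s: &str) -> Result<i32, ParseIntError> {
    let n: i32 = s.parse()?; // recoverable error, surfaced in the Result
    Ok(n * 2)
}

fn main() {
    // Minimal no-op waker so we can poll the future by hand.
    fn raw() -> RawWaker {
        fn no_op(_: *const ()) {}
        fn clone(_: *const ()) -> RawWaker {
            raw()
        }
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);

    // This future has no suspension points, so one poll resolves it.
    let mut fut = Box::pin(parse_doubled("21"));
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(result) => assert_eq!(result.unwrap(), 42),
        Poll::Pending => unreachable!("no await points that suspend"),
    }
    // A bad input shows up as an Err value, not a panic:
    let mut fut = Box::pin(parse_doubled("not a number"));
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(result) => assert!(result.is_err()),
        Poll::Pending => unreachable!(),
    }
    println!("ok");
}
```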
> My gut feeling is that async/await is functionally equivalent to the the synchronous/blocking/coroutine/channel system of other languages like Go.
It depends on what level of abstraction you're talking about. For Rust, which cares a lot about details, they're not. I gave a talk comparing and contrasting all this stuff here: https://www.infoq.com/presentations/rust-2019/
There are two important things that are often conflated with async/await: the programming model, and the executor model.
The programming model of async/await is essentially syntactic sugar for a state machine, where each of the states are determined implicitly by the site of the call to await, and the ancillary storage of the state machine is managed implicitly by the compiler. It's basically no different from a generator pattern (and is usually implemented on top of the same infrastructure) [1].
Since the result of calling an async function is a task object that doesn't do anything until you call methods on it (including implicitly via await), most languages integrate these task objects with their event model. This model is completely orthogonal to the feature, and often exists (at least in the library, if not the language) before async/await is added to the language. JS has a built-in event loop that predates async/await by several years, whether programming in client-side browsers or using something like Node.js. Rust is unusual in that it prescribes absolutely nothing for the executor model; all it defines in its standard library is the interface to drive the tasks manually.
I doubt async/await is an evolutionary dead end. It's the third kind of coroutine to hit it big in mainstream languages, the first two being the zero-cost exception handling mechanism (i.e., try/catch) and the generator (i.e. yield). The biggest criticism of zero-cost exceptions essentially amounts to their types not being made explicit, but the design of async/await makes their effects on the type of coroutine quite apparent, so it's not going to run into the same problems.
[1] There is a difference in the pattern, which is that you're probably going to get back different kinds of objects implementing different interfaces, which is a bigger deal if you manually drive the interfaces. Structurally--that is, in terms of the actual machine code running--there is likely to be no difference.
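A rough illustration of that desugaring (the compiler-generated type is anonymous; `Example` and its variants are invented names): an `async fn` with a single await point becomes an enum with one variant per suspension state, plus a hand-written `Future` impl.

```rust
use std::future::{ready, Future, Ready};
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Roughly what `async fn example() -> u32 { ready(1).await + 1 }`
// desugars to: one enum variant per state of the state machine.
enum Example {
    // Waiting on the inner future (the single `.await` site).
    WaitingOnInner(Ready<u32>),
    // Terminal state; polling again is a logic error.
    Done,
}

impl Future for Example {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        match &mut *self {
            Example::WaitingOnInner(inner) => match Pin::new(inner).poll(cx) {
                Poll::Ready(a) => {
                    *self = Example::Done;
                    Poll::Ready(a + 1) // the code after the await point
                }
                Poll::Pending => Poll::Pending,
            },
            Example::Done => panic!("polled after completion"),
        }
    }
}

fn main() {
    // Minimal no-op waker so we can poll the state machine by hand.
    fn raw() -> RawWaker {
        fn no_op(_: *const ()) {}
        fn clone(_: *const ()) -> RawWaker {
            raw()
        }
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);

    let mut fut = Example::WaitingOnInner(ready(1));
    assert_eq!(Pin::new(&mut fut).poll(&mut cx), Poll::Ready(2));
    println!("ok");
}
```

The real generated state machine also stashes any locals that live across the await point inside the variant, which is exactly the "ancillary storage managed by the compiler" described above.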
async/await-based code is deterministic until you deliberately introduce the ability for tasks to race. Look at Noether for a language design that explicitly makes those steps. Whereas if you have channels (and don't have some kind of single ownership) then you have nondeterministic races right from the start. So I rather doubt that any such thin wrapper could be formed.
Certainly as a human reader, Future-based code (which async/await is sugar for) is the easiest form of concurrency/parallelism to reason about: yes you have small tasks that might execute in parallel, but all your functions still return values, function composition still works the way you'd expect, there's no way for parallel tasks to silently interfere with each other in some hidden way. The control flow is still what it looks like (unlike with general coroutines). Of course if you can afford to avoid parallelism entirely you probably should, but async/await is the least intrusive/most understandable way to manage it if you do need it.
Channels are just queues plus send/receive methods. Rust-style futures are closer to continuations. Both are powerful and can accomplish neat tasks, but they're not ABI compatible or even used the same way.
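For contrast, here is the channel side of that comparison using Rust's own standard library — a queue with send/receive methods, where the receiver blocks rather than being polled:

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    // A channel is a queue with send/receive methods; recv() blocks the
    // thread until a value arrives, unlike a future that is polled.
    let (tx, rx) = mpsc::channel();
    let producer = thread::spawn(move || {
        tx.send(42).expect("receiver still alive");
    });
    assert_eq!(rx.recv().expect("sender still alive"), 42);
    producer.join().unwrap();
    println!("ok");
}
```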
Async or parallel systems are deterministic when the multiple paths don't cross and, when they do cross, the crossing points have well-defined, deterministic steps and interactions.
I am no expert, and I wish there were a start-from-zero guide with examples not involving external crates, but my sense is that you can write your own executor and do whatever you want. (Maybe I'm wrong?)
Aside from the cooperative multitasking discussion, I thoroughly enjoyed how you discussed the Rust async/await concepts in detail, and I learned a lot from it. I find your article provides a lot more value than the Rust documentation, which is unfortunate since that makes the learning curve somewhat difficult.
Great to hear that! I also found the official documentation a bit short on background information, so I decided to write my own explanation rather than link to something existing. Given that the async/await implementation is still quite young, I'm sure that the official docs will improve over time.
> Using async/await, we now have basic support for cooperative multitasking in our kernel. While cooperative multitasking is very efficient, it leads to latency problems when individual tasks keep running for too long and thus prevent other tasks to run. For this reason, it makes sense to also add support for preemptive multitasking to our kernel.
> In the next post, we will introduce threads as the most common form of preemptive multitasking. In addition to resolving the problem of long running tasks, threads will also prepare us for utilizing multiple CPU cores and running untrusted user programs in the future.
It seems the plan here is to use this for internal work, and still give a preemptive multitasking API to userspace.
Didn't most old operating systems use cooperative multitasking? I remember, at least, that classic Mac OS (i.e. pre-OSX) didn't use preemptive multitasking, either.
Anyway, this SO answer [0] explains why early Linux, much like the hobby kernel in this article, used cooperative scheduling inside the kernel and only preempted user-space code.
[0]: https://stackoverflow.com/a/16766562
There is one significant difference: Windows did context-switches in the blocking calls and did not rely on the program code having "the right structure" needed for straight co-routines to work.
The difference between preemptive and cooperative multitasking is not whether you do full context switches, but whether there is a way to do a context switch at a point where the process does not expect it (i.e. by treating some kind of timer interrupt as a scheduling opportunity).
i've been going through http://craftinginterpreters.com/ in rust (in order to learn rust) and it's a fantastic learning experience. the language is straightforward after you understand the basics (smart pointers and generics) and if you have a good ide with completion (CLion). lifetimes/borrow checker aren't as painful as people would have you believe if you use heap allocation (i.e. Box, Rc<RefCell>). now obviously heap allocation isn't optimal but for getting started it enables a smooth progression.