Looking at Unity made me understand the point of C++ coroutines

178 points | ingve | 4 days ago | mropert.github.io | reply

154 comments

[+] Joker_vD|22 hours ago|reply
Simon Tatham, author of PuTTY, has quite a detailed blog post [0] on using C++20's coroutine system. And yep, it's a lot to do on your own; C++26 really ought to give us some pre-built templates/patterns/scaffolds.

[0] https://web.archive.org/web/20260105235513/https://www.chiar...

[+] zozbot234|16 hours ago|reply
People love to complain about Rust async-await being too complicated, but somehow C++ manages to be even worse. C++ never disappoints!
[+] nananana9|22 hours ago|reply
You can roll stackful coroutines in C++ (or C) with 50-ish lines of Assembly. It's a matter of saving a few registers and switching the stack pointer, minicoro [1] is a pretty good C library that does it. I like this model a lot more than C++20 coroutines:

1. C++20 coros are stackless, in the general case every async "function call" heap allocates.

2. If you do your own stackful coroutines, every function can suspend/resume, you don't have to deal with colored functions.

3. (opinion) C++20 coros are very tasteless and "C++-design-committee pilled". They're very hard to understand and implement, they require the STL, they're very heavy in debug builds, and you'll end up with template hell to do something as simple as Promise.all

[1] https://github.com/edubart/minicoro

[+] pjc50|22 hours ago|reply
> You can roll stackful coroutines in C++ (or C) with 50-ish lines of Assembly

I'm not normally keen to "well actually" people with the C standard, but .. if you're writing in assembly, you're not writing in C. And the obvious consequence is that it stops being portable. Minicoro only supports three architectures. Granted, those are the three most popular ones, but other architectures exist.

(just double checked and it doesn't do Windows/ARM, for example. Not that I'm expecting Microsoft to ship full conformance for C++23 any time soon, but they have at least some of it)

[+] Joker_vD|21 hours ago|reply
Hmm. I'm fairly certain that most of that assembly code for saving/restoring registers can be replaced with setjmp/longjmp, and only control transfer itself would require actual assembly. But maybe not.

That's the problem with register machines, I guess. Interestingly enough, BCPL, its main implementation being a p-code interpreter of sorts, has pretty trivially supported coroutines in its "standard" library since the late seventies — as you say, all you need to save is the current stack pointer and the code pointer.

[+] Sharlin|20 hours ago|reply
C++ destructors and exception safety will likely wreak havoc with any "simple" assembly/longjmp-based solution, unless severely constraining what types you can use within the coroutines.
[+] TuxSH|15 hours ago|reply
> every async "function call" heap allocates.

> require the STL

That it has to heap-allocate if non-inlined is a misconception. This is only the default behavior.

One can define:

void *operator new(size_t sz, Foo &foo)

in the coro's promise type, and this:

- removes the implicitly-defined operator new

- forces the coro's signature to be CoroType f(Foo &foo), and forwards arguments to the "operator new" one defined

Therefore, it's pretty trivial to support coroutines even when heap cannot be used, especially in the non-recursive case.

Yes, green threads ("stackful coroutines") are more straightforward to use, however:

- they can't be arbitrarily destroyed when suspended (this would require stack unwinding support and/or active support from the green thread runtime)

- they are very ABI dependent. Among the "few registers" one has to save FPU registers. Which, in the case of older Arm architectures, and codegen options similar to -mgeneral-regs-only (for code that runs "below" userspace). Said FPU registers also take a lot of space in the stack frame, too

Really, stackless coros are just FSM generators (which is obvious if one looks at disasm)

[+] loeg|10 hours ago|reply
Stackful makes for cute demos, but you need huge per-thread stacks if you actually end up calling into Linux libc, which tends to assume typical OS thread stack sizes (8MB). (I don't disagree that some of the other tradeoffs are nice, and I have no love for C++20 coroutines myself.)
[+] Trung0246|9 hours ago|reply
Actually you don't even need ASM at all. You just need smart use of compiler built-ins to make it truly portable. See my composable continuation implementation: https://godbolt.org/z/zf8Kj33nY
[+] socalgal2|16 hours ago|reply
As an ex-gamedev, suspend/resume/stackful coroutines made them too heavy to have several thousand of them running during a game loop for our game. At the time we used GameMonkey Script: https://github.com/publicrepo/gmscript

That was over 20 years ago. No idea what the current hotness is.

[+] nottorp|9 hours ago|reply
> turns it into some sort of ugly state machine

Why are people afraid of state machines? There's been sooo much effort spent on hiding them from the programmer...

[+] matheusmoreira|8 hours ago|reply
They're essentially callable, stateful, structured gotos. Difficult to understand for the uninitiated.

For example, generators. Also known as semicoroutines.

https://langdev.stackexchange.com/a/834

This:

  generator fib() {
      a, b = 1, 2
      while (a<100) {
          b, a = a, a+b
          yield a
      }
      yield a-1
  }

Becomes this:

  struct fibState {
      a,
      b,
      position
  }

  int fib(fibState state) {
      switch (state.position) {
          case 0:
              state.a, state.b = 1,2
              while (state.a<100) {
                  state.b, state.a = state.a, state.a+state.b
                  // switching the context
                  state.position = 1;
                  return state.a;
          case 1:
              }

              state.position = 2;
              return state.a-1
          case 2:
              state.position = -1;
      }
  }

The ugly state machine example presented in the article is also a manual implementation of a generator. It's as palatable to the normal programmer as raw compiler output. Being written in C++ makes it even uglier and more complicated.
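
For readers who want to compile the hand-written generator, here is a minimal C++ sketch of the same state machine. The names FibState, fib_next and fib_all are illustrative assumptions. Unlike the pseudocode, the function reports exhaustion through a bool return value rather than leaving the final case without a result.

```cpp
#include <cassert>
#include <vector>

// Explicit generator state: the generator's locals plus a resume position.
struct FibState {
    int a = 0;
    int b = 0;
    int position = 0;  // 0: not started, 1: suspended inside the loop,
                       // 2: suspended after the loop, -1: finished
};

// Resumes the generator. Stores the next value in `out` and returns true,
// or returns false once the sequence is exhausted.
bool fib_next(FibState& s, int& out) {
    switch (s.position) {
    case 0:
        s.a = 1;
        s.b = 2;
        while (s.a < 100) {
            // Simultaneous assignment "b, a = a, a + b" without a temporary:
            s.a += s.b;        // new a = old a + old b
            s.b = s.a - s.b;   // new b = old a
            s.position = 1;    // remember where to resume
            out = s.a;
            return true;       // "yield a"
    case 1:;                   // resume point: continue with the loop test
        }
        s.position = 2;
        out = s.a - 1;
        return true;           // "yield a - 1"
    default:                   // position 2 or -1: nothing left to produce
        s.position = -1;
        return false;
    }
}

// Drains the generator into a vector, the way a for-each loop would.
std::vector<int> fib_all() {
    FibState s;
    std::vector<int> values;
    int v = 0;
    while (fib_next(s, v)) values.push_back(v);
    assert(s.position == -1);  // the generator must end in the finished state
    return values;
}
```

The case label placed inside the while body is the same trick the pseudocode relies on: re-entering the function jumps straight back into the loop, which is legal here because no variable declarations are skipped.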

The programming language I made is a concrete example of what programming these things manually is like. I had to write every primitive as a state machine just like the one above.

https://www.matheusmoreira.com/articles/delimited-continuati...

[+] BSTRhino|10 hours ago|reply
This is one reason why I built coroutines into my game programming language Easel (https://easel.games). I think they let you keep the flow of the code matching the flow of your logic (top-to-bottom), rather than jumping around, and so I think they are a great tool for high-level programming. The main thing is stopping the coroutines when the entity dies, and in Easel that is done by implying ownership from the context they are created in. It is quite a cool way of coding I think, avoids the state machines like the OP stated, keeps everything straightforward step-by-step and so all the code feels more natural in my opinion. In Easel they are called behaviors if anyone is interested in more detail: https://easel.games/docs/learn/language/behaviors
[+] cherryteastain|22 hours ago|reply
Not an expert in game development, but I'd say the issue with C++ coroutines (and 'colored' async functions in general) is that the whole call stack must be written to support that. From a practical perspective, that must in turn be backed by a multithreaded event loop to be useful, which is very difficult to write performantly and correctly. Hence, most people end up using coroutines with something like boost::asio, but you can do that only if your repo allows a 'kitchen sink' library like Boost in the first place.
[+] abcde666777|22 hours ago|reply
More broadly the dimension of time is always a problem in gamedev, where you're partially inching everything forward each frame and having to keep it all coherent across them.

It easily can, and often does, lead to messy Rube Goldberg machines.

There was a game AI talk a while back, I forget the name unfortunately, but as I recall the guy was pointing out this friction and suggesting additions we could make at the programming language level to better support that kind of time spanning logic.

[+] manoDev|20 hours ago|reply
This is more evident in games/simulations but the same problem arises more or less in any software: batch jobs and DAGs, distributed systems and transactions, etc.

This is what Rich Hickey (Clojure author) has termed “place oriented programming”, when the focus is mutating memory addresses and having to synchronize everything, but failing to model time as a first class concept.

I’m not aware of any general purpose programming language that successfully models time explicitly, Verilog might be the closest to that.

[+] syncurrent|20 hours ago|reply
These timing additions to a language are also at the core of imperative synchronous programming languages like Esterel, Céu or Blech.
[+] repelsteeltje|22 hours ago|reply
> There was a game AI talk a while back, I forget the name unfortunately, but as I recall the guy was pointing out this friction and suggesting additions we could make at the programming language level to better support that kind of time spanning logic.

Sounds interesting. If it's not too much of an effort, could you dig up a reference?

[+] twoodfin|21 hours ago|reply
As the author lays out, the thing that made coroutines click for me was the isomorphism with state machine-driven control flow.

That’s similar to most of what makes C++ tick: There’s no deep magic, it’s “just” type-checked syntactic sugar for code patterns you could already implement in C.

(Occurs to me that the exceptions to this … like exceptions, overloads, and context-dependent lookup … are where C++ has struggled to manage its own complexity.)

[+] HarHarVeryFunny|21 hours ago|reply
If you need to implement an async state machine, couldn't that just as easily be done with std::future? How do coroutines make this cleaner/better?
[+] appstorelottery|9 hours ago|reply
I've been doing a lot of work with ECS/Dots recently and once I wrapped my head around it - amazing.

I recall working on a few VR projects - where it's imperative that you keep that framerate solid or risk making the user physically sick - this is where I really began using coroutines for instantiating large volumes of objects and so on (and avoiding framerate stutter).

ECS/Dots & the Burst compiler make all of this unnecessary and the performance is nothing short of incredible.

[+] wiseowise|19 hours ago|reply
Looking at C++ made me understand the point of Rust.
[+] pjc50|22 hours ago|reply
Always jarring to see how Unity is stuck on an ancient version of C#. The use of IEnumerable as a "generator" mechanic is quite a good hack though.
[+] tyleo|22 hours ago|reply
Unity is currently on C# 9 and that IEnumerable trick is no longer needed in new codebases. async is properly supported.
[+] Deukhoofd|21 hours ago|reply
Thankfully they are actively working towards upgrading: Unity 6.8 (they're currently on 6.4) is supposed to move fully to CoreCLR and remove Mono. We'll then finally be able to move to C# 14 (from C# 9, which came out in 2020), as well as use newer .NET functionality.

https://discussions.unity.com/t/coreclr-scripting-and-ecs-st...

[+] Philip-J-Fry|21 hours ago|reply
>The use of IEnumerable as a "generator" mechanic is quite a good hack though.

Is that a hack? Is that not just exactly what IEnumerable and IEnumerator were built to do?

[+] debugnik|22 hours ago|reply
Not that ancient, they just haven't bothered to update their coroutine mechanism to async/await. The Stride engine does it with their own scheduler, for example.

Edit: Nevermind, they eventually bothered.

[+] ahoka|21 hours ago|reply
IIRC generators and co-routines are equivalent in a sense that you can implement one with the other.
[+] repelsteeltje|22 hours ago|reply
Not too different from C++'s iterator interface for generators, I guess.
[+] bullen|20 hours ago|reply
Coroutines generally imply some sort of magic to me.

I would just go straight to tbb and concurrent_unordered_map!

The challenge of parallelism does not come from how to make things parallel, but how you share memory:

How you avoid cache misses, make sure threads don't trample each other and design the higher level abstraction so that all layers can benefit from the performance without suffering turnaround problems.

My challenge right now is how do I make the JVM fast on native memory:

1) Rewrite my own JVM. 2) Use the buffer and offset structure Oracle still has but has deprecated and is encouraging people to not use.

We need Java/C# (already has it but is terrible to write native/VM code for?) with bottlenecks at native performance and one way or the other somebody is going to have to write it?

[+] pjmlp|20 hours ago|reply
As I mentioned on the Reddit thread,

This is quite understandable when you know the history behind how C++ coroutines came to be.

They were initially proposed by Microsoft, based on a C++/CX extension, that was inspired by .NET async/await implementation, as the WinRT runtime was designed to only support asynchronous code.

Thus if one knows how the .NET compiler and runtime magic works, including custom awaitable types, there will be some common bridges to what C++ coroutines ended up looking like.

[+] mgaunard|21 hours ago|reply
Coroutines are just a way to write continuations in an imperative style and with more overhead.

I never understood the value. Just use lambdas/callbacks.

[+] usrnm|20 hours ago|reply
> Just use lambdas/callbacks

"Just" is doing a lot of work there. I've used callback-based async frameworks in C++ in the past, and it turns into pure hell very fast. Async programming is, basically, state machines all the way down, and doing it explicitly is not nice. And trying to debug the damn thing is a miserable experience.

[+] affenape|20 hours ago|reply
Not necessarily. A coroutine encapsulates the entire state machine, which might be a PITA to implement otherwise. Say, if I have a stateful network connection that requires initialization and periodic encryption secret renewal, a coroutine implementation would be much slimmer than that of a state machine with explicit states.
[+] spacechild1|19 hours ago|reply
> Just use lambdas/callbacks.

Lol, no thanks. People are using coroutines exactly to avoid callback hell. I have rewritten my own C++ ASIO networking code from callback to coroutines (asio::awaitable) and the difference is night and day!

[+] socalgal2|16 hours ago|reply
I'll take the bait. Here's a coroutine

    waitFrames(5); // wait 5 frames
    fireProjectile();
    waitFrames(15);
    turnLeft(-30/*deg*/, 120); // turn left over 120 frames
    waitFrames(10);
    fireProjectile();
    // spin and shoot
    for (i of range(0, 360, 60)) {
      turnRight(60, 90);  // turn 60 degrees over 90 frames
      fireProjectile();
    }

10 lines and I get behavior over time. What would your non-coroutine solution look like?
[+] jayd16|19 hours ago|reply
You can structure coroutines with a context so the runtime has an idea when it can drop them or cancel them. Really nice if you have things like game objects with their own lifecycles.

For simple callback hell, not so much.

[+] Sharlin|20 hours ago|reply
Did you read the article? As the author says, it becomes a state machine hell very quickly beyond very simple examples.
[+] duped|17 hours ago|reply
The value is fewer indirect function calls and heap allocations (so less overhead than callbacks) and well defined tasks that you can select/join/cancel.
[+] sagebird|16 hours ago|reply
>> To misquote Kennedy, “we chose to focus coroutines on generator in C++23, not because it is hard, but because it is easy”.

Appreciate this humor -- absurd, tasteful.

[+] Animats|9 hours ago|reply
Most game engines seem to have some coroutine kludge.
[+] djmips|7 hours ago|reply
The 'primitive' SCUMM language used for writing Adventure Games like Maniac Mansion had coroutines - an ill fated attempt to convert to using Python was hampered by Python (at the time) having no support for yield.
[+] troad|6 hours ago|reply
I did not know that, that's neat. Are there any blog posts or articles that go deeper into this?
[+] nice_byte|12 hours ago|reply
I don't know, I'm not convinced by this argument.

The "ugly" version with the switch seems much preferable to me. It's simple, works, has way less moving parts and does not require complex machinery to be built into the language. I'm open to being convinced otherwise but as it stands I'm not seeing any horrible problems with it.