Fiber in C++: Understanding the Basics

[+] hackyhacky|2 years ago|reply

I'm confused in how this article defines "fiber." At the beginning of the article, fibers are described as a lightweight userspace thread, similar to what is provided in the runtimes of Haskell and Go, and has been recently revived in Java as "virtual threads."

However, later in the article (after a long and apparently irrelevant digression about stack management), they describe Windows API functions that are required for using fibers and seem to suggest that fibers are OS-level, and not application-managed.

Am I missing something?

[edit: typo]

[+] nickelpro|2 years ago|reply

The Windows fiber library is just that, a library. It's not an OS component. It's provided because stack switching in Windows is a somewhat fraught [1] and not officially documented procedure which requires modifying the TIB [2].

Doing it correctly isn't hard, and has been done many times in bullet-proof libs (notably boost::context), but MS would maybe rather you not.

[1]: https://devblogs.microsoft.com/oldnewthing/20080215-00/?p=23...

[2]: https://en.wikipedia.org/wiki/Win32_Thread_Information_Block

[+] flohofwoe|2 years ago|reply

"Fibers", "green threads", "stack switching", "cooperative multitasking" are essentially all the same thing, they all rely on being able to switch execution context by switching to a different stack within the same OS thread. As such they can be implemented either in (CPU/OS-specific) user code or in "OS APIs" (like the Windows Fiber functions).

Only downside of the technique is that it cannot be implemented in WASM (and maybe some other esoteric runtime environments), because WASM has separate data- and call-stacks and the call stack is not accessible from within the WASM virtual machine (while 'async-await' which relies on code transformation done by the compiler can be implemented in WASM just fine).

There is a 'stack-switching proposal' for WASM though, but I don't know what's the state of that:

https://github.com/WebAssembly/stack-switching

[+] bicarbonato|2 years ago|reply

Setting aside certain implementation details, here's the basics:

Kernel Threads (aka threads): OS-Level and preemptive.

Green Threads (aka lightweight threads, virtual threads or goroutines): User-level and preemptive.

Fibers: User-level (sometimes OS-level or offered as a OS library, like in Windows) and cooperative.

Coroutines: User-level and cooperative.

[+] pvg|2 years ago|reply

Windows has its own fibers facility which does some things for you but still leaves the scheduling to the application so it's still fibers. It's a little confusing in the article but they look at the Windows API as a kind of API template - what do you need to implement to have fibers. Then they describe their own implementation.

[+] ojkelly|2 years ago|reply

My general understanding is that a fiber is a lightweight thread, and conceptually can be implemented inside any program that wants to schedule its own fibers.

They’re distinct from CPU threads.

When I last dived into fibers I discovered windows was trying to make them a thing quite a while ago which is why we have things like [0].

Fibers being virtual threads, are recursive. In the same way you can run a VM inside a VM, you could build fibers on top of fibers (which sounds like a recipe for unpredictable performance).

[0] https://learn.microsoft.com/en-us/windows/win32/procthread/u...

[+] Diggsey|2 years ago|reply

Userspace != Application-managed

On Windows, many things exist in user-space that are managed by the OS. This is in contrast to linux where the Linux kernel is its own thing with its own stable API.

Fibers are an OS-level concept that exists in user-space.

[+] j16sdiz|2 years ago|reply

Fiber is a concept in Windows API for user-space thread.

It was designed for MS SQL Server. It have a very confusing API and did not gain much use in 3rd party applications.

[+] PaulHoule|2 years ago|reply

Personally if I had a choice of just one I’d pick the threads over fibers if the threads facility is really high performance like the one in Java.

Today you have quad cores even on a cheap phone and 16 on a desktop and 60+ on a server and OS and some runtimes make it pretty easy to get large speed ups on many tasks even when subtasks are closely coordinated. (One thing I like about Java is that it has low-level thread primitives that really scale as opposed to many systems have have a small set of low-level primitives that in theory let you do everything but not scale.)

Contrast that to fibers and similar things (JS and Python async) that do a great job of keeping a CPU busy when you are waiting for network activity but are limited to one CPU.

[+] ylow|2 years ago|reply

Some random historical notes coroutine stuff which are perhaps of interest to some:

1. You can implement "stack-free coroutines" in C with some really entertaining macros https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html

2. Does anyone remembers Cilk? Its still the fastest stackful coroutine models I know of.

Having implemented and used both before (for different purposes), I like both, but with a mild preference for the stackful version. Largely because you don't have function coloring problems, and you can actually get a proper stack trace and is so significantly easier to debug when things go wrong.

It is also good to differentiate coroutines from async (they seem to be very interleaved these days). Coroutines are a mechanic to achieve mutual recursion / generators. And that is fine for expressing certain algorithms and systems in a much cleaner fashion. Note that parallelism is not necessarily implied by "coroutine".

Async is a mechanic to "doing something else" while waiting for IO. Fibers or state machines are both different solutions to this problem. Certain coroutine implementations can help with this, but I think it is massively overused. Rust due to the function coloring problem, ends up requiring almost everything to be marked "async". And some golang code I have read seems to overuse goroutines unecessarily. I think use of async should be narrow and minimized, and localized to only the places where it makes sense.

[+] gpderetta|2 years ago|reply

> Does anyone remembers Cilk?

Yes! It's a shame it never gained traction and Intel abandoned it, eventually getting dropped from icc and gcc.

[+] diffuse_l|2 years ago|reply

I believe that fibers are considered more trouble than they worth? [0]

[0] https://devblogs.microsoft.com/oldnewthing/20191011-00/?p=10...

[+] csb6|2 years ago|reply

The Gor Nishanov paper linked in that piece goes into a lot of detail of some of the downsides of fibers in an unmanaged language like C++.

[0] http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2018/p136...

[+] indigoabstract|2 years ago|reply

Since coroutines have been in the standard since C++20, I wish he would have explained better what are the advantages of using a Fiber over a Coroutine.

[+] snaury|2 years ago|reply

Unfortunately in 2023 c++20 coroutines are just barely starting to become usable. Compilers have bugs, especially with symmetric transfer, e.g. msvc doesn't suspend until await_suspend returns (which causes races and double-frees), gcc stops using tail calls in some modes (which can cause stack overflow), I ended up emulating symmetric transfer because you can't rely on it, and emulating it is of course slower. Compilers spill local variables from await_suspend to coroutine frames, which again causes races, which necessitates workarounds like marking await_suspend noinline (in a non-portable way). Having noinline anything in the code path disables heap allocation elision, so all coroutines start to allocate from the heap. When comparing a function call to a coroutine call there's a vast difference in performance. And you need to use coroutines all the way down, which really adds up.

Fibers have their own problems: you need to allocate stacks (and that may be expensive), you can't switch fibers between system threads (compilers cache thread local addresses, unfortunately with x86 linux tls often uses fs: segment addressing, so it may appear to work until it doesn't, and weirdly this problem existed in c++20 coroutines too until recent clang versions), widely used open source implementations (e.g. boost context / boost coroutine / boost fiber) are not even exception safe, because you need to save/restore exception globals when switching fibers, and nobody seems to care. :-/

One huge upside to fibers is that a function call is just a normal function call, and it's fast as a result. I wish c++ taken fibers more seriously and added steps for making them safe to use (portable way to handle global state, portable way to mark switching functions as invalidating tls addresses, etc.), we could have something similar to java virtual threads or go goroutines then.

[+] munch117|2 years ago|reply

Advantage: No function colouring. Just like with threads, you don't need to declare which functions might potentially yield. There's none of that noisy async/await syntax.

Advantage: Fast switching. Switching between fibers is done by swapping in new values for the program counter and stack pointer. Compare with async, where yielding only takes you up one level of stack, so your code ends up walking a stack of coroutine invocations to get all the way back to the scheduler.

Disadvantage: Expensive setup - each fiber needs its own pre-allocated stack.

Disadvantage: Operating systems don't provide I/O systems readymade to work with fiber schedulers, so if you want to yield to a different fiber during blocking I/O, then you need to invent your own system to handle that.

PS about naming: "Fiber" is the Microsoft Windows name for their implementation of full stack coroutines, but the proper platform-independent name for it is actually "coroutine". I've kept with the word "fiber" here to avoid misunderstandings.

PS about C++: This is based on async like you find it in C# or Python, I don't really know about C++ coroutines.

[+] samsquire|2 years ago|reply

Thank you for this in-depth article.

I am a less than a C++ beginner but I asked Stack Overflow how to run C++ coroutines in a thread pool. It seems coroutines in C++20 are unfinalised but don't quote me on that but I did get some sourcecode for older versions of the C++20 standard.

I used Marce's Coll's excellent blog post about how to create coroutines in assembly by adjusting the RSP register.

https://blog.dziban.net/posts/coroutines/

I extended Marce's code to run the coroutines in kernel threads:

https://github.com/samsquire/assembly (see threadedcoroutines.S)

I have been thinking of coroutines in terms of query compilation for database engines and the volcano query model and this article:

https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html

Tying together two pieces of code that call eachother in push or pull driven style is really powerful. Or if you're running multiple independent tasks that need their own state. This as I understand it is the original intent of object orientation that Alan Kay wanted and is represented by Erlang and partly Go.

Specifically, I am thinking of compiler created coroutines where code can be interleaved at compile time rather than at runtime.

[+] syncurrent|2 years ago|reply

Somewhat related to Fibres are the Trails of synchronous reactive programming languages. Both allow efficient logical concurrency based on cooperative scheduling. The nice thing with Trails is that the scheduling can be determined by the compiler by extracting dependencies from the synchronous reactive program thus increasing determinism of your app.

[+] PaulDavisThe1st|2 years ago|reply

Remind me again why no OS other than Solaris ever provided "scheduler activations" (i.e. the kernel calls back into a user-space scheduler whenever a kernel-level scheduling event occurs)

[+] surajrmal|2 years ago|reply

Have you seen https://github.com/google/ghost-userspace

[+] kevingadd|2 years ago|reply

Windows had a custom scheduler API (https://learn.microsoft.com/en-us/windows/win32/procthread/u...) but they removed it, presumably because it wasn't used much

[+] endorphine|2 years ago|reply

> Fiber is quite a lightweight thread of execution. Like coroutine, fiber allows yielding at any point inside it. To some degree, we can regard fiber as a form of stackful coroutine [...]

Can someone explain what the "stackful coroutine" term means (ideally with an example)? And how can one implement it?

[+] nickelpro|2 years ago|reply

It's exactly what it says, it's a coroutine with its own stack.

A fiber isn't to some degree a stackful coroutine, it is a stackful coroutine.

A stackless coroutine does not have a its own stack, it uses the calling context's stack and therefore can only yield to the calling context from the top-level coroutine function. A stackful coroutine has its own stack, and so can yield to the calling context from anywhere.

As far as how to implement, I always liked Malte Skarupke's blog post on the subject: https://probablydance.com/2013/02/20/handmade-coroutines-for...

[+] nstbayless|2 years ago|reply

I assume it's this:

In Python, coroutines cannot yield from within another function call: if coroutine A calls function B, B cannot yield. In Lua, it's possible to yield a coroutine at any function depth: B can yield.

To implement something like this in a compiled language, you just need multiple stacks instead of the usual 1. Most architectures, such as x86, have a stack pointer register; when yielding a coroutine, just change the sp register to point to some other stack -- when resuming, restore the sp.

This is not a new concept. Pokémon on the gameboy did this for its UI fiber, for example.

[+] frozenport|2 years ago|reply

    Some readers may have a question by now. What is the rationale behind the choices of the registers that need to be stored?

Yet another article that goes into the gorey details of how fiber works but not how fibers are used.

How do we write the "Applications own scheduler"?

[+] OnlyMortal|2 years ago|reply

I’ve plenty of Classic Mac experience with cooperative threads. You’d basically yield to run another “thread”.

Can someone tell me the difference from those to fibres?

[+] IceMichael|2 years ago|reply

Still looking for a lib that supports fibers... boost::asio is... well, boost, and therefore annoying. It also does not support it out of the box

[+] gpderetta|2 years ago|reply

Boost asio is available as a standalone lib as well. What do you mean it doesn't support "it" out of the box?

[+] dilyevsky|2 years ago|reply

https://github.com/google/marl

[+] yellow_lead|2 years ago|reply

Asio has strands which may be an alternative depending on the use case.

[+] Alifatisk|2 years ago|reply

Why is more and more languages embracing Fibers / Green Threads now? I’m curious

[+] bmacho|2 years ago|reply

No reasons, probably. No, really. If you flip a coin 10 times, 3-7 or 7-3 is a perfectly OK outcome, and our n (the more and more languages) is not even 10.

[+] gpderetta|2 years ago|reply

pushback against the stackless coroutine model mostly (i.e. async/await).

But people have been playing with stackful coroutines in C++ for decades.

[+] smallstepforman|2 years ago|reply

Again, academia and the tech press overlook to talk about Actors, and we live in a world with poorly implemented pretend Actor subsets, without any of the true Actor features (work stealing, message passing etc, and best of all, no contention since no shared resources, other than Actor references).

Fibres, goroutines, promises, futures, job queues, etc, are poor substitutes for Actors.

[+] grumpyprole|2 years ago|reply

> Fibres, goroutines, promises, futures, job queues, etc, are poor substitutes for Actors.

They are not necessarily substitutes, they are lower level primitives that allow more complex models, like Actors, to be implemented.

All execution models have trade offs; and sometimes we might prefer an execution model without deadlock, livelock and starvation. So for a general purpose language, a complex model like Actors is ideally a library and not enforced as the only way of achieving concurrency.

[+] pjmlp|2 years ago|reply

Perfect is the enemy of good.

Just like with FP and LP, I rather have something close enough, than nothing at all.

[+] jeffreygoesto|2 years ago|reply

No shared resources can be a performance problem, I can see why there are rusts and vales efforts to come to a slightly different tradeoff.

[+] unknown|2 years ago|reply

[deleted]

75 comments