How io_uring and eBPF Will Revolutionize Programming in Linux

709 points | harporoeder | 5 years ago | scylladb.com | reply

320 comments

[+] pierrebai|5 years ago|reply
Ah, the funny things we read about in 2020.

In 1985... yes I said 1985, the Amiga did all I/O through sending and receiving messages. You queued a message to the port of the device/disk you wanted; when the I/O was complete, you received a reply on your port.

The same message port system was used to receive UI messages. And filesystems, on top of the drive system, were also using ports/messages. So did serial devices. Everything.

Simple, asynchronous by nature.

As a matter of fact, it was even more elegant than this. Devices were just DLLs with a message port.

[+] beagle3|5 years ago|reply
And it worked, well, with 512K memory in 1985.

The multitasking was pre-emptive, but there was no paging or memory protection. That didn't work as well (but it worked surprisingly well, especially compared to Win3.1, which came 5-6 years later and needed much more memory to be usable).

I suspect that if Commodore/Amiga had done a cheaper version and did not suck so badly at planning and management, we would have been much farther along on software and hardware by now. The Amiga had 4-channel 8-bit DMA stereo sound in 1985 (which with some effort could become 13-bit 2-channel DMA stereo sound), a working multitasking system, 12-bit color high-resolution graphics, and more. I think the PC had these specs as "standard" only in 1993 or so, and by "standard" I mean "you could assume there was hardware to support them, but your software needed to include specific support for at least two or three different vendors, such as the Creative Labs Sound Blaster and Gravis UltraSound for sound".

[+] InafuSabi|5 years ago|reply
A friend of mine was amazed by this capability of the Amiga when I showed him that on one screen I could play mod.DasBoot in NoiseTracker, pull the screen down partly then go on the BBS in the terminal by manually dialing atdt454074 and entering, without my A500 even skipping one beat...

All I had was the 512 kB expander; he had a 386 with a 387 and could only run a single-tasking OS.

[+] ww520|5 years ago|reply
I remember NetWare's IPX/SPX network stack used a similar async mechanism. The caller submits a buffer for read and continues to do whatever. When the network card receives the data, it puts them in the caller's buffer. The caller is notified via a callback when the data is ready. All these were fitted in a few K's of memory in a DOS TSR.

All the DOS games at the time used IPX for network play for a reason. TCP was too "big" to fit in memory.

[+] tyingq|5 years ago|reply
"In 1985... yes I said 1985, the Amiga did all I/O through sending and receiving messages"

I do remember that, and it was cool. But, lightweight efficient message passing is pretty easy when all processes share the same unprotected memory space :)

[+] gens|5 years ago|reply
When you want to squeeze every bit of performance out of a system, you want to avoid doing system calls as much as possible. io_uring lets you check if some I/O is done by just checking a piece of memory, instead of using read, poll, or the like.
[+] agumonkey|5 years ago|reply
One thing that doesn't change is that every decade people will look at the Amiga and admire it just the same, no matter how many advances have been made since.
[+] Upvoter33|5 years ago|reply
This over-romanticizes the Amiga (a beautiful system, no doubt), because there have been message-passing OSes since the 1960s (see Brinch Hansen's Nucleus, for example). The key difference with io_uring is that it is an incredibly efficient and general mechanism for async everything. It really is a wonderful piece of technology and an advance over the long line of "message passing" OSes (which were always too slow).
[+] nonesuchluck|5 years ago|reply
Purely for entertainment, what is the alternate history that might have allowed Amiga to survive and thrive? Here's my stab:

- in the late 80s, Commodore ports AmigaOS to 386

- re-engineers Original Chipset as an ISA card

- OCS combines VGA output and multimedia (no SoundBlaster needed)

- offers AmigaOS to everyone, but it requires their ISA card to run

- runs DOS apps in Virtual 8086 mode, in desktop windows or full-screen

[+] bsder|5 years ago|reply
All this fuss because Linux wouldn't just implement kqueue ... Sigh.
[+] CalChris|5 years ago|reply
This reminds me of David Wheeler's adage:

  All problems in computer science can be solved by another level of indirection.
The rejoinder, and I don't know who gets credit for it, is:

  All performance problems can be solved by removing a layer of indirection.
[+] harry8|5 years ago|reply
Have we stopped solving all performance problems with introducing a cache? Why wasn't I told? Will I have to hand in my union card?
[+] fefe23|5 years ago|reply
I don't think io_uring and ebpf will revolutionize programming on Linux. In fact I hope they don't. The most important aspect of a program is correctness, not speed. Writing asynchronous code is much harder to get right.

Sure, I still write asynchronous code. Mostly to find out if I can. My experience has been that async code is hard to write, is larger, hard to read, hard to verify as correct and may not even be faster for many common use cases.

I also wrote some kernel code, for the same reason. To find out if I could. Most programmers have this drive, I think. They want to push themselves.

And sure, go for it! Just realize that you are experimenting, and you are probably in over your head.

Most of us are most of the time.

Someone will have to be able to fix bugs in your code when you are unavailable. Consider how hard it is to maintain other people's code even if it is just a well-formed, synchronous series of statements. Then consider how much worse it is if that code is asynchronous and maybe has subtle timing bugs, side channels and race conditions.

If I haven't convinced you yet, let me try one last argument.

I invite you to profile how much actual time you spend doing syscalls. Syscalls are amazingly well optimized on Linux. The overhead is practically negligible. You can do hundreds of thousands of syscalls per second, even on old hardware. You can also easily open thousands of threads. Those also scale really well on Linux.

[+] skybrian|5 years ago|reply
I don't know what kind of programming you're doing, but in network apps, if you have a thread per client and lots of clients (like a web server), you end up with lots of threads waiting on responses from slow clients, and that takes up memory. The time blocked on the syscall has nothing to do with your own machine's performance.

But on the other hand, if your server is behind a buffering proxy so it's not streaming directly over the Internet, it might not be a problem.

[+] cheph|5 years ago|reply
Writing asynchronous code is trying to fix how your code is executed in the code itself. It is the wrong solution for a real problem.

But I think what many people get wrong (not the person I'm replying to) is that how you write code and how you execute code does not have to be the same.

This is essentially why Google made their N:M threading patches: https://lore.kernel.org/lkml/20200722234538.166697-1-posk@po...

This is why Golang uses goroutines. This is why JavaScript added async/await. This is why Project Loom exists. This is why Erlang uses Erlang processes.

All of these initiatives make it possible to write synchronous code and execute it as if it was written asynchronously.

And I think all of this also makes it clear that how you write code and how code is executed are not the same. So yes, I'm in agreement with the person I'm replying to: I don't think this will change how code is written that much, because it can't make writing code asynchronously any less of a bad idea than it is now.

[+] junon|5 years ago|reply
What a wonderfully dogmatic comment that completely misses the point of io_uring.
[+] trevyn|5 years ago|reply
What are your thoughts on Rust?
[+] pengaru|5 years ago|reply
Coincidentally last night I announced [0] a little io_uring systemd-journald tool I've been hacking on recently for fun.

No ebpf component at this time, but I do wonder if ebpf could perform journal searches in the kernel side and only send the matches back to userspace.

Another thing this little project brought to my attention is the need for a compatibility layer on pre-io_uring kernels. I asked on io_uring@vger [1] last night, but nobody's responded yet, does anyone here know if there's already such a thing in existence?

[0] https://lists.freedesktop.org/archives/systemd-devel/2020-No...

[1] https://lore.kernel.org/io-uring/20201126043016.3yb5ggpkgvuz...

[+] anarazel|5 years ago|reply
I'd like something roughly similar to make the rr reverse debugger support io_uring. That likely can't work like most other syscalls, due to the memory-only interface...
[+] cycloptic|5 years ago|reply
I was thinking about doing this for an event loop I was working on, but no code to show yet... you probably can get away easily with using pthreads and a sparse memfd to store the buffers.
[+] adzm|5 years ago|reply
This feels very, very similar to I/O completion ports / IOCP on Windows. More modern versions of Windows even have registered buffers for completion, which can be even more performant in certain scenarios. I'm looking forward to trying this out on Linux.

I'm curious to see how this might work its way into libuv and c++ ASIO libraries, too.

[+] Matthias247|5 years ago|reply
There's currently a lot of talk about io_uring, but most articles and usages around it still seem to be at the exploration, research, and toy-project stage.

I'm however wondering what the actual quality level is, whether people have used it successfully in production, and whether there is an overview of which feature works without any [known] bugs at which kernel version.

When looking at the mailing list at https://lore.kernel.org/io-uring/ it seems like it is still a very fast-moving project, with a fair amount of bugfixes. Given that, is it realistic to think about using a kernel version between 5.5 and 5.7 in production where any bug would incur an availability impact, or should this still rather be considered an ongoing implementation effort and revisited at some 5.xy version?

An extensive set of unit tests would make it a bit easier to gain confidence that everything works reliably and keeps working, but unfortunately those are still not a thing in most low-level projects.

[+] mwcampbell|5 years ago|reply
> Things will never be the same again after the dust settles. And yes, I’m talking about Linux.

One has to be in quite a techie bubble to equate Linux kernel features with actual world-changing events, as the author goes on to do.

More on-topic though, having read the rest of the article, my guess is that while these features will let companies squeeze some more efficiency out of high-end servers, they won't change how most of us develop applications.

[+] zests|5 years ago|reply
I am impressed with the level of linux knowledge in this thread. How do people become linux kernel hackers? Most of the developers I know (including myself) use linux but have very little awareness beyond application level programming.
[+] the8472|5 years ago|reply
You don't necessarily have to be a kernel hacker to be familiar with many of the features that the kernel provides. Just doing application debugging often requires digging deeper until you hit some kernel balrogs.

Container problems? Namespaces, Cgroups, ...

Network problems? Netfilter, tc, lots of sysctl knobs, tcp algorithms (cue 1287947th thread on nagle/delayed acks/cork)

Slow disk I/O? Now you need to read up on syscalls and maybe find more efficient uses. copy_file_range doesn't work as expected? Suddenly you're reading kernel release notes or source code.

[+] marcosdumay|5 years ago|reply
> How do people become linux kernel hackers?

Honestly, by hacking it.

There's a famous book about Linux internals whose name I don't remember (but it has "Linux" and "internals" in the title). But I have never seen anybody learn it by reading a book (however excellent the book may be). You just go change what you want, or read the submodule you're interested in understanding, and use the book, site, or whatever when you hit a problem.

[+] 01100011|5 years ago|reply
For the most part, it's just software. If you have the time and the interest, you can learn it like anything else. At some level, it requires an awareness of how the hardware works (page tables/MMUs/IOMMUs, interrupts, SMP, NUMA, etc.).

I don't mean to downplay the investment, but if you're already an experienced software engineer you can get into it if it interests you. There is a different mindset among systems software programmers though. Reliability comes first, performance and functionality come second. It's a world away from hacking python scripts that only need to run once to perform their function.

[+] gpanders|5 years ago|reply
I learned a TON about the Linux kernel through writing custom device drivers for FPGAs. Granted most of my experience is in the driver area and not in any of the subsystems, but even still I have a much better grasp of how the kernel operates now (and even more importantly, I know how to navigate it and how to find relevant documentation).
[+] amboar|5 years ago|reply
As others have said, hacking it, certainly. But if you're not up for that and would like something more passive, read LWN.net (and possibly subscribe!)
[+] yobert|5 years ago|reply
I learned a lot by trying to make Go talk to ALSA without using any existing C interfaces. Just happy exploration goes a long ways!
[+] PeterCorless|5 years ago|reply
Today I am grateful for the brilliant minds around the world that continually open up fundamentally revolutionary new ways to develop applications. To Jens, to Alexei, and to Glauber, and to all of their kindred and ilk, we raise a glass!
[+] ganafagol|5 years ago|reply
The title of the HN post is missing a suffix of "for a few niche applications".

My work is "programming in Linux", but it's not impacted by any of this since I'm working in a different area.

I'm sure this is important work, but maybe tone down such claims a bit.

[+] capableweb|5 years ago|reply
"Few niche applications" being any application that touches files or the network, or wants to run code in the kernel. Sounds like a bigger target than just "niche", but I'm no Linux developer, so what do I know.
[+] grahamm|5 years ago|reply
At SCO in the mid-90s we were playing with very similar ideas to boost DB performance. The main motivation was the same then as it is now: don't block, and avoid making system calls into the kernel once up and running. Don't recall if any of the work made it into the product.
[+] mhh__|5 years ago|reply
eBPF is still a bit rough, but it's already very cool what you can do.

It would be nice to see it at a higher level at the syscall interface; i.e., currently if I want to attach a probe I have to find the function myself or use a library, but it would be nice to have it understand ELF files.

[+] hawk_|5 years ago|reply
One thing I haven't been able to work out is whether this makes things like DPDK or a user-mode TCP stack unnecessary, since the system call overhead is gone.
[+] dathinab|5 years ago|reply
io_uring reduces but doesn't remove the system call overhead.

Only with in-kernel polling mode is it close to removed. But kernel polling mode has its own cost. If the system call overhead is nowhere close to being a bottleneck, i.e. you don't do system calls "that" much, e.g. because your endpoints take longer to complete, then using kernel polling mode can degrade overall system performance, and potentially increase power consumption and, as such, heat generation.

Besides that, user-mode TCP stacks can be more tailored to your use case, which can increase performance.

So all in all, I would say that it depends on your use case. For some it will make user-mode TCP useless, or at least not worth it, but for others it won't.

[+] qchris|5 years ago|reply
I'm genuinely curious: both of these changes seem exciting due to the ability for people to extend and implement specialized code/features using the kernel. Since the Linux kernel is GPLed (v2, I believe?), does this mean the number of GPL requests related to products' operating systems is likely to increase, since groups using this extensibility will be writing GPL-covered code that might actually be of value to other people? Or are io_uring and eBPF implemented in a way that isolates extensions built through their frameworks, such that the GPL won't affect them?
[+] jamesfisher|5 years ago|reply
Who added the two generic Covid paragraphs to the start of this otherwise good article? _Please_ stop.
[+] bauerd|5 years ago|reply
Such an odd thing to open an article about IO tech with …
[+] secondcoming|5 years ago|reply
Phoronix showed that a recent bugfix in io_uring negated most of the gains when they profiled Redis.
[+] b0rsuk|5 years ago|reply
How does this make Linux compare to Windows, macOS, and *BSD?