
Show HN: Comprehensive inter-process communication (IPC) toolkit in modern C++

88 points | ygoldfeld | 1 year ago | github.com

If you work in C++, and you would like 2+ programs to share data structures (and/or native I/O handles a.k.a. FDs) among each other, there is a good chance Flow-IPC

- will make it considerably less annoying to code than typical approaches; and

- may massively reduce the latency involved.

Those sharing Cap'n Proto-encoded data may have particular interest. Cap'n Proto (https://capnproto.org) is fantastic at its core task - in-place serialization with zero-copy - and we wanted to make the IPC (inter-process communication) involving capnp-serialized messages be zero-copy, end-to-end.

That said, we paid equal attention to other varieties of payload; it's not limited to capnp-encoded messages. For example there is painless (<-- I hope!) zero-copy transmission of arbitrary combinations of STL-compliant native C++ data structures.

To help determine whether Flow-IPC is relevant to you we wrote an intro blog post. It works through an example, summarizes the available features, and has some performance results. https://www.linode.com/blog/open-source/flow-ipc-introductio...

Of course there's nothing wrong with going straight to the GitHub link and getting into the README and docs.

Currently Flow-IPC is for Linux. (macOS/ARM64 and Windows support could follow soon, depending on demand/contributions.)

57 comments

rurban|1 year ago

I also went this route and came to the very same conclusions. Cap'n proto for fast reading, SHM for shared data, simple short messaging, just everything in C.

My only problem is macOS with its too-small default SHM buffers; you need to increase them. Most solutions require a reboot, but a simple setter is enough, like: sudo sysctl -w kern.sysv.shmmax=16777216

ygoldfeld|1 year ago

Interesting! I'd best write this down. Current notes on macOS and Windows port work:

https://github.com/Flow-IPC/ipc/issues/101 (<= https://github.com/orgs/Flow-IPC/discussions/98)

For macOS/ARM64, it currently looks to me like the apparent lack of a /dev/shm equivalent (unless I messed up in searching for it) means the most significant chunk of new work needed to port it ... but you just mentioned a thing I did not know about. (SHM size/count limits definitely were a thing in Linux too, indeed.) TY

abcd_f|1 year ago

There's no reason to use the (very ancient) SHM API over mmap, not today.

You can do literally everything with mmap that you can do with SHM, without hitting OS caps, with no performance penalty, and with simpler code.

pmalynin|1 year ago

Tbh on macOS you should probably use XPC / Mach, which will probably let you do more than a generic IPC library. Of course, caveat emptor: it's not portable

jeffreygoesto|1 year ago

Does the schema help a lot? For C++ you can get very fast without one, for example with iceoryx https://github.com/eclipse-iceoryx/iceoryx

In contrast to Cap'n Proto, you get compiler-optimized struct layout as a benefit of using raw structs. Benchmarks are here: https://iceoryx.io/v2.0.2/examples/iceperf/

ygoldfeld|1 year ago

I think that (whether, and how much, a native struct versus a capnp schema-based struct helps) is a general question of what kind of serialization is best for a particular use-case. I wouldn't want to litigate that fully here. Personally though, I've found capnp-based IPC protocols to be neat and helpful across versions and protocol changes (where, e.g., there are well-defined rules of forward-compatibility; and Flow-IPC gives you niceties including request-response and message-type demultiplexing to a particular handler). [footnote 1 below]

BUT!!! Some algorithms don't require an "IPC protocol" per se, necessarily; they're more like 2+ applications collaborating on a data structure. In that case native structures are for sure superior, or at times even essentially required. (E.g., if you have some custom optimized hash-table -- you're not going to want to express it as a capnp structure, probably.)

So, more to the point:

- Flow-IPC 100% supports transmitting/sharing (and constructing, and auto-destroying) native C++ structures. Compared to iceoryx, on this point, it appears to have some extra capabilities, namely full support for structures with pointers/references and/or STL-compliant containers. (This example https://iceoryx.io/latest/examples/complexdata/ and other pages say things like, "To implement zero-copy data transfer we use a shared memory approach. This requires that every data structure needs to be entirely contained in the shared memory and must not internally use pointers or references. The complete list of restrictions can be found...".) Flow-IPC, in this context, means no need to write custom containers sans heap-use, or eliminate pointers in an existing structure. [footnote 2 below]

- Indeed, the capnp framing (only if you choose to use the Flow-IPC capnp-protocol feature in question!) adds processing and thus some computational and RAM-use overhead. For many applications, the 10s of microseconds added there don't matter much -- as long as they are constant regardless of structure size, and as long as they are 10s of microseconds. So a 100usec (modulo processor model of course!) RTT (size-independent) is pretty good still. Of course I would never claim this overhead doesn't matter to anyone, and iceoryx's results here are straight-up admirable.

[footnote 1] The request/response/demultiplexing/etc. niceties added by Flow-IPC's capnp-protocol feature-in-question work well IMO, but one might prefer the sweet RPC-semantics + promise pipelining of capnp-RPC. Kenton V (capnp inventor/owner) and I have spoken recently about using Flow-IPC to zero-copy-ify capnp-RPC. I'm looking into it! (He suspects it is pretty simple/natural, given that we handle the capnp-serialization layer already, and capnp-RPC is built on that.) This wouldn't change Flow-IPC's existing features but rather exercise another way of using them. In a way Flow-IPC provides a simple-but-effective-out-of-the-box schema-based conversation protocol via capnp-serialization, and capnp-RPC would provide an alternate (to that out-of-the-box guy) conversation protocol option. I tried pretty hard to design Flow-IPC in a grounded and layered way, so such work would be natural as opposed to daunting.

[footnote 2] In fact the Flow-IPC capnp-based structured-channel feature (internally) itself uses Flow-IPC's own native-structure-transmission feature in its implementation (eat our own dog-food). Since a capnp serialization = sequence of buffers (a.k.a. segments), for us it is (internally) represented as essentially an STL list<vector<uint8_t>>. So we construct/build one of those in SHM (internally); then only a small SHM-handle is (internally) transmitted over the IPC-transport [footnote 3]; and the receiver then obtains the in-place list<vector<uint8_t>> (essentially) which is then treated as the capnp-encoding it really is. This would all happen (internally) when executing the quite-short example in the blog (https://www.linode.com/blog/open-source/flow-ipc-introductio...). As you can see there, to the Flow-IPC-using developer, it's just -- like -- "create a message with this schema here, call some mutators, send"; and conversely "receive a message expected to have that (same) schema, OK -- got it; call some accessors."

[footnote 3] IPC-transport = Unix domain socket or one of 2 MQ types -- you can choose via template arg (or add your own IPC-transport by implementing a certain pair of simple concepts).

ygoldfeld|1 year ago

Whoa. I’m the lead developer on this - I got to this post totally by accident: was googling for my own Show HN post about this from a couple days ago - and it took me here without my noticing.

There’s some discussion on it in Show HN, and of course I can answer anything here that people might be interested in too. I’m very proud of it and very grateful Akamai gave the resources to open-source it.

I’d like to have a flashier friendlier site with a slick intro video - haven’t had the time to do that stuff - but the substance and API documentation + Manual are very serious and complete, I hope.

All linked off the blog-post!

mgaunard|1 year ago

Serialization is the trivial part; the hard part is building a lockfree mpmc queue or message bus (depending on what you want) on top of fixed-size pre-allocated memory segments.

I can't tell what this library does; the blog articles and readme all talk about stuff that isn't close to any of the challenges that I see.

ygoldfeld|1 year ago

While I wouldn't dream of claiming Flow-IPC will fit every IPCer's priorities, nor of trying to change yours or anyone's, nor of debating about what is trivial versus hard -- it should at least be easily possible to know what's in Flow-IPC. I'm here to help; this is the API overview with various code-snippet synopses, etc.:

https://flow-ipc.github.io/doc/flow-ipc/versions/main/genera...

I should also note that Flow-IPC does not provide "serialization"; it does however enable the use of an existing/best serializer (capnp) for zero-copy messaging. This is only one feature -- albeit oft requested, hence my decision to base the blog/README example on it. (I'm currently also looking into extending this to capnp-RPC.)

But, of course, we don't expect it to match what everyone is looking for; in your case IceOryx might be more your speed -- have a look.

ygoldfeld|1 year ago

It's so hard to communicate this stuff in writing! There are several angles of potential interest; I wish I could simply chat in-person with anyone curious, you know? Of course that is impossible. (I'll do my best here at HN and the Flow-IPC Discussions board at GitHub.)

I hope the above 2 links get the job done in communicating the key points. There is certainly no shortage of documentation! Still:

If you'll indulge me, I do want to share how this project got started and became open-source. I actually do suspect this might help one get a feeling of what this thing is, and is not.

My name is Yuri Goldfeld. I have worked at Akamai since 2005 (with a break for startup shenanigans, and VMware, in the middle). I designed or co-designed Flow-IPC and wrote about 75% of it (by lines of code ignoring comments); my colleague Eddy Chan wrote the rest, including the bulk of the SHM-jemalloc component (which is really cool IMO).

Akamai in certain core parts is a C++/Linux shop, with dogged scrutiny of latency; every millisecond along the request path is examined. A few years ago I was asked to do a couple things:

- Determine the best serializer to use, in general, but especially for IPC protocols. The answer there was easy IMO: Cap'n Proto.

- Split up a certain important C++ service into several parts, for various reasons, without adding latency to the request path.

The latter task meant, among other things, communicating large amounts of user data from server application to server application. capnp-encoded structures (sometimes big - but not necessarily) would also need to be transmitted; as would FDs.

The technical answers to these challenges are not necessarily rocket science. FDs can be transmitted via Unix domain socket as "ancillary data"; the POSIX `sendmsg()` API is hairy but usable. Small messages can be transmitted via Unix domain socket, or pipe, or POSIX MQ (etc.). Large blobs of data, though, would not be okay to transmit via those transports: too much copying into and out of kernel buffers is involved and would add major latency, so we'd have to use shared memory (SHM). Certainly a hairy technology... but again, doable. And as for capnp - well - you "just" code a `MessageBuilder` implementation that allocates segments in SHM instead of the regular heap like `capnp::MallocMessageBuilder` does.

Thing is, I noticed that various parts of the company had similar needs. I've observed some variation of each of the aforementioned tasks custom-implemented - again, and again, and again. None of these implementations could really be reused anywhere else. Most of them ran into the same problems - none of which is that big a deal on its own, but together (and across projects) it more than adds up. To coders it's annoying. And to the business, it's expensive!

Plus, at least one thing actually proved to be technically quite hard. Sharing (via SHM) a native C++ structure involving STL containers and/or raw pointers: downright tough to achieve in a general way. At least with Boost.interprocess (https://www.boost.org/doc/libs/1_84_0/doc/html/interprocess....) - which is really quite thoughtful - one can accomplish a lot... but even then, there are key limitations, in terms of safety and ease of use/reusability. (I'm being a bit vague here... trying to keep the length under control.)

So, I decided to not just design/code an "IPC thing" for that original key C++ service I was being asked to split... but rather one that could be used as a general toolkit, for any C++ applications. Originally we named it Akamai-IPC, then renamed it Flow-IPC.

As a result of that origin story, Flow-IPC is... hmmm... meat-and-potatoes, pragmatic. It is not a "framework." It does not replace or compete with gRPC. (It can, instead, speed RPC frameworks up by providing the zero-copy transmission substrate.) I hope that it is neither niche nor high-maintenance.

To wit: If you merely want to send some binary-blob messages and/or FDs, it'll do that - and make it easier by letting you set up a single session between the 2 processes, instead of making you worry about socket names and cleanup. (But, that's optional! If you simply want to set up a Unix domain socket yourself, you can.) If you want to add structured messaging, it supports Cap'n Proto - as noted - and right out of the box it'll be zero-copy end-to-end. That is, it'll do all the SHM stuff without a single `shm_open()` or `mmap()` or `ftruncate()` on your part. And if you want to customize how that all works, those layers and concepts are formally available to you. (No need to modify Flow-IPC yourself: just implement certain concepts and plug them in, at compile-time.)

Lastly, for those who want to work with native C++ data directly in SHM, it'll simplify setup/cleanup considerably compared to what's typical. For the original Akamai service in question, we needed to use SHM as intensively as one typically uses the regular heap. So in particular Boost.interprocess's 2 built-in SHM-allocation algorithms were not sufficient. We needed something more industrial-strength. So we adapted jemalloc (https://jemalloc.net/) to work in SHM, and worked that into Flow-IPC as a standard available feature. (jemalloc powers FreeBSD and big parts of Meta.) So jemalloc's anti-fragmentation algorithms, thread caching - all that stuff - will work for our SHM allocations.

Having accepted this basic plan - develop a reusable IPC library that handled the above oft-repeated needs - Eddy Chan joined and especially heavily contributed on the jemalloc aspects. A couple years later we had it ready for internal Akamai use. All throughout we kept it general - not Akamai-specific (and certainly not specific to that original C++ service that started it all off) - and personally I felt it was a very natural candidate for open-source.

To my delight, once I announced it internally, the immediate reaction from higher-up was, "you should open-source it." Not only that, we were given the resources and goodwill to actually do it. I have learned that it's not easy to make something like this presentable publicly, even having developed it with that in mind. (BTW it is about 69k lines of code, 92k lines of comments, excluding the Manual.)

So, that's what happened. We wrote a thing useful for various teams internally at Akamai - and then Akamai decided we should share it with the world. That's how open-source thrives, we figured.

On a personal level, of course it would be gratifying if others found it useful and/or themselves contributed. What a cool feeling that would be! After working with exemplary open-source stuff like capnp, it'd be amazing to offer even a fraction of that usefulness. But, we don't gain from "market share." It really is just there to be useful. So we hope it is!

robobully|1 year ago

That's an impressive read, thank you and congrats on the release! I think that nowadays the development and adoption of performant IPC mechanisms is unfairly low, it's good to have such tech opensourced.

My question is: how does Flow-IPC compare to projects like Mojo IPC (from Chromium) and Eclipse iceoryx? At first glance they all pursue similar goals while paying much less attention to complex allocation management, yet manage to perform well enough.

OnlyMortal|1 year ago

I’ve spent a lot of time with Boost.Asio and serialisation of objects into a Boost variant to send across the wire. The server visits the variant to process the message. Plus Boost shared memory for file data.

Both for unix domain sockets and TCP.

There’re plenty of Boost examples around, so I’d suggest you take their examples and adapt them for your framework.

As I’m sure you’re aware, a clean and easy to read example will make a difference.

It’s great that you’re open source and I hope you get some traction.

signa11|1 year ago

i have done something exactly like this at my current place of employment, and am always curious to see how others have 'stacked-da-cat'.

we _unfortunately_ gravitated towards protobufs despite my fervent appeal to go with capn-proto. that has caused a cascade of troubles / missed opportunities for optimizations etc.

fwsgonzo|1 year ago

I tried to migrate to capn-proto but it just doesn't build on MinGW so I have no choice but to wait. Like you say, it gets worse the more I wait. But, if the APIs are somewhat sane, they should hopefully also be somewhat similar: Able to switch case on oneofs, movable data structures etc.

I don't like that protobuf has recently started linking with abseil, which, despite being a good framework, I can't use if it doesn't build absolutely everywhere I need it to. So maybe I'll be forced over to Cap'n Proto one of these days?

sgtnoodle|1 year ago

I've also developed a strikingly similar low latency real-time IPC message bus for work. It also uses sockets with transparent shared memory optimization. In my case, it's the backbone for an autonomous aircraft's avionics. I made everything agnostic to the message scheme, though, and most of the tooling supports an in-house schema, protobuf, JSON, YAML, etc. There's also clients implemented in C++, Rust, Python and Julia.

What troubles has protobuf caused you?

ygoldfeld|1 year ago

At the risk of almost-spamming -- this post has taken off, which is sweet, and I have noticed some trends about what seems to interest readers which may not be placed as prominently up-top as would fit this audience. To wit: the API Overview from the Manual covers the available stuff, with code snippets (and some words). Could save people some time:

https://flow-ipc.github.io/doc/flow-ipc/versions/main/genera...

forrestthewoods|1 year ago

> Currently Flow-IPC is for Linux

Dang. I was excited for a brief moment, but support for macOS + Windows is mandatory for all of my use cases.

To be honest what I actually want is NOT "the fastest possible thing". All I actually care about is "easy advertisement, discovery, and message sending". I use localhost TCP way more than I want because it "just works".

Maybe someday I'll stumble across my dream IPC library.

ygoldfeld|1 year ago

Oooh, so close. We’ve got the advertisement/discovery and messaging for sure.

Concretely what it would take to port it to those OS: https://github.com/Flow-IPC/ipc/issues/101

Given a couple weeks to work on it, this thing would be on macOS no problem. With Windows I personally need to understand its FD-passing and native handle concepts first, but I’m guessing it’d be a similar amount of effort in the end.

ygoldfeld|1 year ago

(Akamai owns Linode and uses the blog on Linode.com as a developer-oriented blog. So that’s why the link is to there.)

nonane|1 year ago

Cool stuff!

Does Flow-IPC protect against malformed messages? For example a client sending malformed messages to a server process

sgtnoodle|1 year ago

Given that it's shared memory based, it seems like there has to be some degree of trust that the participants are well behaved. What do you mean by a malformed message, though? If you're talking about the payload of the message, that seems like a matter of the message scheme you're using. If you're talking about correctness of the IPC protocol itself, integrity checking is unfortunately at odds with latency.

seego|1 year ago

Do you have any concrete plans about a potential network extension yet?

ygoldfeld|1 year ago

A couple -

1. The obvious one is “just” extending the stuff that internally works via Unix domain sockets to TCP sockets. Various internal code is written with an eye to that, including anticipating that certain operations (such as connect) that are instant locally can would-block over a network.

If people enjoy the API, this would be a no-brainer value-add, even if lots of people would scoff and use actual dedicated networking techniques (HTTP, whatever) directly instead.

2. The much more fun and unique idea is using RDMA, “sort of” a networked-SHM type of setup (internally). Hope to get a go-ahead (or contribution, of course) on this.

I mention these in the intro page of the Manual, I think.