Show HN: Comprehensive inter-process communication (IPC) toolkit in modern C++
88 points | ygoldfeld | 1 year ago | github.com
- will make it considerably less annoying to code than typical approaches; and
- may massively reduce the latency involved.
Those sharing Cap'n Proto-encoded data may have particular interest. Cap'n Proto (https://capnproto.org) is fantastic at its core task - in-place serialization with zero-copy - and we wanted to make the IPC (inter-process communication) involving capnp-serialized messages be zero-copy, end-to-end.
That said, we paid equal attention to other varieties of payload; it's not limited to capnp-encoded messages. For example there is painless (<-- I hope!) zero-copy transmission of arbitrary combinations of STL-compliant native C++ data structures.
To help determine whether Flow-IPC is relevant to you we wrote an intro blog post. It works through an example, summarizes the available features, and has some performance results. https://www.linode.com/blog/open-source/flow-ipc-introductio...
Of course there's nothing wrong with going straight to the GitHub link and getting into the README and docs.
Currently Flow-IPC is for Linux. (macOS/ARM64 and Windows support could follow soon, depending on demand/contributions.)
rurban|1 year ago
My only problem is macOS with its too-small default SHM buffers; you need to increase them. Most solutions require a reboot, but a simple setter is enough, like sudo sysctl -w kern.sysv.shmmax=16777216
ygoldfeld|1 year ago
https://github.com/Flow-IPC/ipc/issues/101 (<= https://github.com/orgs/Flow-IPC/discussions/98)
For macOS/ARM64, currently it looks to me like the apparent lack of a /dev/shm equivalent (unless I messed up in searching for it) is the most significant piece of new work necessary to port it ... but you just mentioned a thing I did not know about. (SHM size/count limits definitely were a thing in Linux too, indeed.) TY
abcd_f|1 year ago
You can literally do everything with mmap that you can do with shm, without hitting OS caps, with no performance penalty, and with code that's simpler.
jeffreygoesto|1 year ago
In contrast to Cap'n'Proto you get compiler optimized struct layout as benefit from using raw structs. Benchmarks are here https://iceoryx.io/v2.0.2/examples/iceperf/
ygoldfeld|1 year ago
BUT!!! Some algorithms don't require an "IPC protocol" per se, necessarily, but rather something more like 2+ applications collaborating on a data structure. In that case native structures are for sure superior, or at times even essentially required. (E.g., if you have some custom optimized hash-table -- you're probably not going to want to express it as a capnp structure.)
So, more to the point:
- Flow-IPC 100% supports transmitting/sharing (and constructing, and auto-destroying) native C++ structures. Compared to iceoryx, on this point, it appears to have some extra capabilities, namely full support for structures with pointers/references and/or STL-compliant containers. (This example https://iceoryx.io/latest/examples/complexdata/ and other pages say things like, "To implement zero-copy data transfer we use a shared memory approach. This requires that every data structure needs to be entirely contained in the shared memory and must not internally use pointers or references. The complete list of restrictions can be found...".) Flow-IPC, in this context, means no need to write custom containers sans heap-use, or eliminate pointers in an existing structure. [footnote 2 below]
- Indeed, the capnp framing (only if you choose to use the Flow-IPC capnp-protocol feature in question!) adds processing and thus some computational and RAM-use overhead. For many applications, the 10s of microseconds added there don't matter much -- as long as they are constant regardless of structure size, and as long as they are 10s of microseconds. So a 100usec (modulo processor model of course!) RTT (size-independent) is pretty good still. Of course I would never claim this overhead doesn't matter to anyone, and iceoryx's results here are straight-up admirable.
[footnote 1] The request/response/demultiplexing/etc. niceties added by Flow-IPC's capnp-protocol feature-in-question work well IMO, but one might prefer the sweet RPC-semantics + promise pipelining of capnp-RPC. Kenton V (capnp inventor/owner) and I have spoken recently about using Flow-IPC to zero-copy-ify capnp-RPC. I'm looking into it! (He suspects it is pretty simple/natural, given that we handle the capnp-serialization layer already, and capnp-RPC is built on that.) This wouldn't change Flow-IPC's existing features but rather exercise another way of using them. In a way Flow-IPC provides a simple-but-effective-out-of-the-box schema-based conversation protocol via capnp-serialization, and capnp-RPC would provide an alternate (to that out-of-the-box guy) conversation protocol option. I tried pretty hard to design Flow-IPC in a grounded and layered way, so such work would be natural as opposed to daunting.
[footnote 2] In fact the Flow-IPC capnp-based structured-channel feature (internally) itself uses Flow-IPC's own native-structure-transmission feature in its implementation (eat our own dog-food). Since a capnp serialization = sequence of buffers (a.k.a. segments), for us it is (internally) represented as essentially an STL list<vector<uint8_t>>. So we construct/build one of those in SHM (internally); then only a small SHM-handle is (internally) transmitted over the IPC-transport [footnote 3]; and the receiver then obtains the in-place list<vector<uint8_t>> (essentially) which is then treated as the capnp-encoding it really is. This would all happen (internally) when executing the quite-short example in the blog (https://www.linode.com/blog/open-source/flow-ipc-introductio...). As you can see there, to the Flow-IPC-using developer, it's just -- like -- "create a message with this schema here, call some mutators, send"; and conversely "receive a message expected to have that (same) schema, OK -- got it; call some accessors."
[footnote 3] IPC-transport = Unix domain socket or one of 2 MQ types -- you can choose via template arg (or add your own IPC-transport by implementing a certain pair of simple concepts).
ygoldfeld|1 year ago
There’s some discussion on it in Show HN, and of course I can answer anything here that people might be interested in too. I’m very proud of it and very grateful Akamai gave the resources to open-source it.
I’d like to have a flashier, friendlier site with a slick intro video - haven’t had the time to do that stuff - but the substance and API documentation + Manual are very serious and complete, I hope.
All linked off the blog-post!
dang|1 year ago
But readers will probably want to look at the other article as well: https://www.linode.com/blog/open-source/flow-ipc-introductio....
mgaunard|1 year ago
I can't tell what this library does; the blog articles and readme all talk about stuff that isn't close to any of the challenges that I see.
ygoldfeld|1 year ago
https://flow-ipc.github.io/doc/flow-ipc/versions/main/genera...
I should also note that Flow-IPC does not provide "serialization"; it does however enable the use of an existing/best serializer (capnp) for zero-copy messaging. This is only one feature -- albeit oft requested, hence my decision to base the blog/README example on it. (I'm currently also looking into extending this to capnp-RPC.)
But, of course, we don't expect it to match what everyone is looking for; in your case IceOryx might be more your speed -- have a look.
ygoldfeld|1 year ago
I hope the above 2 links get the job done in communicating the key points. There is certainly no shortage of documentation! Still:
If you'll indulge me, I do want to share how this project got started and became open-source. I actually do suspect this might help one get a feeling of what this thing is, and is not.
My name is Yuri Goldfeld. I have worked at Akamai since 2005 (with a break for startup shenanigans, and VMware, in the middle). I designed or co-designed Flow-IPC and wrote about 75% of it (by lines of code ignoring comments); my colleague Eddy Chan wrote the rest, including the bulk of the SHM-jemalloc component (which is really cool IMO).
Akamai in certain core parts is a C++/Linux shop, with dogged attention to latency. Every millisecond along the request path is scrutinized. A few years ago I was asked to do a couple of things:
- Determine the best serializer to use, in general, but especially for IPC protocols. The answer there was easy IMO: Cap'n Proto.
- Split up a certain important C++ service into several parts, for various reasons, without adding latency to the request path.
The latter task meant, among other things, communicating large amounts of user data from server application to server application. capnp-encoded structures (sometimes big - but not necessarily) would also need to be transmitted; as would FDs.
The technical answers to these challenges are not necessarily rocket science. FDs can be transmitted via Unix domain socket as "ancillary data"; the POSIX `sendmsg()` API is hairy but usable. Small messages can be transmitted via Unix domain socket, or pipe, or POSIX MQ (etc.). Large blobs of data, though, would not be okay to transmit via those transports: too much copying into and out of kernel buffers is involved and would add major latency, so we'd have to use shared memory (SHM). Certainly a hairy technology... but again, doable. And as for capnp - well - you "just" code a `MessageBuilder` implementation that allocates segments in SHM instead of the regular heap like `capnp::MallocMessageBuilder` does.
Thing is, I noticed that various parts of the company had similar needs. I've observed some variation of each of the aforementioned tasks custom-implemented - again, and again, and again. None of these implementations could really be reused anywhere else. Most of them ran into the same problems - none of which is that big a deal on its own, but together (and across projects) it more than adds up. To coders it's annoying. And to the business, it's expensive!
Plus, at least one thing actually proved to be technically quite hard. Sharing (via SHM) a native C++ structure involving STL containers and/or raw pointers: downright tough to achieve in a general way. At least with Boost.interprocess (https://www.boost.org/doc/libs/1_84_0/doc/html/interprocess....) - which is really quite thoughtful - one can accomplish a lot... but even then, there are key limitations, in terms of safety and ease of use/reusability. (I'm being a bit vague here... trying to keep the length under control.)
So, I decided to not just design/code an "IPC thing" for that original key C++ service I was being asked to split... but rather one that could be used as a general toolkit, for any C++ applications. Originally we named it Akamai-IPC, then renamed it Flow-IPC.
As a result of that origin story, Flow-IPC is... hmmm... meat-and-potatoes, pragmatic. It is not a "framework." It does not replace or compete with gRPC. (It can, instead, speed RPC frameworks up by providing the zero-copy transmission substrate.) I hope that it is neither niche nor high-maintenance.
To wit: If you merely want to send some binary-blob messages and/or FDs, it'll do that - and make it easier by letting you set up a single session between the 2 processes, instead of making you worry about socket names and cleanup. (But, that's optional! If you simply want to set up a Unix domain socket yourself, you can.) If you want to add structured messaging, it supports Cap'n Proto - as noted - and right out of the box it'll be zero-copy end-to-end. That is, it'll do all the SHM stuff without a single `shm_open()` or `mmap()` or `ftruncate()` on your part. And if you want to customize how that all works, those layers and concepts are formally available to you. (No need to modify Flow-IPC yourself: just implement certain concepts and plug them in, at compile-time.)
Lastly, for those who want to work with native C++ data directly in SHM, it'll simplify setup/cleanup considerably compared to what's typical. For the original Akamai service in question, we needed to use SHM as intensively as one typically uses the regular heap. So in particular Boost.interprocess's 2 built-in SHM-allocation algorithms were not sufficient. We needed something more industrial-strength. So we adapted jemalloc (https://jemalloc.net/) to work in SHM, and worked that into Flow-IPC as a standard available feature. (jemalloc powers FreeBSD and big parts of Meta.) So jemalloc's anti-fragmentation algorithms, thread caching - all that stuff - will work for our SHM allocations.
Having accepted this basic plan - develop a reusable IPC library that handled the above oft-repeated needs - Eddy Chan joined and especially heavily contributed on the jemalloc aspects. A couple years later we had it ready for internal Akamai use. All throughout we kept it general - not Akamai-specific (and certainly not specific to that original C++ service that started it all off) - and personally I felt it was a very natural candidate for open-source.
To my delight, once I announced it internally, the immediate reaction from higher-up was, "you should open-source it." Not only that, we were given the resources and goodwill to actually do it. I have learned that it's not easy to make something like this presentable publicly, even having developed it with that in mind. (BTW it is about 69k lines of code, 92k lines of comments, excluding the Manual.)
So, that's what happened. We wrote a thing useful for various teams internally at Akamai - and then Akamai decided we should share it with the world. That's how open-source thrives, we figured.
On a personal level, of course it would be gratifying if others found it useful and/or themselves contributed. What a cool feeling that would be! After working with exemplary open-source stuff like capnp, it'd be amazing to offer even a fraction of that usefulness. But, we don't gain from "market share." It really is just there to be useful. So we hope it is!
robobully|1 year ago
My question is, how does Flow-IPC compare to projects like Mojo IPC (from Chromium) and Eclipse iceoryx? At first glance they all pursue similar goals and pay much less attention to complex allocation management, yet manage to perform well enough.
OnlyMortal|1 year ago
Both for unix domain sockets and TCP.
There’re plenty of boost examples around so, I’d suggest, you take their examples and work them for your framework.
As I’m sure you’re aware, a clean and easy to read example will make a difference.
It’s great that you’re open source and I hope you get some traction.
signa11|1 year ago
we _unfortunately_ gravitated towards protobufs despite my fervent appeal to go with Cap'n Proto. that has caused a cascade of troubles / missed opportunities for optimizations etc. etc.
fwsgonzo|1 year ago
I don't like that protobuf has recently started linking with abseil, which, despite being a good framework, I can't use if it doesn't build absolutely everywhere I need it to. So, maybe I'll be forced over to CapnProto one of these days?
sgtnoodle|1 year ago
What troubles has protobuf caused you?
ygoldfeld|1 year ago
https://flow-ipc.github.io/doc/flow-ipc/versions/main/genera...
forrestthewoods|1 year ago
Dang. I was excited for a brief moment, but support for macOS + Windows is mandatory for all of my use cases.
To be honest what I actually want is NOT "the fastest possible thing". All I actually care about is "easy advertisement, discovery, and message sending". I use localhost TCP way more than I want because it "just works".
Maybe someday I'll stumble across my dream IPC library.
ygoldfeld|1 year ago
Concretely what it would take to port it to those OS: https://github.com/Flow-IPC/ipc/issues/101
Given a couple weeks to work on it, this thing would be on macOS no problem. With Windows I personally need to understand its FD-passing and native handle concepts first, but I’m guessing it’d be a similar amount of effort in the end.
nonane|1 year ago
Does Flow-IPC protect against malformed messages? For example a client sending malformed messages to a server process
ygoldfeld|1 year ago
1. The obvious one is “just” extending stuff internally working via Unix domain sockets to TCP sockets. Various internal code is written with an eye to that, including anticipating that certain operations (such as connect) that are instant locally can would-block over a network.
If people enjoy the API, this would be a no-brainer value-add, even if lots of people would scoff and use actual dedicated networking techniques (HTTP, whatever) directly instead.
2. The much more fun and unique idea is using RDMA, “sort of” a networked-SHM type of setup (internally). Hope to get a go-ahead (or contribution, of course) on this.
I mention these in the intro page of the Manual, I think.