top | item 41681344

Show HN: Iceoryx2 – Fast IPC Library for Rust, C++, and C

122 points| elfenpiff | 1 year ago |ekxide.io

Hello everyone,

Today we released iceoryx2 v0.4!

iceoryx2 is a service-based inter-process communication (IPC) library designed to make communication between processes as fast as possible - like Unix domain sockets or message queues, but orders of magnitude faster and easier to use. It also comes with advanced features such as circular buffers, history, event notifications, publish-subscribe messaging, and a decentralized architecture with no need for a broker.

For example, if you're working in robotics and need to process frames from a camera across multiple processes, iceoryx2 makes it simple to set that up. Need to retain only the latest three camera images? No problem - circular buffers prevent your memory from overflowing, even if a process is lagging. The history feature ensures you get the last three images immediately after connecting to the camera service, as long as they’re still available.

Another great use case is for GUI applications, such as window managers or editors. If you want to support plugins in multiple languages, iceoryx2 allows you to connect processes - perhaps to remotely control your editor or window manager. Best of all, thanks to zero-copy communication, you can transfer gigabytes of data with incredibly low latency.

Speaking of latency, on some systems, we've achieved latency below 100ns when sending data between processes - and we haven't even begun serious performance optimizations yet. So, there’s still room for improvement! If you’re in high-frequency trading or any other use case where ultra-low latency matters, iceoryx2 might be just what you need.

If you’re curious to learn more about the new features and what’s coming next, check out the full iceoryx2 v0.4 release announcement.

Elfenpiff

Links:

* GitHub: https://github.com/eclipse-iceoryx/iceoryx2 * iceoryx2 v0.4 release announcement: https://ekxide.io/blog/iceoryx2-0-4-release/ * crates.io: https://crates.io/crates/iceoryx2 * docs.rs: https://docs.rs/iceoryx2/0.4.0/iceoryx2/

51 comments

hardwaresofton|1 year ago

Been doing some IPC experiments recently following the 3tilley post[0], because there just isn't enough definitive information (even if it's a snapshot in time) out there.

Shared memory is crazy fast, and I'm surprised that there aren't more things that take advantage of it. Super odd that gRPC doesn't do shared memory, and basically never plans to?[1].

All that said, the constructive criticism I can offer for this post is that in mass-consumption announcements like this one for your project, you should:

- RPC throughput (with the usual caveats/disclaimers) - Comparison (ideally graphed) to an alternative approach (ex. domain sockets) - Your best/most concise & expressive usage snippet

100ns is great to know, but I would really like to know how much RPC/s this translates to without doing the math, or seeing it with realistic de-serialization on the other end.

[0]: https://3tilley.github.io/posts/simple-ipc-ping-pong/

[1]: https://github.com/grpc/grpc/issues/19959

a_t48|1 year ago

In my experience shared memory is really hard to implement well and manage:

1. Unless you're using either fixed sized or specially allocated structures, you end up paying for serialization anyhow (zero copy is actually one copy).

2. There's no way to reference count the shared memory - if a reader crashes, it holds on to the memory it was reading. You can get around this with some form of watchdog process, or by other schemes with a side channel, but it's not "easy".

3. Similar to 2, if a writer crashes, it will leave behind junk in whatever filesystem you are using to hold the shared memory.

4. There's other separate questions around how to manage the shared memory segments you are using (one big ring buffer? a segment per message?), and how to communicate between processes that different segments are in use and that new messages are available for subscribers. Doable, but also not simple.

It's a tough pill to swallow - you're taking on a lot of complexity in exchange for that low latency. If you can do so, it's better to put things in the same process space if you can - you can use smart pointers and a queue and go just as fast, with less complexity. Anything CUDA will want to be single process, anyhow, (ignoring cuda IPC, anyhow). The number of places where you need (a) ultra low latency (b) high bandwidth/message size (c) can't put everything in the same process (d) are using data structures suited to shared memory and finally (e) are okay with taking on a bunch of complexity just isn't that high. (It's totally possible I'm missing a Linux feature that makes things easy, though).

I plan on integrating iceoryx into a message passing framework I'm working on now (users will ask for SHM), but honestly either "shared pointers and a queue" or "TCP/UDS" are usually better fits.

abhirag|1 year ago

At $work we are evaluating different IPC strategies in Rust. My colleague expanded upon 3tilley's work, they have updated benchmarks with iceoryx2 included here[0]. I suppose the current release should perform even better.

[0]: https://pranitha.rs/posts/rust-ipc-ping-pong/

sischoel|1 year ago

I was looking a bit at the code for the shared memory implementation in https://github.com/3tilley/rust-experiments/tree/master/ipc and the dependency <https://github.com/elast0ny/raw_sync-rs.

My last systems programming class was already a few years ago and I am a bit rusty, so I got some questions:

1. Looking at the code in https://github.com/elast0ny/raw_sync-rs/blob/master/src/even...) it looks like we are using a userspace spinlock. Aren't these really bad because the mess with the process scheduler and might unnecessarily trigger the scaling governor to increase the cpu frequency? I think at least on linux one could use a semaphore to inform the consumer that new data has been produced.

2. What kind of memory guarantees do we have on modern computer architectures such as x86-64 and ARM? If the producers does two writes (I imagine first the data and then the release of the lock) - is it guaranteed that when the consumer reads the second value that also the first value has been synchronized?

elBoberido|1 year ago

Thanks for the tips. We have a comparison with message queues and unix domain sockets [1] on the repo on github [2].

~~It's nice to see that independent benchmarks are in the same ballpark than the one we perform.~~ Edit: sorry, I confused your link with another one which also has ping-pong in its title

We provide data types which are shared memory compatible, which means one does not have to serialize/deserialize. For image or lidar data, one also does not have to serialize and this is where copying large data really hurts. But you are right, if your data structures are not shared memory compatible, one has to serialize the data first and this has its cost, depending on what serialization format one uses. iceoryx is agnostic to this though and one can select what's the best for a given use case.

[1]: https://raw.githubusercontent.com/eclipse-iceoryx/iceoryx2/r... [2]: https://github.com/eclipse-iceoryx/iceoryx2

pjmlp|1 year ago

Yeah, I think it is about time we re-focus on multi-processing as extension mechanism, given the available hardware we have nowadays.

Loading in-process plugins was a great idea 20 - 30 years ago, however it has been proven that is isn't such a great idea in regards to host stability, or exposed to possible security exploits.

And shared memory is a good compromise between both models.

emmanueloga_|1 year ago

Looks great! From a quick glance it seems like it is a cross platform shared memory library. Maybe similar to this? [1].

Suggestion: would be cool to have a quick description of the system calls involved for each supported platform [2]. I'm guessing mmap on linux/osx and CreateFileMapping on Windows?

1: https://github.com/LiveAsynchronousVisualizedArchitecture/si...

2: https://github.com/eclipse-iceoryx/iceoryx2?tab=readme-ov-fi...

elfenpiff|1 year ago

You guessed right. We have a layered architecture that abstracts this away for every platform. With this, we can support every OS as long as it has a way of sharing memory between processes (or tasks as some RTOSes are calling it) and you have a way of sending notifications.

fefe23|1 year ago

This smells like they are using shared memory, which is almost certainly a security nightmare. The way they are selling it makes me fear they aren't aware of what a time bomb they are sitting on.

Shared memory works as a transport if you either assume that all parties are trusted (in which case why do IPC in the first place? Just put them in a monolith), or you do hardcore capturing (make a copy of each message in the framework before handing it off). Their web page mentions zero copy, so it's probably not the second one.

Also, benchmarks are misleading.

It's easy to get good latency if your throughput is so high that you can do polling or spin locks, like for example in benchmarks. But that's probably not a good assumption for general usage because it will be very inefficient and waste power and require more cooling as well.

zbentley|1 year ago

> Shared memory works as a transport if you either assume that all parties are trusted (in which case why do IPC in the first place? Just put them in a monolith)

There are all sorts of domains where mutually-trusted parties need IPC. Off the top of my head and in no particular order:

- Applications that pass validated data to/from captive subprocesses. Not everything is available as a natively-linked library. Not every language's natively-linked libraries are as convenient to reliably install as external binaries.

- Parallelism/server systems farming work out to forked (but not exec'd) subprocesses. Not everything needs setuid. Somtimes you just want to parallelize number crunching without the headache of threads (or are on a platform like Python which limits threads' usefulness).

- Replatforming/language transitions in data-intensive applications. Running the new runtime/platform in the same address space as the legacy platform can bring some hairy complexity, which is sidestepped (especially given the temporary-ness of the transitional state) with careful use of shared memory.

And aren't systems like Postgres counterpoints to your claim? My memory isn't the greatest, but IIRC postgres's server-side connections are subprocesses which interact with the postmaster via shared memory, no?

CyberDildonics|1 year ago

Shared memory would be two processes that can already do whatever they want communicating with each other. What is it that you think is a 'security nightmare' ?

what a time bomb they are sitting on

You didn't give any real evidence of this or examples.

Shared memory works as a transport if you either assume that all parties are trusted (in which case why do IPC in the first place?

Because you can have two or more different processes communicate asynchronously. They are in their own memory space and running on different threads. One doesn't crash the other. All they need to work together is data structures and data formats.

Don't forget that files are the original IPC.

Also, benchmarks are misleading.

Saying something is wrong is easy when you don't have anything to show that it's wrong.

that's probably not a good assumption for general usage

Then don't do it. Shared memory can use atomics, it can be totally lock free. You each process do checks that are just atomically reading and integer.

gnulinux|1 year ago

> Shared memory works as a transport if you either assume that all parties are trusted (in which case why do IPC in the first place? Just put them in a monolith), or you do hardcore capturing (make a copy of each message in the framework before handing it off). Their web page mentions zero copy, so it's probably not the second one.

This is an extremely puzzling comment. I can think of thousands of such cases.

First, there are many reasons to split your program into processes instead of threads (e.g. look at browsers) so even if you have a monolith, you may need IPC between trusted parties simply because of software engineering practices. As a more extreme example, if you're writing code in a language like Python, where multi-threading is a huge liability due to GIL and the standard solution is to just use multi-processing, you'll need a channel between your processes (even if they're just fork()'d) and so you need to use something like filesystem, unix pipe, postgresql, redis, some ipc lib (e.g. TFA)... whatever as a way to communicate.

Second, your comment implies there is no scenario where implementing two separate programs is preferable to building a monolith. Even though you believe in general monoliths are better, it doesn't follow that they have to always be the right approach for every software. You may have a program that requires extremely different computational techniques, e.g. one part written in Prolog because it needs logical constraint satisfaction solving, or one part needs X language because you have to use a specialized library only available in language X, or you may need one part of your program to be in C/C++/Go/Rust for improved latency, or you may need part of your program in "slow" Y language because every other codebase in your company is written in Y. This language barrier is simply one reason. I can come up with many others. For example, parts of the software may be developed by two separate teams and the IPC is decided as the interface between them.

Long story short, it's pretty normal to have a monolithic codebase, but N processes running at the same. In such cases since all N processes are written by you, running in hopefully-trusted hardware, using an IPC framework like this is a good idea. This is not necessarily the most common problem in software engineering, but if you do enough systems programming you'll see that a need for IPC between trusted processes is hardly niche. I personally reach for tools like iceoryx quite frequently.

elfenpiff|1 year ago

> This smells like they are using shared memory, which is almost certainly a security nightmare.

Yes, we are using shared memory, and I agree that shared memory is a challenge but there are some mechanisms that can make it secure.

The main problem with shared memory is, that one process can corrupt the data structure while another process is consuming it. Even verifying the contents of the data structure is insufficient since it can always be corrupted afterwards. We have named the problem "modify-after-delivery problem" - a sender modifies the data after it has been delivered to a receiver.

This can be handled with:

1. memfd: The sender acquires it, writes its payload, seals it so that it is read-only and then transfers the file descriptor to all receivers. The receiver can verify the read-only seal with fcntl. Since linux guarantees us that it cannot be reverted the receiver can now safely consume the data. This allows it to be used even in a zero-trust environment. [1] provides a good introduction (see the File-Sealing IPC subsection). 2. Memory protection keys [2]: I do not have too much experience with them, but as far as I understand, they solve the problem with mprotect, meaning, the sender can call mprotect and make the segment read only for it self, but the receiver has no way of verifying it or to prevent the sender from calling mprotect again and granting it read/write access again to corrupt the data.

So, the approach is that a sender acquires shared memory, writes its payload into it, makes it read-only, and then transfers it to the receivers.

> Shared memory works as a transport if you either assume that all parties are trusted (in which case why do IPC in the first place?

Robustness is another use case. In mission-critical systems you trust each process but a crash caused by a bug in one sub-system shall not bring down the whole system. So you split up the monolith in many processes and the overall system survives if one process goes down or deadlocks, assuming you have a shared memory library that itself is safe. If you detect a process crash, you can restart it and continue operations.

[1] https://dvdhrm.wordpress.com/2014/06/10/memfd_create2/ [2] https://www.kernel.org/doc/html/latest/core-api/protection-k...

westurner|1 year ago

How does this compare to and/or integrate with OTOH Apache Arrow which had "arrow plasma IPC" and is supported by pandas with dtype_backend="pyarrow", lancedb/lancedb, and Serde.rs? https://serde.rs/#data-formats

zxexz|1 year ago

The other commenter answering you is, I think, trying to point out that the Arrow plasma store is deprecated (and no longer present in the arrow project).

I think it's worth being a little more clear here - Arrow IPC is _not_ deprecated, and has massive momentum - so much so that it's more or less already become the default IPC format for many libraries.

To me it remains unclear what the benefits of Iceoryx2 over the Arrow ecosystem is, and what the level of interoperability is, and what the tradeoffs of either are relative to eachother. Within a single machine, you can mmap the IPC file. You can use Arrow Flight for inter-node or inter-process communication. You can use Arrow with Ray, which is where Plasma went.

I love anything new in this space though, if/when I have time I'll check this out - would love it if somebody could actually eloborate on the differences though.

dumah|1 year ago

https://github.com/apache/arrow/issues/34738

https://lists.apache.org/thread/lk277x3b9gjol42sjg27bst2ggm5...

westurner|1 year ago

Another IPC bus: "Varlink – IPC to replace D-Bus gradually in systemd" https://news.ycombinator.com/item?id=41687413

Serde does serialization and serialization with many formats in Rust

boolit2|1 year ago

I'm in robotics education and we mostly work with Python, to make life easier for students. I'd love to push for more Rust, but so far there's no point to it.

Multiprocess communication is something that we found lacking in Python (we want everything to be easily pip installable) and we ended up using shared memory primitives, which is a lot of code to maintain.

What is the main roadblock for iceoryx2 Python bindings? Is it something you are looking for contributors for?

orecham|1 year ago

The only real roadblock for this is time. We'd welcome any contributor, especially with such an impactful contribution as this. Making iceoryx2 an option for Python would be amazing. If you or someone you know wants to take it on, we would support as much as possible.

So far for this topic, we have only done some brief research on the options available to us, such us going over the C API or using something like PyO3.

npalli|1 year ago

Congrats on the release.

What's the difference between iceoryx and iceoryx2? I don't want to use Rust and want to stick to C++ if possible.

elBoberido|1 year ago

Besides being written in Rust, the big difference is the decentralized approach. With iceoryx1 a central daemon is required but with iceoryx2 this in not the case anymore. Furthermore, more fine grained control over the resources like memory and endpoints like publisher. Overall the architecture is more modular and it should be easier to port iceoryx2 to even more platforms and customize it with 3rd party extension.

With this release we have initial support for C and C++. Not all features of the Rust version are supported yet, but the plan is to finish the bindings with the next release. Furthermore, with an upcoming release we will make it trivial to communicate between Rust, C and C++ applications and all the other language bindings we are going to provide, with Python being probably the next one.

tbillington|1 year ago

> Language bindings for C and C++ with CMake and Bazel support right out of the box. Python and other languages are coming soon.

tormeh|1 year ago

Looks like it has significantly lower latency.

> want to stick to C++ if possible

The answer to that concern is in the title of the submission.

MuffinFlavored|1 year ago

Pretty cool. I see publish + subscribe example on GitHub but no request/response. Am I missing something?

elBoberido|1 year ago

Request-response is on our todo list and will be introduced in an upcoming release :)

What are you needing request-response for?

forrestthewoods|1 year ago

Why is Windows target support tier 2 and not tier 1?

elfenpiff|1 year ago

Tier 1 also means all security/safety features. Windows is not used in mission-critical systems like cars or plans, so we do not need to add those to Windows.

We aim to support Windows so iceoryx2 can be used safely and securely in a desktop environment.

unknown|1 year ago

[deleted]