fefe23 | 1 year ago
Shared memory works as a transport if you either assume that all parties are trusted (in which case why do IPC in the first place? Just put them in a monolith), or you do hardcore capturing (make a copy of each message in the framework before handing it off). Their web page mentions zero copy, so it's probably not the second one.
Also, benchmarks are misleading.
It's easy to get good latency if your throughput is so high that you can afford polling or spin locks, as benchmarks typically can. But that's probably not a good assumption for general usage: it will be very inefficient, waste power, and require more cooling as well.
zbentley | 1 year ago
There are all sorts of domains where mutually-trusted parties need IPC. Off the top of my head and in no particular order:
- Applications that pass validated data to/from captive subprocesses. Not everything is available as a natively-linked library. Not every language's natively-linked libraries are as convenient to reliably install as external binaries.
- Parallelism/server systems farming work out to forked (but not exec'd) subprocesses. Not everything needs setuid. Sometimes you just want to parallelize number crunching without the headache of threads (or are on a platform like Python which limits threads' usefulness).
- Replatforming/language transitions in data-intensive applications. Running the new runtime/platform in the same address space as the legacy platform can bring some hairy complexity, which is sidestepped (especially given the temporariness of the transitional state) with careful use of shared memory.
And aren't systems like Postgres counterpoints to your claim? My memory isn't the greatest, but IIRC postgres's server-side connections are subprocesses which interact with the postmaster via shared memory, no?
fefe23 | 1 year ago
I agree with your parallelism counter-argument in principle. However even there it would probably make sense to not trust each other, to limit the blast radius of successful attacks.
In your next point the "careful" illustrates exactly my point. Using shared memory for IPC is like using C or C++ and saying "well I'll be careful then". It can work but it will be very dangerous and most likely there will be security issues. You are much better off not doing it.
Postgres is a beautiful argument in that respect. Yes you can write a database in C or C++ and have it use shared memory. It's just not recommended because you need professionals of the caliber of the Postgres people to pull it off. I understand many organizations think they have those. I don't think they actually do though.
CyberDildonics | 1 year ago
> what a time bomb they are sitting on
You didn't give any real evidence of this or examples.
> Shared memory works as a transport if you either assume that all parties are trusted (in which case why do IPC in the first place?
Because you can have two or more different processes communicate asynchronously. They are in their own memory space and running on different threads. One doesn't crash the other. All they need to work together is data structures and data formats.
Don't forget that files are the original IPC.
> Also, benchmarks are misleading.
Saying something is wrong is easy when you don't have anything to show that it's wrong.
> that's probably not a good assumption for general usage
Then don't do it. Shared memory can use atomics; it can be totally lock-free. Each process can do checks that are just atomically reading an integer.
gnulinux | 1 year ago
This is an extremely puzzling comment. I can think of thousands of such cases.
First, there are many reasons to split your program into processes instead of threads (e.g. look at browsers), so even if you have a monolith, you may need IPC between trusted parties simply because of software engineering practices. As a more extreme example, if you're writing code in a language like Python, where multi-threading is a huge liability due to the GIL and the standard solution is to just use multi-processing, you'll need a channel between your processes (even if they're just fork()'d), so you end up using the filesystem, a unix pipe, postgresql, redis, some IPC lib (e.g. TFA)... whatever, as a way to communicate.
Second, your comment implies there is no scenario where implementing two separate programs is preferable to building a monolith. Even if you believe monoliths are better in general, it doesn't follow that they are always the right approach for every piece of software. You may have a program that requires extremely different computational techniques, e.g. one part written in Prolog because it needs logical constraint satisfaction solving, or one part in language X because you have to use a specialized library only available in X, or one part in C/C++/Go/Rust for improved latency, or one part in "slow" language Y because every other codebase in your company is written in Y. This language barrier is just one reason; I can come up with many others. For example, parts of the software may be developed by two separate teams, with the IPC decided as the interface between them.
Long story short, it's pretty normal to have a monolithic codebase but N processes running at the same time. In such cases, since all N processes are written by you, running on hopefully-trusted hardware, using an IPC framework like this is a good idea. This is not necessarily the most common problem in software engineering, but if you do enough systems programming you'll see that a need for IPC between trusted processes is hardly niche. I personally reach for tools like iceoryx quite frequently.
elfenpiff | 1 year ago
Yes, we are using shared memory, and I agree that shared memory is a challenge, but there are some mechanisms that can make it secure.
The main problem with shared memory is that one process can corrupt the data structure while another process is consuming it. Even verifying the contents of the data structure is insufficient, since it can always be corrupted afterwards. We have named this the "modify-after-delivery problem": a sender modifies the data after it has been delivered to a receiver.
This can be handled with:
1. memfd: The sender acquires it, writes its payload, seals it so that it is read-only, and then transfers the file descriptor to all receivers. The receiver can verify the read-only seal with fcntl. Since Linux guarantees that the seal cannot be reverted, the receiver can now safely consume the data. This allows it to be used even in a zero-trust environment. [1] provides a good introduction (see the File-Sealing IPC subsection).
2. Memory protection keys [2]: I do not have much experience with them, but as far as I understand, they solve the problem with mprotect, meaning the sender can call mprotect and make the segment read-only for itself. But the receiver has no way of verifying this, or of preventing the sender from calling mprotect again to regain read/write access and corrupt the data.
So, the approach is that a sender acquires shared memory, writes its payload into it, makes it read-only, and then transfers it to the receivers.
> Shared memory works as a transport if you either assume that all parties are trusted (in which case why do IPC in the first place?
Robustness is another use case. In mission-critical systems you trust each process, but a crash caused by a bug in one sub-system shall not bring down the whole system. So you split the monolith into many processes, and the overall system survives if one process goes down or deadlocks, assuming you have a shared-memory library that is itself safe. If you detect a process crash, you can restart it and continue operations.
[1] https://dvdhrm.wordpress.com/2014/06/10/memfd_create2/
[2] https://www.kernel.org/doc/html/latest/core-api/protection-k...