top | item 44860882

(no title)

waschl | 6 months ago

Thought about zero-copy IPC recently. In order to avoid memcopy for the complete chain, I guess it would be best if the sender allocates its payload directly on the shared memory when it’s created. Is this a standard thing in such optimized IPC and which libraries offer this?

discuss

comex|6 months ago

IPC libraries often specifically avoid zero-copy for security reasons. If a malicious message sender can modify the message while the receiver is in the middle of parsing it, you have to be very careful not to enable time-of-check-time-of-use attacks. (To be fair, not all use cases need to be robust against a malicious sender.)

o11c|6 months ago

On Linux, that's exactly what `memfd` seals are for.

That said, even without seals, it's often possible to guarantee that you only read the memory once; in this case, even if the memory is technically mutating after you start, it doesn't matter since you never see any inconsistent state.

duped|6 months ago

What's the threat model where a malicious message sender has write access to shared memory

dataflow|6 months ago

> I guess it would be best if the sender allocates its payload directly on the shared memory when it’s created.

On an SMP system yes. On a NUMA system it depends on your access patterns etc.

6keZbCECT2uB|6 months ago

I've been meaning to look at Iceoryx as a way to wrap this.

Pytorch multiprocessing queues work this way, but it is hard for the sender to ensure the data is already in shared memory, so it often has a copy. It is also common for buffers to not be reused, so that can end up a bottleneck, but it can, in principle, be limited by the rate of sending fds.

elBoberido|6 months ago

Btw, with the next release iceoryx2 will have Python bindings. They are already on main and we will make it available via PIP. This should make it easier to use with Pytorch.

a_t48|6 months ago

I've looked into this a bit - the big blocker isn't on the transport/IPC library, but the serializer itself, assuming you _also_ want to support serializing messages to disk or over network. It's a bit of a pickle - at least in C++, tying an allocator to a structure and its children is an ugly mess. And what happens if you do something like resize a string? Does it mean a whole new allocation? I've (partially) solved it before for single process IPC by having a concept of a sharable structure and its serialization type, you could do the same for shared memory. One could also use a serializer that offers promises around allocations, FlatBuffer might fit the bill. There's also https://github.com/Verdant-Robotics/cbuf but I'm not sure how well maintained it is right now, publicly.

As for allocation - it looks like Zenoh might offer the allocation pattern necessary. https://zenoh-cpp.readthedocs.io/en/1.0.0.5/shm.html TBH most of the big wins come from not copying big blocks of memory around from sensor data and the like. A thin header and reference to a block of shared memory containing an image or point cloud coming in over UDS is likely more than performant enough for most use cases. Again, big wins from not having to serialize/deserialize the sensor data.

Another pattern which I haven't really seen anywhere is handling multiple transports - at one point I had the concept of setting up one transport as an allocator (to put into shared memory or the like) - serialize once to shared memory, hand that serialized buffer to your network transport(s) or your disk writer. It's not quite zero copy but in practice most zero copy is actually at least one copy on each end.

(Sorry, this post is a little scatterbrained, hopefully some of my points come across)

throwaway81523|6 months ago

This is one of mmap's designed-for use cases. Look at DPDK maybe.

yokaze|6 months ago

Boost.Interprocess:

https://www.boost.org/doc/libs/1_46_0/doc/html/interprocess/...