top | item 40029070


ygoldfeld | 1 year ago

I think that whether (and how much) a native struct versus a capnp schema-based struct helps is a general question of what kind of serialization is best for a particular use-case. I wouldn't want to litigate that fully here. Personally, though, I've found capnp-based IPC protocols to be neat and helpful across versions and protocol changes (where, e.g., there are well-defined rules of forward-compatibility; and Flow-IPC gives you niceties including request-response and message-type demultiplexing to a particular handler). [footnote 1 below]

BUT!!! Some algorithms don't require an "IPC protocol" per se, but rather something more like 2+ applications collaborating on a data structure. In that case native structures are for sure superior, or at times even essentially required. (E.g., if you have some custom optimized hash-table, you're probably not going to want to express it as a capnp structure.)

So, more to the point:

- Flow-IPC 100% supports transmitting/sharing (and constructing, and auto-destroying) native C++ structures. Compared to iceoryx, on this point, it appears to have some extra capabilities, namely full support for structures with pointers/references and/or STL-compliant containers. (This example https://iceoryx.io/latest/examples/complexdata/ and other pages say things like, "To implement zero-copy data transfer we use a shared memory approach. This requires that every data structure needs to be entirely contained in the shared memory and must not internally use pointers or references. The complete list of restrictions can be found...".) Flow-IPC, in this context, means no need to write custom containers sans heap-use, or to eliminate pointers in an existing structure. [footnote 2 below]

- Indeed, the capnp framing (only if you choose to use the Flow-IPC capnp-protocol feature in question!) adds processing and thus some computational and RAM-use overhead. For many applications, the tens of microseconds added there don't matter much -- as long as they stay constant regardless of structure size, and as long as they remain tens of microseconds. So a ~100usec RTT (size-independent; modulo processor model, of course!) is still pretty good. Of course I would never claim this overhead doesn't matter to anyone, and iceoryx's results here are straight-up admirable.

[footnote 1] The request/response/demultiplexing/etc. niceties added by Flow-IPC's capnp-protocol feature-in-question work well IMO, but one might prefer the sweet RPC-semantics + promise pipelining of capnp-RPC. Kenton V (capnp inventor/owner) and I have spoken recently about using Flow-IPC to zero-copy-ify capnp-RPC. I'm looking into it! (He suspects it is pretty simple/natural, given that we handle the capnp-serialization layer already, and capnp-RPC is built on that.) This wouldn't change Flow-IPC's existing features but rather exercise another way of using them. In a way Flow-IPC provides a simple-but-effective-out-of-the-box schema-based conversation protocol via capnp-serialization, and capnp-RPC would provide an alternate (to that out-of-the-box guy) conversation protocol option. I tried pretty hard to design Flow-IPC in a grounded and layered way, so such work would be natural as opposed to daunting.

[footnote 2] In fact the Flow-IPC capnp-based structured-channel feature (internally) itself uses Flow-IPC's own native-structure-transmission feature in its implementation (we eat our own dog food). Since a capnp serialization = sequence of buffers (a.k.a. segments), for us it is (internally) represented as essentially an STL list<vector<uint8_t>>. So we construct/build one of those in SHM (internally); then only a small SHM-handle is (internally) transmitted over the IPC-transport [footnote 3]; and the receiver then obtains the in-place list<vector<uint8_t>> (essentially) which is then treated as the capnp-encoding it really is. This would all happen (internally) when executing the quite-short example in the blog (https://www.linode.com/blog/open-source/flow-ipc-introductio...). As you can see there, to the Flow-IPC-using developer, it's just -- like -- "create a message with this schema here, call some mutators, send"; and conversely "receive a message expected to have that (same) schema, OK -- got it; call some accessors."

[footnote 3] IPC-transport = Unix domain socket or one of 2 MQ types -- you can choose via template arg (or add your own IPC-transport by implementing a certain pair of simple concepts).


jeffreygoesto|1 year ago

Thank you very much for this excellent explanation! I am one of the fathers of IceOryx and its predecessor. We had to lift component-based embedded development to POSIX systems and are very latency- and memory-bandwidth-sensitive (driver assistance and automated driving on what most people would call small SoCs). There it is easier to enforce that senders and receivers use the same struct.

What you did with the shm arena and sharing std containers is outright amazing and indeed relaxes the "self contained" constraint nicely.

On QNX (up to 7) we were bitten by each syscall going through procnto; that is why we chose lock-free over MQs, btw.

Being aware of the use case and choosing the right tradeoff is crucial, as you wrote.

elBoberido|1 year ago

Now I'm curious. It seems you are not the father I'm still drinking beer with. This means there is only one person left who fits this description :) ... we should meet for some beer with the other father ;)

elBoberido|1 year ago

I'm one of the iceoryx maintainers. Great to see some new players in this field. Competition leads to innovation, and maybe we can even collaborate in some areas :)

I did not yet look at the code, but you made me curious with the raw pointers. Did you find a way to make this work without serialization or without mapping the shm to the same address in all processes?

I will have a closer look at the jemalloc integration, since we had something similar in mind for iceoryx2.

ygoldfeld|1 year ago

We are doing it with fancy-pointers (yes, that is the actual technical term in C++ land) and allocators. It’s open-source, so it’s not like there’s any hidden magic, of course: “Just” a matter of working through it.

Using manual mapping (same address values on both sides, as you mentioned) was one idea that a couple people preferred, but I was the one who was against it, and ultimately this was heeded. So that meant:

Raw pointer T* becomes Allocator<T>::pointer. So if user happens to enjoy using raw pointers directly in their structures, they do need to make that change. But, beats rewriting the whole thing… by a lot.

container<T> becomes container<T, Allocator<T>>, where `container` is your standard or standard-compliant (uses allocators properly) container of choice. So if the user prefers sanity and thus uses containers (including custom ones they developed or third-party STL-compliant ones), they do need to add an allocator template argument in the declaration of the container-typed member.

But, that’s it - no other changes in data structure (which can be nested and combined and …) to make it SHM-sharable.

We in the library “just” have to provide the SHM-friendly Allocator<T> for the user to use. And, since stateful allocators are essentially unusable by mere humans in my subjective opinion (the boost.interprocess authors apparently disagree), we use a particular trick -- an “Activator” API -- to work with an individual SHM arena.

So that leaves the mere topic of this SHM-friendly fancy-pointer type, which we provide.

For SHM-classic mode (if you’re cool with one SHM arena = one SHM segment, both sides being able to write to SHM, and the boost.interprocess alloc algorithm) -- enabled with a template-arg switch when setting up your session object -- that’s just good ol’ offset_ptr.

For SHM-jemalloc (which leverages jemalloc, and hence is multi-segment and cool like that, plus with better segregation/safety between the sides), internally there are multiple SHM segments, so offset_ptr is insufficient. Hence we wrote a fancy-pointer for the allocator, which encodes the SHM segment ID and offset within the 64 bits. That sounds haxory and hardcore, but it’s not so bad really. BUT! It also needs to be able to point outside SHM (e.g., into the stack, which is often used when locally building up a structure), so it needs to be able to encode an actually-raw vaddr also. And still use 64 bits, not more. Soooo I used pointer tagging, as not all 64 bits of a vaddr carry information.

So that’s how it all works internally. But hopefully to the user none of these details is necessary to understand. Use our allocator when declaring container members. Use allocator’s fancy-pointer type alias (or similar alias, we give ya the aliases conveniently hopefully) when declaring a direct pointer member. And specify which SHM-backing technique you want us to internally use - depending on your safety and allocation perf desires (currently available choices are SHM-classic and SHM-jemalloc).