It’s really frustrating that the HotOS paper itself has no details about the benchmarking, and the blog post just says “redis benchmark”. What was the system setup? Persistence options? What was ported to demikernel? The client writing, the server reading from the NIC? Based on the problem specified in the paper, I assume its reading from the NIC that was implemented in DemiOS
i think my personal feeling is that those sorts of comments you listed come out of the woodwork more when the comments section starts turning into an "oh man, this should be the standard for everyone" kind of discussion, which is never the case and is usually the point of those kinds of replies.
at least they are when i reply with those kinds of comments anyway
it is my understanding that io_uring is the generalized open source implementation of this, although i do not think it bypasses the kernel fib trie like openonload does...
I looked at using DPDK on some of our GCP instances but it requires setting up a second VPC, which was one hurdle too much.
I’m hoping that io_uring makes all of this unnecessary anyway.
I recall reading a paper where someone noticed that for every packet the Linux kernel receives it has to check if any application has opened a raw socket. Raw sockets are initially needed to allow DHCP to work, so once your machine has been assigned an IP address you can (probably) turn this service off and so give the kernel less work to do. (My memory of the exact details may be sketchy).
DHCP issues address leases, not permanent assignments; leases have an expiration time (and earlier suggested renewal/rebind times). So the DHCP client must periodically renew -- if the tenant doesn't renew (perhaps because the DHCP client has been disabled), the DHCP service may lease the address to another tenant.
If the DHCP server hasn't moved to a new address this renewal can be done over unicast using the leased address - however, if the client doesn't receive a response from the server the client state machine will eventually discard the leased address and fall back to broadcast with an all-zeros source address (which is presumably what requires a raw socket).
The DHCP client implementation in question likely keeps the raw socket open for potential future use in this case. A client might be able to close the raw socket and reopen it later (but security folks might also want it to drop the privilege required to reopen the raw socket, and it might be hard to have an ironclad guarantee that the raw socket can be reopened later on a machine that's short on free kernel memory..).
io_uring reduces the overhead of system calls - but it doesn't do anything to reduce the overhead of the actual networking stack.
If your send/receive calls spend most CPU time in going through routing/fragmentation/filter/BPF/etc path in the networking stack, then uring (or other APIs which just reduce the system call overhead, like SendMmsg/Recvmmsg for UDP) might only make a small difference. Source: Lots of profiling while implementing QUIC libraries.
An alternative to DPDK that allows to bypass the kernel networking stack would be AF_XDP.
See https://irenezhang.net/papers/demikernel-sosp21.pdf for a more thorough paper on the Demikernel from 2021. There are some great ideas for improving the kernel interface while still allowing efficient DPDK-style pipelines.
For a such an interface to be feasible to support in common open source infrastructure it needs a pure software implementation for testing and development purposes. Even better something along the lines of coz to even model performance by throttling down everything else proportionally.
blibble|1 year ago
I can get a packet out to a switch and back to another machine and in 1-2us
gtirloni|1 year ago
joeblubaugh|1 year ago
FridgeSeal|1 year ago
Therefore, I eagerly await the inevitable influx of:
- “you don’t need it”
- “you’re not FAANG enough to justify it”,
-“seems overly complicated my Python-on-Ubuntu-is-good-enough and who needs more”
Style comments telling us why we shouldn’t have fun things like this.
Anyone got anymore comments to add to the bingo-card?
wmf|1 year ago
dijksterhuis|1 year ago
i think my personal feeling is that those sorts of comments you listed come out of the woodwork more when the comments section starts turning into an "oh man, this should be the standard for everyone" kind of discussion, which is never the case and is usually the point of those kinds of replies.
at least they are when i reply with those kinds of comments anyway
kd913|1 year ago
https://github.com/Xilinx-CNS/onload
a-dub|1 year ago
secondcoming|1 year ago
I’m hoping that io_uring makes all of this unnecessary anyway.
I recall reading a paper where someone noticed that for every packet the Linux kernel receives it has to check if any application has opened a raw socket. Raw sockets are initially needed to allow DHCP to work, so once your machine has been assigned an IP address you can (probably) turn this service off and so give the kernel less work to do. (My memory of the exact details may be sketchy).
Polizeiposaune|1 year ago
If the DHCP server hasn't moved to a new address this renewal can be done over unicast using the leased address - however, if the client doesn't receive a response from the server the client state machine will eventually discard the leased address and fall back to broadcast with an all-zeros source address (which is presumably what requires a raw socket).
The DHCP client implementation in question likely keeps the raw socket open for potential future use in this case. A client might be able to close the raw socket and reopen it later (but security folks might also want it to drop the privilege required to reopen the raw socket, and it might be hard to have an ironclad guarantee that the raw socket can be reopened later on a machine that's short on free kernel memory..).
Matthias247|1 year ago
If your send/receive calls spend most CPU time in going through routing/fragmentation/filter/BPF/etc path in the networking stack, then uring (or other APIs which just reduce the system call overhead, like SendMmsg/Recvmmsg for UDP) might only make a small difference. Source: Lots of profiling while implementing QUIC libraries.
An alternative to DPDK that allows to bypass the kernel networking stack would be AF_XDP.
r00tbeer|1 year ago
crest|1 year ago
Gollapalli|1 year ago
unknown|1 year ago
[deleted]