top | item 35561951

ffk | 2 years ago

Interestingly, the fastest CPU-based network switches tend to do full kernel bypass. The kernel is generally slow compared to OVS and VPP, especially when they run on top of something like DPDK.

arghwhat|2 years ago

Kernel bypass in DPDK grants the application direct access to DMA buffers so that the kernel is no longer involved. This is not because the kernel is slow, but because many small syscalls are expensive and putting your entire app in the kernel is a bad idea.

There is no kernel bypass in wireguard-go, just a fast user-space implementation with smart use of syscalls to minimize the overhead of being split between user-space and kernel-space.
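One way to picture "smart use of syscalls" is batching: if a single syscall moves many packets, its fixed cost is spread across the batch. The sketch below is a toy cost model only, not wireguard-go's code; the function name and the nanosecond figures are assumptions picked for illustration, not measurements.

```python
def per_packet_cost_ns(syscall_ns: float, work_ns: float, batch: int) -> float:
    """Average cost per packet when one syscall moves `batch` packets.

    syscall_ns: fixed cost of crossing into the kernel once (assumed value)
    work_ns:    per-packet processing cost that batching does not remove
    """
    if batch < 1:
        raise ValueError("batch must be at least 1")
    return syscall_ns / batch + work_ns


# Assumed, illustrative numbers; real costs depend on CPU, kernel, and mitigations.
SYSCALL_NS = 500.0
WORK_NS = 100.0

for batch in (1, 8, 64):
    cost = per_packet_cost_ns(SYSCALL_NS, WORK_NS, batch)
    print(f"batch={batch:3d}  per-packet cost = {cost:.1f} ns")
```

With these assumed numbers the per-packet cost approaches the pure processing cost as the batch grows, which is the effect batching aims for without bypassing the kernel.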

With io_uring, DPDK-style kernel bypass might stop making sense altogether.

ffk|2 years ago

It depends on what you are trying to do though. I don’t think the kernel has an easy path to operating on a set of packet headers as a vector at this point. Not saying it can’t happen, but it’s an area where user space is already ahead.

For reference, a previous test demonstrated IPsec between two pods on separate nodes in k8s, with encap/decap reaching 40 Gbps, which was the line rate for the Intel NICs used.

Details were published here: https://medium.com/fd-io-vpp/getting-to-40g-encrypted-contai...
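To put "line rate" in perspective, here is a rough sketch of the per-packet time budget on a 40 Gbps Ethernet link. The frame sizes are assumptions (the thread does not state what the test used), and the helper names are made up for illustration; the 20 bytes of per-frame overhead are the standard Ethernet preamble, start-of-frame delimiter, and inter-frame gap.

```python
# Per-frame overhead on the wire: 7 B preamble + 1 B SFD + 12 B inter-frame gap.
ETHERNET_OVERHEAD_BYTES = 20


def packets_per_second(link_bps: float, frame_bytes: int) -> float:
    """Maximum frames per second for a given link speed and frame size."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def budget_ns(link_bps: float, frame_bytes: int) -> float:
    """Time available per frame, in nanoseconds, to keep up with line rate."""
    return 1e9 / packets_per_second(link_bps, frame_bytes)


LINK_BPS = 40e9  # 40 Gbps, as in the test mentioned above

# Assumed frame sizes: minimum-size frames and full 1518-byte frames.
for frame in (64, 1518):
    pps = packets_per_second(LINK_BPS, frame)
    print(f"{frame:5d} B frames: {pps / 1e6:6.2f} Mpps, {budget_ns(LINK_BPS, frame):6.1f} ns per packet")
```

At minimum frame size the budget is on the order of tens of nanoseconds per packet, which is why per-packet syscalls are hard to fit in at these speeds.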

I do agree that io_uring will negate the need for DPDK for many use cases though; it will likely be a much simpler and more secure path than DPDK.

ilyt|2 years ago

It's not "kernel is slow"; the kernel, when left to its own devices, is plenty fast. The reason is that when you want to make a decision about a packet in userspace (vs. telling the kernel what to do with it via various interfaces), that kernel logic would just be overhead.

It's similar for applications: if you can, say, decode a whole DNS packet in one go, you don't really want the kernel to spend time decoding the UDP packet and then have you decode the rest; doing it in one step is much faster.
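As a minimal sketch of "decode in one go", the snippet below parses the 8-byte UDP header and the fixed 12-byte DNS header with a single unpack call instead of two separate passes. It assumes the buffer starts at the UDP header; the function name and the sample bytes are hypothetical, and it says nothing about how any real bypass stack does it.

```python
import struct

# UDP header (src port, dst port, length, checksum) followed by the fixed
# DNS header (id, flags, qdcount, ancount, nscount, arcount), all big-endian.
UDP_DNS = struct.Struct("!HHHH" "HHHHHH")


def parse_udp_dns(buf: bytes) -> dict:
    """Decode the UDP and DNS headers from `buf` in one step."""
    (src, dst, length, checksum,
     txid, flags, qd, an, ns, ar) = UDP_DNS.unpack_from(buf)
    return {
        "src_port": src,
        "dst_port": dst,
        "udp_length": length,
        "dns_id": txid,
        "is_response": bool(flags & 0x8000),  # QR bit
        "questions": qd,
        "answers": an,
    }


# Hypothetical sample: a query header from port 53000 to port 53, no question body.
sample = struct.pack("!HHHH", 53000, 53, 20, 0) + struct.pack(
    "!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0
)
parsed = parse_udp_dns(sample)
print(parsed)
```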

ffk|2 years ago

There are some applications where the ability to vectorize the headers and operate on them with SIMD helps. These types of apps tend to pin a full core to do only packet processing though. Also, syscalls are expensive. A lot of work is going into making the APIs async while avoiding syscalls.

renewiltord|2 years ago

Are there consumer (<$2k) network switches that can do Wireguard in a very fast path?

wmf|2 years ago

By their nature as L2/L3 devices, I wouldn't expect switches to ever support Wireguard. I also haven't heard of any hardware Wireguard yet. The fastest implementation so far might be TNSR which just squeaks in under $2,000.