We improved the performance of a userspace TCP stack in Go

226 points | infomaniac | 1 year ago | coder.com

129 comments

dpeckett|1 year ago

Really cool to see others hacking on netstack, bit of a shame it's tied up in the gVisor monorepo (and all the Bazel idiosyncrasies) but it's a very neat piece of kit.

I've actually been hacking on a similar FOSS project lately, with a focus on building what I'm calling a layer 3 service mesh for the edge. More or less came out of my learned hatred for managing mTLS at scale and my dislike for shoving everything through a L7 proxy (insane protocol complexity, weird bugs, and you still have the issue of authenticating you are actually talking to the proxy you expect).

Last week I got the first release of the userspace router shipped, worth taking a look if you want to play around with a completely userspace and unprivileged WireGuard compatible VPN server.

https://github.com/noisysockets/nsh/blob/main/docs/router.md

iangudger|1 year ago

If you want to use netstack without Bazel, just use the go branch:

https://github.com/google/gvisor/tree/go

go get gvisor.dev/gvisor/pkg/tcpip@go

The go branch is auto generated with all of the generated code checked in.

zxt_tzx|1 year ago

I met one of the founders of Coder.com, he's a really cool dude. It's a pity that it is a product aimed more at enterprises than individual developers, else it would have far more developer mindshare.

Unlike, say, GitHub Codespaces, running something like this on your own infra means your incentives and Coder.com's are aligned, i.e. both of you want to reduce your cloud costs (as opposed to, say, GitHub, whose running on Azure gives it both the opportunity and the incentive to mark up Azure cloud costs).

santiagobasulto|1 year ago

It seems like a great product. I'm wondering why they don't offer more "startup-oriented" plans. It's like either Self Hosted or "Talk to sales". Is it maybe to not compete against GitHub Codespaces?

wmf|1 year ago

"Asking for elevated permissions inside secure clusters at regulated financial enterprises or top secret government networks is at best a big delay and at worst a nonstarter."

But exfiltrating data with a userspace VPN is totally fine?

I'm also wondering why not use TLS.

tptacek|1 year ago

Every connection you make to a remote service "exfiltrates data". Modern TLS is just as opaque to middleboxes as WireGuard is, unless you add security telemetry directly to endpoints --- and then you don't care about the network anyways, so just monitor the endpoint.

The reason you'd use WireGuard rather than TLS is that it allows you to talk directly to multiple services, using multiple protocols (most notably, things like Postgres and Redis) without having to build custom serverside "gateways" for each of those protocols.

anyfoo|1 year ago

You can't control what information flows through an outbound connection, not even in trivial cases. Even if you straight go ahead and say "I allow you to make this connection, but I'm not even allowing you to send any data", you have timing sidechannels to deal with. In any more reasonable case, an almost infinite number of things can be used to exfiltrate any data you want, even if you think you have not only full application-level inspection, but even application-level rewrite.

Pretty much the only thing you can do is somewhat filter out known-bad, not directly motivated outbound traffic, such as malware payloads with very clear signatures. This only works if it's "not directly motivated", because as soon as there's a person who wants to do it, they can skirt around it again.

raggi|1 year ago

fwiw, you technically don't need a privileged container to use tun, you just need suitable permissions on the kernel tun interfaces.

tazjin|1 year ago

Yeah, the optimisations are cool of course, but (maybe due to being unfamiliar with the tool?!) I didn't understand why they can't just `listen(2)`.

parhamn|1 year ago

I don't know anything about Coder, but gVisor proliferation is annoying. It's a boon for cloud providers, helping them find another way to get a large multiple performance decrease per dollar spent in exchange for questionable security benefits. And I'm seeing it everywhere now.

weitendorf|1 year ago

I don't understand - what do you suggest as an alternative to gVisor?

> large multiple performance decrease per dollar spent

gVisor helps you offer multi-tenant products which can actually be much cheaper to operate and offer to customers, especially when their usage is lower than a single VM would require. Also, a lot of applications won't see big performance hits from running under gVisor depending on their resource requirements and perf bottlenecks.

tptacek|1 year ago

Are you referring to gVisor the container runtime, or gVisor/netstack, the TCP/IP stack? I see more uptick in netstack. I don't see proliferation of gVisor itself. "Security" is much more salient to gVisor than it is to netstack.

kccqzy|1 year ago

There are still products from cloud providers that don't use gvisor. Basics like EC2 or GCE. Sounds like you chose the wrong cloud product.

loosescrews|1 year ago

Can you elaborate on your concern? Is the issue that you don't trust gVisor to keep the cloud provider secure?

raggi|1 year ago

It's great to see this, I know the team went on a long journey through this and the blog makes it almost look shorter and simpler than it was. I'm hoping one day we can all integrate the support for GSO that's been landing in gvisor too, but so far we've (tailscale) not had a chance to look deeply into that yet. It was really effective for our tun and UDP interfaces though.

kylecarbs|1 year ago

At Coder we’re fans and users of Tailscale, so very happy to have these changes be consumed upstream as well!

ignoramous|1 year ago

> one day we can all integrate the support for GSO that's been landing in gvisor

Google engs recently rewrote the GSO bit, but unlike Tailscale, it is only for TCP, though.

Besides, gvisor has had "software" & "hardware" GSO support for as long as I can remember.

pantalaimon|1 year ago

The obvious question is: How does it compare to the in-Kernel TCP stack?

raggi|1 year ago

It's less mature, which shows up in lots of places, such as sometimes having less than ideal defaults (as in buffer sizes shown here), and bugs if you start using more fancy features (which improve over time of course).

This is approximately the case for any alternative IP stack you might pick though, a mature IP stack is a huge undertaking with all the many flavors of enhancements to IP and particularly TCP over the years, the high variance in platform behaviors and configurations and so on.

In general you should only take on a dependency of a lesser-used IP stack if you're willing to retain or train IP experts in house over the long haul, because as is demonstrated here, taking on such a dependency means eventually you'll find a business need for that expertise. If that's way outside of your budget or wheelhouse, it might be worth skipping.

syzcowboy99|1 year ago

gVisor's netstack is still much slower than the kernel's (and likely always will be). The goal of this userspace netstack is not to compete with the kernel on performance, but offer an alternative that is more portable and secure.

jiveturkey|1 year ago

help me understand something.

> we’d need a way for the TCP packets to get from the operating system back into Coder for encryption.

yes, this is commonly done via OpenSSL for example.

> This is called a TUN device in unix-style operating systems and creating one requires elevated permissions

waitasec, wut? sure you could use a TUN device I guess, but assuming some kind of multi-tenant separation is an underlying assumption they didn't mention in their intro, couldn't you also use cgroup'd containers? sorry if I'm not fluent in the terminology.

i'm struggling to understand the constraints that push them towards gVisor. simply needing to do encryption doesn't seem like justification. i'm sure they have very good reasons, but needing to satisfy a financial regulator seems orthogonal at best. i would just like to understand those reasons.

nynx|1 year ago

Doesn’t creating a raw socket need elevated permissions?

tptacek|1 year ago

They're not creating raw sockets†. The neat thing about WireGuard is that it runs over vanilla UDP, and presents to the "client" a full TCP/IP interface. We normally plug that interface directly into the kernel, but you don't have to; you can just write a userspace program that speaks WireGuard directly, and through it give a TCP/IP stack interface directly to your program.

† I don't think? I didn't see them say that, and we do the same thing and we don't create raw sockets.

convolvatron|1 year ago

is this part of the open source releases? I looked at the coder.com GitHub, but couldn't find it. I haven't written a compatible TCP, but a different reliable transport in Go userspace. fairness aside, i wonder why we don't see this more often. would love to take a look

andrewstuart|1 year ago

If you’re tunneling, then whatever the connection configuration, isn’t the tunnel what defines the latency?

andrewstuart|1 year ago

I have a problem right now which is that it’s slow to copy large files from one side of the earth to the other. Is this the basis of a solution to that maybe?

392|1 year ago

No. Profile first. Make sure you've tried tweaking params like batch sizes.

dpe82|1 year ago

What do you think are the current problems contributing to your slow transfers?

raggi|1 year ago

not enough detail here to provide a good answer, but I can tell you explicitly that if you're using SMB you're likely not going to get good performance here even if your network stack has tons of space to overcome bdp and congestion challenges.

yencabulator|1 year ago

tl;dr Increased TCP receive buffer size, implemented HyStart instead of traditional TCP slow start in gVisor's netstack, changed an in-process packet queue from drop-when-full to block-when-full.