How does it handle high-bandwidth applications? E.g., would it be okay to put my media server behind it? I'm currently tunneling it through a VPS so Cloudflare doesn't get mad.
xena|2 years ago
Tailscaler here: there is a bandwidth limit; it's a funnel, not a hose. We don't announce what the bandwidth limit is, but please keep in mind that it does exist. I would suggest setting up your media server inside your tailnet for the best experience, but it depends on who you are sharing it with and why.
jonpurdy|2 years ago
I might be missing something; isn't a tailnet a bunch of user devices with WireGuard tunnels connecting to each other directly? Where does the limit happen?
(And thanks for your work!)
Edit after 1 minute: of course, the limit is on Tailscale Funnel itself. (Too deep into thinking about Tailscale and forgot about the actual topic of the post.)
Hello, how would the bandwidth limit work within the tailnet if I am accessing it from outside my home network? Wouldn't it incur some bandwidth cost on Tailscale's end?
pciexpgpu|2 years ago
I wonder if the DERPy stuff helps remove most of the bandwidth concerns - thinking out loud...
5e92cb50239222b|2 years ago
Since tailscaled uses the tun/tap driver and thus copies all traffic to userspace (and back), it is extremely inefficient. On my Haswell i5 (plus multiple servers with comparable hardware) the process consumes 40% of CPU time at just 4 MiB/s, and close to 100% at 10-11 MiB/s (with the recent sendmmsg/recvmmsg patches¹).
This is about 2-3x worse than similar applications written in highly optimized C, so don't expect any miracles from further optimizations unless they switch to kernel WireGuard (which doesn't seem likely in the near future).
They claim it's very difficult if not impossible, but this sounds like an issue with their architecture: a similar application from a competitor² has had kernel WireGuard support from the start (no relation, I don't even use it and cannot recommend for or against it).
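For a sense of scale, those figures work out to hundreds of CPU cycles per byte of tunnel traffic. A minimal sketch, assuming a ~3.2 GHz core clock for a Haswell-era i5 (the clock is an assumption; the 40% and 4 MiB/s come from the comment above):

```python
# Back-of-the-envelope: cycles spent per byte of tunnel traffic.
CLOCK_HZ = 3.2e9                   # assumed core frequency (Haswell-era i5)
CPU_FRACTION = 0.40                # reported CPU usage
THROUGHPUT_BPS = 4 * 1024 * 1024   # reported 4 MiB/s, in bytes per second

cycles_per_byte = CLOCK_HZ * CPU_FRACTION / THROUGHPUT_BPS
print(round(cycles_per_byte))      # ~305 cycles per byte
```

For comparison, well-optimized symmetric crypto alone typically costs on the order of single-digit cycles per byte, so most of that budget is going to packet handling and userspace copies rather than encryption.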
Tailscaler here, for what it's worth, I run my Plex server on Tailscale (i5 10600) and I haven't noticed any observable lag due to the TUN/TAP driver, even with 4K Blu-ray rips at several tens of megabits per second of video quality. I also regularly get near the limit of gigabit Ethernet when transferring big files like machine learning models (the 1280-byte MTU plus WireGuard overhead adds up over time and can make the application-observed rate be less than what the NIC is actually doing).
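That framing overhead is easy to put numbers on. A back-of-the-envelope sketch, assuming IPv4 outer headers and TCP timestamps enabled (the header sizes are standard, but the exact path overhead varies):

```python
# Rough ceiling on TCP goodput through the tunnel on gigabit Ethernet.
TUNNEL_MTU = 1280     # Tailscale's default interface MTU
TCP_IP = 20 + 32      # inner IPv4 + TCP header with timestamps
WG = 20 + 8 + 32      # outer IPv4 + UDP + WireGuard data header and auth tag
ETH = 38              # preamble + Ethernet header + FCS + interframe gap

payload = TUNNEL_MTU - TCP_IP          # application bytes per packet
wire = TUNNEL_MTU + WG + ETH           # bytes actually on the wire
goodput = 1000 / 8 * payload / wire    # MB/s on a 1000 Mbit/s link
print(round(goodput, 1))               # ~111.4 MB/s ceiling
```

So even before any CPU cost, the 1280-byte MTU caps gigabit goodput at roughly 111 MB/s, versus about 117 MB/s for plain TCP at a 1500-byte MTU.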
xena|2 years ago
Kernel WireGuard for Tailscale is hard because of DERP (an HTTPS/TCP fallback relay; all connections start over DERP so that they can Just Work if hole punching fails), but I'm sure it could happen with the right combination of eBPF and Rust in the kernel. It'd be a bit easier if there were a high-level abstraction for using the kernel TLS stack to make outgoing TLS connections.
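The DERP-first behavior described above can be sketched in a few lines. This is a toy model, not Tailscale's actual code; `pick_path`, `try_hole_punch`, and the two send callables are made-up stand-ins for the real NAT-traversal machinery:

```python
# Toy sketch: start on the relay so traffic Just Works immediately,
# then upgrade to the direct WireGuard path once hole punching succeeds.
def pick_path(peer, try_hole_punch, relay_send, direct_send):
    path = relay_send            # every connection starts over DERP
    if try_hole_punch(peer):     # in reality this runs concurrently
        path = direct_send       # hole punch worked: go direct
    return path

# Usage: hole punching fails, so traffic stays on the relay.
send = pick_path("peer1", lambda p: False,
                 relay_send=lambda d: ("derp", d),
                 direct_send=lambda d: ("direct", d))
print(send(b"hi")[0])  # → derp
```

The point is that the relay path is load-bearing from the first packet, which is what makes a pure in-kernel data path awkward: some traffic legitimately flows over an HTTPS/TCP connection that the kernel WireGuard module knows nothing about.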
raggi|2 years ago
Hi! Tailscaler here, one of the folks who worked on the recent throughput improvements. One of the machines I was testing with during our work on segment offloading was a Haswell. I absolutely understand your concern: if we're using 40% of CPU at 4 MiB/s, we should be doing substantially better than that on efficiency. In our various testbeds, which include CPUs like yours, we see higher performance. If you'd like us to look into the issue, do email support@tailscale.com - we'd be really happy to dig in and find the cause.
We have continued our work on performance improvements. Along the way, as an example, we recently diagnosed a regression in the kernel's frequency-scaling governor that Tailscale can tickle, and we have an ongoing discussion with the kernel maintainers about that problem. I'm not at all assuming this particular thing is the key source of the performance you're observing; it's more an anecdote to show that we're still digging deep into areas where we aren't performing well, finding the root causes, and working both inside and outside Tailscale to address them, adding workarounds where appropriate.
yurymik|2 years ago
I observe about 37% overhead when using a Tailscale connection on a local gigabit network. Copying a large file from a Synology DS1821+ NAS (AMD Ryzen V1500B) to a Windows PC (i7-6700K) runs at about 111-113 MB/s when accessing the NAS directly and 70-73 MB/s when the traffic goes through Tailscale (different large files, so no caching here).
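Taking the midpoints of those reported ranges puts the overhead in the mid-30s percent (depending on which endpoints you compare, it spans roughly 34-38%):

```python
# Sanity check on the overhead figure, using midpoints of the reported rates.
direct = (111 + 113) / 2    # MB/s, NAS accessed directly
tunneled = (70 + 73) / 2    # MB/s, via Tailscale

overhead = 1 - tunneled / direct
print(f"{overhead:.0%}")    # → 36%
```

Note that the MTU/framing overhead discussed upthread accounts for only a few percent of this; the bulk of the gap is CPU-bound userspace packet handling on one end or the other.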
¹ https://tailscale.com/blog/throughput-improvements
² https://github.com/netbirdio/netbird