It would be interesting to see stats from the CAs about which of the Blessed Methods is most popular. (This article is about Let's Encrypt using tls-alpn-01, which is an implementation of 3.2.2.4.10 "TLS Using a Random Number".) Doubtless Fly aren't the only people doing tls-alpn-01 in bulk, but as far as I'm aware we don't have a good overview.
In principle they can all generate those statistics because they (are supposed to) log enough information to identify what went wrong when, inevitably, something is misissued. Logically that also includes at least which method was used to verify domain authorization or control.
One of the things wrong at Symantec turned out to be that some of the records were notionally kept at CrossCert, a separate Korean company. CrossCert simply did not keep any records (or, if it did, they were in such disarray that refusing to disclose them seemed less likely to attract retribution), and Symantec had seemingly never checked.
Knowing which methods are popular with Subscribers, and whether that varies considerably between CAs, would be valuable in figuring out how more of the worst Blessed Methods can be deprecated or improved, and who we need to talk to about that.
For example, if Let's Encrypt is doing almost all of the 3.2.2.4.19 ("Agreed Upon Change to Website - ACME") validations, then there's no point ragging on other CAs for the shortcomings of relying on plaintext HTTP in this method. Or if DigiCert are doing a lot of 3.2.2.4.15 ("Phone Contact with Domain Contact"), they are the people to talk to about proposed improvements around things like leaving a voicemail.
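Here is how tls-alpn-01 works under the hood. The responder proves control of a name by serving a self-signed certificate for that name. It serves it only to clients that negotiate the ALPN protocol `acme-tls/1`. The certificate carries a critical acmeIdentifier extension whose value is the SHA-256 of the ACME key authorization (RFC 8737, building on RFC 8555).

The code below is a minimal sketch of that value computation only, assuming the account key is available as a JWK dict. It is not Fly's or Let's Encrypt's code and it does not build the certificate itself.

```python
import base64
import hashlib
import json

# For reference: ALPN protocol ID and extension OID (id-pe 31) from RFC 8737.
ACME_TLS_ALPN_PROTOCOL = b"acme-tls/1"
ACME_IDENTIFIER_OID = "1.3.6.1.5.5.7.1.31"

# Members that go into an RFC 7638 JWK thumbprint, per key type (OKP per RFC 8037).
THUMBPRINT_MEMBERS = {
    "EC": ("crv", "kty", "x", "y"),
    "RSA": ("e", "kty", "n"),
    "OKP": ("crv", "kty", "x"),
}


def b64url(data: bytes) -> str:
    """Unpadded base64url, as used throughout ACME/JOSE."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """RFC 7638: SHA-256 over only the required members, keys sorted, no whitespace."""
    members = {name: jwk[name] for name in THUMBPRINT_MEMBERS[jwk["kty"]]}
    canonical = json.dumps(members, sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, account_jwk: dict) -> str:
    """RFC 8555 key authorization: token, '.', thumbprint of the account key."""
    return f"{token}.{jwk_thumbprint(account_jwk)}"


def acme_identifier_value(key_auth: str) -> bytes:
    """DER encoding of the 32-byte Authorization OCTET STRING that goes
    inside the critical acmeIdentifier extension of the challenge cert."""
    digest = hashlib.sha256(key_auth.encode("ascii")).digest()
    return bytes([0x04, len(digest)]) + digest  # OCTET STRING tag, length 32, digest
```

The thumbprint ignores extra JWK members such as `kid`, so the same account key always yields the same key authorization for a given token.
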
Part of the last few weeks involved me learning Rust and using it in anger (if hooking nfqueue up to tokio counts as "in anger") so if you'd like to irritate the hell out of 'pcwalton, feel free to ask me Rust questions.
> Obviously, to do stuff like this, you need to generate certificates. The reasonable way to do that in 2020 is with LetsEncrypt. We do that for our users automatically, but “it just works” makes for a pretty boring writeup, so let’s see how complicated and meandering I can make this.
Is anyone else feeling quite sad reading this article? ALPN is being used because only ports 80 and 443 are realistic these days, and middleboxes force the TLS handshake to be padded so it isn't mistaken for an ancient protocol (SSLv2).
Most of this could have been avoided by using DoH and SRV records for HTTP/HTTPS. I still don't understand why browsers don't support SRV records for HTTP/HTTPS.
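SRV-based lookup would ask a client to fetch the records for a name such as `_https._tcp.example.com`. That owner name is hypothetical here, since browsers don't do this lookup. The client would then order the targets per RFC 2782: lowest priority first, weighted-random within a priority.

The code below is a minimal sketch of just the ordering step, assuming the records were already fetched (over DoH or otherwise). `SrvRecord` and `order_srv` are illustrative names.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    port: int
    target: str


def order_srv(records, rng=random):
    """Order SRV records for connection attempts following RFC 2782:
    lower priority values first; within a priority, weighted random
    selection without replacement."""
    ordered = []
    for prio in sorted({r.priority for r in records}):
        # Zero-weight records go first so they remain eligible; they are only
        # picked early when the random number drawn is 0.
        group = sorted((r for r in records if r.priority == prio),
                       key=lambda r: r.weight != 0)
        while group:
            pick = rng.randint(0, sum(r.weight for r in group))  # inclusive bounds
            running = 0
            for i, rec in enumerate(group):
                running += rec.weight
                if running >= pick:  # first record whose running sum reaches pick
                    ordered.append(group.pop(i))
                    break
    return ordered
```

A client would then try each (target, port) in turn. That is the extra fallback logic browsers would have to carry for every HTTP(S) connection.
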
ALPN would make sense for something like HTTP/2 even if you didn't have the problem of ports being blocked: if HTTP/2 had its own port, clients would have to make multiple TCP connection attempts for each host they connect to.
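Under RFC 7301, the server picks the first protocol in its own preference list that the client also offered. That lets HTTP/2, HTTP/1.1 and even the `acme-tls/1` protocol used by tls-alpn-01 share one listener on 443.

The code below is a minimal sketch of that selection plus a dispatch table. The handler names are hypothetical placeholders.

```python
def select_alpn(server_preference, client_offer):
    """Server-side ALPN choice (RFC 7301): the first protocol in the server's
    preference list that the client also offered, or None if nothing overlaps
    (a real TLS server must then fail with a no_application_protocol alert)."""
    offered = set(client_offer)
    for proto in server_preference:
        if proto in offered:
            return proto
    return None


# One listener, several protocols, in server preference order.
# Values are illustrative handler names, not a real API.
HANDLERS = {
    b"h2": "serve_http2",
    b"http/1.1": "serve_http1",
    b"acme-tls/1": "answer_tls_alpn_challenge",
}


def dispatch(client_offer):
    if not client_offer:
        return "serve_http1"  # client sent no ALPN extension: assume HTTP/1.1
    proto = select_alpn(list(HANDLERS), client_offer)
    if proto is None:
        raise ConnectionError("no_application_protocol")
    return HANDLERS[proto]
```

A real server would let its TLS library do the selection. Python's `ssl` module, for instance, exposes `SSLContext.set_alpn_protocols()` and `SSLSocket.selected_alpn_protocol()`.
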
I have the opposite feeling, the clever "hacks" people use to build very useful stuff that bypasses most problems with legacy infrastructure are pretty exciting. It's very much like watching a complex organism evolve into something you never really could have imagined 8000 iterations ago.
> We proxy traffic from edge servers to containers through a global WireGuard mesh.
I am more interested in the mesh. Do you have more details on that? Specifically why this architecture was chosen, what kind of latency does WireGuard add, etc.
Ooooh, I love WireGuard. We'll have an article about this in the next couple of months.
We picked it because it's really simple to manage, and we wanted to ensure traffic between datacenters was always encrypted. We have a little tool called "flywire" that keeps WireGuard peer configs updated from Consul. Once we accept a connection from a user, we pick a target VM and connect them to it over the WireGuard mesh.
For our purposes, it basically doesn't add any noticeable latency. I think when we tested we saw something on the order of 0.1ms of added latency over WireGuard, but I don't quite remember. It's never been the source of latency problems when we do have them, at least!
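flywire itself isn't public, so the code below is only a sketch of the general idea described above. It turns a catalog of hosts into a WireGuard config with one `[Peer]` per other host. Consul would supply the catalog, but that query is left out.

The `Node` fields and `render_wg_config` are hypothetical. The output uses the key names that `wg setconf` reads. A file like this could be applied to a running interface with `wg syncconf`, which only changes what differs and so doesn't disturb existing peer sessions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One host as a service catalog might describe it (hypothetical fields)."""
    name: str
    public_key: str  # base64 WireGuard public key
    endpoint: str    # "ip:port" that other hosts should send to
    mesh_ip: str     # this host's address inside the mesh, as a CIDR


def render_wg_config(self_name, private_key, listen_port, nodes):
    """Config text in `wg setconf` format: this host's [Interface] plus one
    [Peer] per other host, sorted by name so identical input gives identical text."""
    lines = ["[Interface]", f"PrivateKey = {private_key}", f"ListenPort = {listen_port}"]
    for node in sorted(nodes, key=lambda n: n.name):
        if node.name == self_name:
            continue  # no peer entry for ourselves
        lines += [
            "",
            "[Peer]",
            f"PublicKey = {node.public_key}",
            f"Endpoint = {node.endpoint}",
            # Only this peer's mesh address is routed to, and accepted from, it.
            f"AllowedIPs = {node.mesh_ip}",
        ]
    return "\n".join(lines) + "\n"
```
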
Question about fly.io: do you support HTTP/2? I have wanted to put gRPC services directly on the edge but most managed services make it completely convoluted to set up (the lone exception being Google Cloud Run).
... and use Caddy to do the heavy lifting. (I'm biased, yes. But the linked doc is multi-authored and applies to every sysadmin or developer who needs to manage certs, regardless of your software choice.)
I also love Caddy. In fact, you can run it on Fly.io (and even opt out of our TLS/cert stack). I would love it if it could just put certs in Vault, though.
It's probably better to compare Fly with Lambda or Fargate. It's not really meant to be cheaper than AWS, though; the real value is being able to run app servers all over the world without spending time maintaining servers or wrangling AWS.
Firecracker is f'ing awesome. I have a lot of notes to write up about it. I know this isn't how products actually succeed in the real world, but I'll be honest and say that Kurt had me at Fly with "WireGuard and Firecracker".
(For the unfamiliar reader: Firecracker is a micro-vm system that sits sort of in between a fully virtualized host, like an EC2 instance, and a container like Docker; you get the security isolation of a hypervisor but the speed/simplicity of Docker. It's the engine that powers AWS Lambda and Fargate. The Usenix paper is a pretty great read, and the code [it's all in Rust] is simple and easy to follow.)
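In practice, each microVM is a firecracker process that you configure over a REST API on a Unix socket (the one passed with `--api-sock`) and then start. The code below is a sketch of that sequence. It assumes the endpoint and field names used in Firecracker's getting-started guide (`/machine-config`, `/boot-source`, `/drives/rootfs`, `/actions`); check them against the API spec for the version you run.

The kernel and rootfs paths are placeholders. This is not a description of how Fly orchestrates Firecracker.

```python
import http.client
import json
import socket


def firecracker_boot_requests(kernel_path, rootfs_path, vcpus=1, mem_mib=256):
    """(method, path, body) calls that configure and then start one microVM."""
    return [
        ("PUT", "/machine-config", {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        ("PUT", "/boot-source", {
            "kernel_image_path": kernel_path,
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        }),
        ("PUT", "/drives/rootfs", {
            "drive_id": "rootfs",
            "path_on_host": rootfs_path,
            "is_root_device": True,
            "is_read_only": False,
        }),
        # Start only after machine, kernel and root drive are configured.
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path):
        super().__init__("localhost")  # only used for the Host header
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)


def boot_microvm(socket_path, requests):
    """Send each call in order, stopping at the first non-2xx response."""
    for method, path, body in requests:
        conn = UnixHTTPConnection(socket_path)
        try:
            conn.request(method, path, body=json.dumps(body),
                         headers={"Content-Type": "application/json",
                                  "Accept": "application/json"})
            resp = conn.getresponse()
            detail = resp.read()
        finally:
            conn.close()
        if resp.status >= 300:
            raise RuntimeError(f"{method} {path} failed: {resp.status} {detail!r}")
```

Usage, with placeholder paths: start `firecracker --api-sock /tmp/firecracker.socket`, then call `boot_microvm("/tmp/firecracker.socket", firecracker_boot_requests("vmlinux", "rootfs.ext4"))`.
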
tialaramex | 5 years ago
tptacek | 5 years ago
NetOpWibby | 5 years ago
This delighted me.
dochtman | 5 years ago
How is the Fly proxy implemented? Are you using rustls and/or any of the available ACME crates?
I've been wanting to implement tls-alpn-01 support for rustls (although it might be possible to do this just by mutating the ServerConfig over time).
Also interested to hear your general impressions of Rust so far (I think I read some Twitter grumbling...).
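On the rustls question: instead of mutating the `ServerConfig`, one option is to choose the certificate per handshake, keyed on SNI and the client's ALPN list. In rustls that decision would live in an implementation of the `ResolvesServerCert` trait.

The code below sketches that decision logic in Python. The names are hypothetical and placeholder strings stand in for certificates. It is not the Fly proxy's implementation.

```python
from dataclasses import dataclass, field

ACME_TLS_ALPN = b"acme-tls/1"


@dataclass
class ChallengeAwareResolver:
    """Chooses which certificate to present for a ClientHello (hypothetical API)."""
    certs: dict                                    # server name -> normal certificate
    pending: dict = field(default_factory=dict)    # server name -> challenge certificate

    def start_challenge(self, name, challenge_cert):
        self.pending[name] = challenge_cert

    def finish_challenge(self, name):
        self.pending.pop(name, None)

    def resolve(self, server_name, alpn_offer):
        if ACME_TLS_ALPN in alpn_offer:
            # A validation handshake: answer only with the challenge certificate,
            # never the real one. None means "no cert", i.e. fail the handshake.
            return self.pending.get(server_name)
        # Ordinary clients never see the challenge certificate.
        return self.certs.get(server_name)
```

Because the lookup happens per handshake, starting or finishing a challenge is just a map update. Nothing in the long-lived TLS configuration has to change.
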
dchest | 5 years ago
ancarda | 5 years ago
It feels like the Internet is so fragile.
SahAssar | 5 years ago
profmonocle | 5 years ago
mrkurt | 5 years ago
Karupan | 5 years ago
mrkurt | 5 years ago
hashamali | 5 years ago
tptacek | 5 years ago
https://fly.io/docs/app-guides/run-a-private-dns-over-https-...
(Beyond that, for whatever it's worth: you can skip our HTTP/H2 termination entirely and speak TCP directly to your VMs).
mholt | 5 years ago
mrkurt | 5 years ago
lomkju | 5 years ago
Fly micro-2x: shared CPU, 512 MB, $0.000003044 per second ($8/month)
vs. AWS t3a.nano: 2 vCPU (variable ECU), 0.5 GiB, EBS-only storage, $0.0031 per hour
Am I missing something? Looking at the pricing, I still feel AWS is cheaper.
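One way to put the two quoted prices on the same footing is to convert both to a monthly figure. This assumes the Fly rate is per second, which is what makes it come out at the quoted $8, and a 30-day month:

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds in a 30-day month
HOURS_PER_MONTH = 30 * 24              # 720 hours in a 30-day month


def monthly_cost_from_per_second(rate):
    return rate * SECONDS_PER_MONTH


def monthly_cost_from_per_hour(rate):
    return rate * HOURS_PER_MONTH


fly_micro_2x = monthly_cost_from_per_second(0.000003044)  # about $7.89, the quoted ~$8
aws_t3a_nano = monthly_cost_from_per_hour(0.0031)          # about $2.23, compute only

print(f"micro-2x ${fly_micro_2x:.2f}/month vs t3a.nano ${aws_t3a_nano:.2f}/month")
# prints "micro-2x $7.89/month vs t3a.nano $2.23/month"
```

The t3a.nano figure covers compute only. "EBS Only" means its storage is billed separately.
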
mrkurt | 5 years ago
awinter-py | 5 years ago
tptacek | 5 years ago
https://www.usenix.org/system/files/nsdi20-paper-agache.pdf
mrkurt | 5 years ago