
The database servers powering Let's Encrypt (2021)

205 points | alexzeitler | 2 years ago | letsencrypt.org

119 comments

[+] distract8901|2 years ago|reply
Interesting that their previous server is just one or two models up from the server I recently grabbed for $150 on eBay.

I'm just now getting serious about a homelab and it's shocking how much compute you can get for peanuts from enterprise surplus. I've had this machine (1U poweredge r620) for about a year and I'm already itching for an upgrade. Hopefully something with a lot of 3.5" drive bays for long-term data hoarding.

[+] dorfsmay|2 years ago|reply
It's because if you don't use that compute power, it's just an expensive loud heating system that's going to run up your electricity bill!
[+] candiddevmike|2 years ago|reply
Doesn't mention if they're running Linux/what flavor they're running. I personally would be wary of running OpenZFS in production on Linux, especially ZFS on root. It has bit me in the ass too many times on Debian with an update breaking DKMS and rendering my system unbootable.

Also, it's very, very strange/worrying to see no mention of disk encryption anywhere in the post or the tuning guide. For a company with encrypt in the name, that is responsible for the majority of trust on the internet, WTF? That should be highlighted in their benchmarking. ZFS supports native encryption, MariaDB does encryption, how are they encrypting at rest/transit/use?

[+] Caligatio|2 years ago|reply
Given that they're using an HSM (actually several), there's really not much that needs protection via FDE. The certs are obviously public and the domains are in the transparency log.

On the ZFS note: it's been rock solid for me with Ubuntu but a living nightmare with Arch. An Arch update would upgrade the kernel, but then OpenZFS would semi-routinely be incompatible, resulting in an unbootable system.

[+] shrubble|2 years ago|reply
ZFS has been very reliable for me since 2006 or so when I first started using it on SPARC hardware with the Solaris 10 beta; I assume that since they have a backup server and a primary server, they don't update and reboot them both at the same time.
[+] vorpalhex|2 years ago|reply
You can always boot off ext4 and then just run data off OpenZFS pools. The benefits of booting off ZFS are extremely minimal compared to having your working data on ZFS.
[+] DistractionRect|2 years ago|reply
Usually you aren't updating production servers unless it's a security patch, fixes a problem you have, or adds a feature you want/need. Even then, usually you have a test environment to verify the upgrade won't bork the system.
[+] pmarreck|2 years ago|reply
> I personally would be wary of running OpenZFS in production on Linux

A ton of people in the enterprise have been doing this for years without issue; https://openzfs.org/wiki/Companies

> especially ZFS on root

I've been running ZFS on root on NixOS since this excellent guide (https://openzfs.github.io/openzfs-docs/Getting%20Started/Nix...) for about 2 years. Zero issues. (Actually, I see they've updated it, I need to look at that. Also, they default to encrypted root. I turn it off, because the slight hit to performance and extra risk of recovery impossibility is not worth it to me.)

> It has bit me in the ass too many times on Debian with an update breaking DKMS and rendering my system unbootable

Well, I think you've found your problem, then. (That also might be why FreeNAS, which I also have running as my NAS, switched to Linux from Debian when they re-branded as TrueNAS Core/Enterprise.) Come over to NixOS, where you can simply reboot into any of N previous instances after an update that borks something (which almost never happens, anyway, because you can actually specify, in your machine configuration, "use the latest kernel that is compatible with ZFS"). No USB boot key needed. Here's the magic line from my own declarative configuration:

    kernelPackages = config.boot.zfs.package.latestCompatibleLinuxPackages;
Aaaaand... DONE. ;)

> Also, it's very, very strange/worrying to see no mention of disk encryption anywhere in the post or the tuning guide. For a company with encrypt in the name, that is responsible for the majority of trust on the internet, WTF?

You're assuming they're not doing it, without evidence. Also, if they're already managing the security around their certs and cert generation properly, they might not need FDE. FDE is overrated IMHO, frankly, and also incurs a performance cost, as well as an extra risk cost (try recovering an encrypted drive to know what I mean). In short, religions are bad, even in technological choices; there is no single technological configuration choice that is 100% better than all possible alternative configurations.

> That should be highlighted in their benchmarking. ZFS supports native encryption, MariaDB does encryption, how are they encrypting at rest/transit/use?

Multiple layers of encryption incur an extra performance cost with almost no gain in extra security.

[+] d_silin|2 years ago|reply
Shows how much bang for buck you can get when hosting your own hardware.
[+] nunez|2 years ago|reply
These performance improvements are absolutely insane, and churning out certs at speed is the _perfect_ use case for these CPUs.

Kudos to the LE team!

[+] hu3|2 years ago|reply
Indeed. If I had to bet, I would say doubling the memory allowed some or all indexes to fit in RAM. More RAM is usually a big win when it comes to relational database management systems (RDBMS).
[+] sgarland|2 years ago|reply
> We increase target & max IOPS above the defaults. We still use conservative values to avoid excessive SSD wear, but the defaults were tuned for spinning disks: innodb_io_capacity=1000, innodb_io_capacity_max=2500.

I'd be interested to see if they actually needed this. Those parameters set the baseline and burst IOPS InnoDB is allowed to use for background tasks like flushing dirty pages, respectively, and generally you don't need to raise them. They default to 200 and 2000; those defaults have been perfectly adequate for me on a MySQL 8.x instance serving 120K+ QPS.

[+] tda|2 years ago|reply
I recently bought my first server, just about the cheapest Dell had available (still €2000). I was completely underwhelmed by the specs; it even has spinning rust. The upgrade cost to SSDs was something like €250 per disk. So reading these specs, how much does such a server cost? Or do you put the disks in yourself? Can you negotiate a 50% discount? So many questions I have as a newbie in the enterprise server world.
[+] milesdyson_phd|2 years ago|reply
If you are an enterprise (or just big enough) the prices on Dell’s site are meaningless, they are just conversation starters if that.
[+] Dachande663|2 years ago|reply
From old knowledge, you’re out by about 2 orders of magnitude.
[+] Neil44|2 years ago|reply
Where possible I go third party on disks and ram.
[+] barkingcat|2 years ago|reply
No one buys dell servers at list price.
[+] PaywallBuster|2 years ago|reply
Is there an update to this article?

Since the article was written:

- AMD has gen4 (Genoa) available
- there are NVMe drives with 15 TB capacity each

[+] mcpherrinm|2 years ago|reply
The same hardware is still in use today, and I expect will continue to be used for some time.

The biggest thing that’s happened in the meantime is reducing load on the databases by pushing some of the OCSP load out to redis caches: https://letsencrypt.org/2022/12/15/ocspcaching.html By not storing OCSP in the MariaDBs, we reduced write volume and storage size significantly.

The next big thing is going to be sharding the database so multiple servers can share the write loads, and some queries moved out to redis caches.

(I work for Let’s Encrypt)

[+] tyingq|2 years ago|reply
I would guess they try to get 3-5 years out of a new server, and this was done in 2021...so still a bit early for a refresh. The servers prior to this one had E5-2650 processors, so they would have been bought sometime between 2013-2016. Meaning their last refresh cycle was at least 5 years.
[+] wmf|2 years ago|reply
Those 2021 servers will probably last them 4-5 years so they haven't upgraded yet.
[+] rz2k|2 years ago|reply
And it looks like with PCIe 5.0 x4, performance exceeds 2,500,000 read IOPS and a 14 GB/s sequential read rate.
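A back-of-the-envelope check on that sequential figure (my arithmetic, not from the article): PCIe 5.0 runs at 32 GT/s per lane with 128b/130b encoding, so an x4 link tops out just under 16 GB/s, which makes ~14 GB/s of real-world sequential read plausible:

```python
# Theoretical ceiling of a PCIe 5.0 x4 link.
gt_per_s = 32            # PCIe 5.0: 32 GT/s per lane
encoding = 128 / 130     # 128b/130b line encoding overhead
lanes = 4

gb_per_s = gt_per_s * encoding * lanes / 8   # bits -> bytes
print(round(gb_per_s, 2))  # ~15.75 GB/s, before protocol overhead
```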
[+] nologic01|2 years ago|reply
I bet if one could develop an index of useful computation per kwh (or $) this project would be near the top.

Imagine if that kind of hyperleveraged impact was the norm rather than the exception.

[+] dmurray|2 years ago|reply
It's kind of hard for it to be. If the only software projects that existed were the elite hyperefficient ones, the ten guys who ran them could just meet up every year for a key signing party - no need for a scalable service to automatically issue them security certificates.

Computers being commoditized means the median project is very low-value, but it also means there are millions of cool projects out there.

[+] FpUser|2 years ago|reply
What? No k8s, no microservices, no cloud? What is the world coming to?
[+] gloyoyo|2 years ago|reply
Always nice to get an upgrade.
[+] KronisLV|2 years ago|reply
> We currently use MariaDB, with the InnoDB database engine.

Oh hey, it's not often that you hear about MySQL/MariaDB on HN, so this is a nice change. For what it's worth, it's a pretty decent database for getting things done, even if not as advanced as PostgreSQL in some respects.

[+] turtles3|2 years ago|reply
I think it is also important to acknowledge that there are things InnoDB does better than Postgres, e.g. for most workloads an undo log is a far better data structure than implementing MVCC by duplicating rows. Autovacuum and vacuum can be an absolute nightmare, plus there's the extra disk traffic the duplication generates. Maybe one day OrioleDB will bring this to Postgres too.
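The two MVCC strategies being contrasted can be sketched in a toy form (heavily simplified; real engines track transaction IDs, page-level storage, etc.): InnoDB updates a row in place and stashes the old value in an undo log, while Postgres writes a whole new row version and leaves the dead one behind for vacuum to reclaim.

```python
# InnoDB-style: one live row; prior values go to an undo log,
# so rollback/old readers can reconstruct them without duplicating rows.
row = {"balance": 100}
undo_log = []

def innodb_update(new_balance):
    undo_log.append(row["balance"])   # record the old value
    row["balance"] = new_balance      # update in place

# Postgres-style: each update appends a new row version; superseded
# versions are "dead" and occupy space until vacuum removes them.
versions = [{"balance": 100, "dead": False}]

def pg_update(new_balance):
    versions[-1]["dead"] = True
    versions.append({"balance": new_balance, "dead": False})

def vacuum():
    versions[:] = [v for v in versions if not v["dead"]]

innodb_update(90)   # one row, one undo entry
pg_update(90)       # two row versions until vacuum runs
```

The extra disk traffic the comment mentions corresponds to those dead versions: every Postgres update writes a full new row, and vacuum later does more I/O to clean up.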
[+] linsomniac|2 years ago|reply
I have had a MariaDB + Galera multi-master 3-node cluster running for several years, and prior to that Percona+Galera for ~5 years, and it's been just great. It stores lookup tables for our postfix mail server, fairly small databases and low hits, but it's great to just be able to reboot nodes for updates or migration to other VMs without having to do any of the old clustering gyrations.

I almost switched to CockroachDB in the last refresh, until I found that Postfix required Latin-1 encoding or something, and CockroachDB only supported UTF8. Postfix has more recently gotten a config option for changing that.

[+] hyc_symas|2 years ago|reply
Still an order of magnitude slower / less efficient than OpenLDAP / LMDB.
[+] jimaek|2 years ago|reply
Refreshing to see big projects running bare metal rather than overpaying for AWS. Especially one relying on donations and sponsors.
[+] mark242|2 years ago|reply
Someone reading the headline of this might think "oh, they're using Cockroach, or Fauna, or Planetscale" -- nope, this is about next-gen hardware powering their single-writer (with a number of read replicas) MariaDB instance.
[+] esafak|2 years ago|reply
Indeed what I expected. It should have been titled "The Next Gen Servers Powering Let's Encrypt's Database"
[+] riku_iki|2 years ago|reply
they probably started building before CockroachDB and the others became a solid choice, and would now need a big migration project to switch.
[+] sho|2 years ago|reply
I bet that compared to an equivalent load hosted on AWS, that lovely box pays for itself in full every month, if not every week...
[+] pid-1|2 years ago|reply
I did some napkin math (could be very, very wrong) but this server costs ~ 225,000 USD according to Dell's webpage.

AWS does not have a 100% similar VM, but you could have something close for ~ 20,000 USD monthly. Not that bad.

However, storage costs alone would be astronomic. Like > 100,000 USD / month.

I have no idea how much outbound traffic Let's Encrypt serves, but that also could be a quite relevant expense.

OFC I also don't know how much Let's Encrypt pays for energy, cooling, operations, real estate, etc... but:

> I bet that compared to an equivalent load hosted on AWS, that lovely box pays for itself in full every month

I would not take the other side on that bet
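Putting the (admittedly rough) numbers from the comment above on paper, the break-even point is well under two months:

```python
# Break-even sketch using the napkin figures from the comment above
# (all numbers are the commenter's estimates, not real quotes).
server_capex = 225_000          # one-time Dell list price, USD
aws_compute_monthly = 20_000    # comparable VM, USD/month
aws_storage_monthly = 100_000   # comparable storage, USD/month

aws_monthly = aws_compute_monthly + aws_storage_monthly
months_to_break_even = server_capex / aws_monthly
print(months_to_break_even)     # just under two months
```

That ignores colo, power, and staff costs on the self-hosted side, but it also ignores AWS egress, so the comparison only tilts further given how much outbound traffic Let's Encrypt serves.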

[+] d_silin|2 years ago|reply
Absolutely.

Never understood why people are so infatuated with "cloud" options. Yes, it is convenient, but you are paying at least an order of magnitude more for the same amount of compute/storage.

[+] louwrentius|2 years ago|reply
I think there are a ton of great use cases for the cloud, but people should think for themselves and decide whether their circumstances and workload are really a good fit.

A ton of people forget that a bunch of servers across a few colocations can pay for itself in months, especially if you go for second-hand gear that is dirt cheap.

Again, going (back) to colocating hardware might not be a good fit for everyone. But with modern management tools and datacenter services like 'remote hands', I think people should not reject it upfront.