More Memory Safety for Let's Encrypt: Deploying ntpd-rs

[+] NelsonMinar|1 year ago|reply

I like the idea of NTPD in Rust. Is there anything to read about how well ntpd-rs performs? Would love a new column for chrony's comparison: https://chrony-project.org/comparison.html

Particularly interested in the performance stats, how well the daemon keeps time in the face of various network problems. Chrony is very good at this. Some of the other NTP implementations (not on that chart) are so bad they shouldn't be used in production.

[+] rnijveld|1 year ago|reply

In our internal testing we are very close to Chrony in synchronization performance. Some of our testing data and an explanation of our algorithm are published in our repository: https://github.com/pendulum-project/ntpd-rs/tree/main/docs/a...

Given the amount of testing we (and other parties) have done, and given the strong theoretical foundation of our algorithm I’m pretty confident we’d do well in many production environments. If you do find any performance issues though, we’d love to hear about them!

[+] ComputerGuru|1 year ago|reply

Unlike, say, coreutils, NTP is very far from being a solved problem, and the memory safety of the solution is unfortunately going to play second fiddle to its efficacy.

For example, we only use chrony because it’s so much better than whatever came with your system (especially on virtual machines). ntpd-rs would have to come at least within spitting distance of chrony’s time keeping abilities to even be up for consideration.

(And I say this as a massive rust aficionado using it for both work and pleasure.)

[+] hi-v-rocknroll|1 year ago|reply

You might be doing too much work at the wrong level of abstraction. VMs should use host clock synchronization. It requires some work and coordination, but it eliminates the need for ntp in VMs entirely.

Hosts should then be synced using PTP or a proper NTP local stratum (just get a proper GNSS source for each DC if you have the funds).

https://tsn.readthedocs.io/timesync.html

Deploy chrony to bare metal servers wherever possible.

[+] syncsynchalt|1 year ago|reply

The biggest danger in NTP isn't memory safety (though good on this project for tackling it), it's

(a) the inherent risks in implementing a protocol based on trivially spoofable UDP that can be used to do amplification and reflection

and

(b) emergent resonant behavior from your implementation that will inadvertently DDoS critical infrastructure when all 100 million installed copies of your daemon decide to send a packet to NIST in the same microsecond.

I'm happy to see more ntpd implementations but always a little worried.
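
To illustrate point (b), one common mitigation for synchronized bursts is to randomize each client's polling interval. The following is a minimal sketch of that idea only, not how ntpd-rs or any other daemon schedules its polls; the function name `next_poll_delay` and the 10% jitter fraction are assumptions made for illustration.

```python
import random
from typing import Optional


def next_poll_delay(
    base_seconds: float,
    jitter_fraction: float = 0.1,
    rng: Optional[random.Random] = None,
) -> float:
    """Return the base poll interval shifted by a random amount.

    Hypothetical helper: the delay is drawn uniformly from
    base_seconds * (1 - jitter_fraction) to base_seconds * (1 + jitter_fraction),
    so clients that started at the same moment drift apart over time.
    """
    rng = rng or random.Random()
    spread = base_seconds * jitter_fraction
    return base_seconds + rng.uniform(-spread, spread)


if __name__ == "__main__":
    # Five simulated clients with a nominal 64-second poll interval.
    for client in range(5):
        print(client, round(next_poll_delay(64.0), 2))
```

With a 64-second base and 10% jitter, each poll lands somewhere between roughly 57.6 and 70.4 seconds after the previous one, so a large population of clients does not stay in lockstep.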

[+] rnijveld|1 year ago|reply

I would encourage you to take a look at some of our testing data and an explanation of our algorithm in our repository (https://github.com/pendulum-project/ntpd-rs/tree/main/docs/a...). I think we are very much in spitting distance of Chrony in terms of synchronization performance, sometimes even beating Chrony. But we’d love for more people to try our algorithm in their infrastructure and report back. The more data the better.

[+] agwa|1 year ago|reply

What exactly does "time keeping abilities" mean? If I had to choose between 1) an NTP implementation with sub-millisecond accuracy that might allow a remote attacker to execute arbitrary code on my server and 2) an NTP implementation which may be ~100ms off but isn't going to get me pwned, I'm inclined to pick option 2. Is writing an NTP server that maintains ~100ms accuracy not a solved problem?
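
For context on the accuracy figures in this question, the standard NTP on-wire calculation estimates the clock offset and round-trip delay from four timestamps. The sketch below shows only that arithmetic; the function name and the example numbers are assumptions for illustration, and real daemons add filtering and clock steering on top of it.

```python
def offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Compute clock offset and round-trip delay from one NTP exchange.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


if __name__ == "__main__":
    # Made-up exchange: the server clock is 0.1 s ahead of the client,
    # each network leg takes 0.02 s, and the server spends 0.001 s processing.
    offset, delay = offset_and_delay(100.000, 100.120, 100.121, 100.041)
    print(f"offset={offset:.3f}s delay={delay:.3f}s")
```

The offset estimate is exact only when the two network legs take equal time; asymmetric paths show up directly as offset error, which is one reason accuracy varies so much between environments.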

[+] cogman10|1 year ago|reply

This seems like a weird place to be touting memory safety.

It's ntpd, it doesn't seem like a place for any sort of attack vector and it's been running on many VMs without exploding memory for a while now.

I'd think there are far more critical components to rewrite in a memory safe language than the clock synchronizer.

[+] jaas|1 year ago|reply

I'm the person driving this.

NTP is worth moving to a memory safe language but of course it's not the single most critical thing in our entire stack to make memory safe. I don't think anyone is claiming that. It's simply the first component that got to production status, a good place to start.

NTP is a component worth moving to a memory safe language because it's a widely used critical service on a network boundary. A quick Google for NTP vulnerabilities will show you that there are plenty of memory safety vulnerabilities lurking in C NTP implementations:

https://www.cvedetails.com/vulnerability-list/vendor_id-2153...

Some of these are severe, some aren't. It's only a matter of time though until another severe one pops up.

I don't think any critical service on a network boundary should be written in C/C++, we know too much at this point to think that's a good idea. It will take a while to change that across the board though.

If I had to pick the most important thing in the context of Let's Encrypt to move to a memory safe language it would be DNS. We have been investing heavily in Hickory DNS but it's not ready for production at Let's Encrypt yet (our usage of DNS is a bit more complex than the average use case).

https://github.com/hickory-dns/hickory-dns

Work is proceeding at a rapid pace and I expect Hickory DNS to be deployed at Let's Encrypt in 2025.

[+] oconnor663|1 year ago|reply

> it's been running on many VMs without exploding memory for a while now

Most of the security bugs we hear about don't cause random crashes on otherwise healthy machines, because that tends to get them noticed and fixed. It's the ones that require complicated steps to trigger that are really scary. When I look at NTP, I see a service that:

- runs as root

- talks to the network

- doesn't usually authenticate its traffic

- uses a bespoke binary packet format

- almost all network security depends on (for checking cert expiration)

That looks to me like an excellent candidate for a memory-safe reimplementation.
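
As a rough illustration of the "bespoke binary packet format" point, here is a sketch of parsing the fixed 48-byte NTP header (layout per RFC 5905) with an explicit length check. It is not taken from any NTP implementation; the function name and the returned dictionary keys are assumptions. In a memory-safe language a short packet produces an error like this one rather than an out-of-bounds read.

```python
import struct

# Fixed 48-byte NTP header in network byte order:
# flags, stratum, poll, precision, root delay, root dispersion,
# reference ID, then four 64-bit timestamps.
NTP_HEADER = struct.Struct("!BBbbII4sQQQQ")


def parse_ntp_header(data: bytes) -> dict:
    """Parse the fixed NTP header, rejecting packets that are too short."""
    if len(data) < NTP_HEADER.size:
        raise ValueError(f"packet too short: {len(data)} < {NTP_HEADER.size} bytes")
    (flags, stratum, poll, precision, root_delay, root_dispersion,
     ref_id, ref_ts, origin_ts, recv_ts, transmit_ts) = NTP_HEADER.unpack_from(data)
    return {
        "leap": flags >> 6,              # top 2 bits
        "version": (flags >> 3) & 0x7,   # middle 3 bits
        "mode": flags & 0x7,             # low 3 bits
        "stratum": stratum,
        "poll": poll,
        "precision": precision,
        "root_delay": root_delay,
        "root_dispersion": root_dispersion,
        "reference_id": ref_id,
        "reference_ts": ref_ts,
        "origin_ts": origin_ts,
        "receive_ts": recv_ts,
        "transmit_ts": transmit_ts,
    }


if __name__ == "__main__":
    # 0x23 = leap 0, version 4, mode 3 (client); the rest is zeroed.
    print(parse_ntp_header(bytes([0x23]) + bytes(47)))
```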

[+] luma|1 year ago|reply

It's present on loads of systems, it's a very common service to offer, it's a reasonably well-constrained use case, and the fact that nobody thinks about it might be a good reason to think about it. They can't boil the ocean but one service at a time is a reasonable approach.

I'll flip the question around, why not start at ntpd?

[+] lambdaone|1 year ago|reply

NTP is a ubiquitous network service that runs directly exposed to the Internet, and that seems to me like a good thing to harden. Making NTP more secure does not stop anyone else from working on any other project.

[+] rnijveld|1 year ago|reply

I do think that memory safety is important for any network service. The probability of something going horribly wrong when a network packet is parsed in a wrong way is just too high. NTP typically does have more access to the host OS than other daemons, with it needing to adjust the system clock.

Of course, there are many other services that could be made memory safe, and maybe there is some sort of right or smart order in which we should make our core network infrastructure memory safe. But everyone has their own priorities here, and I feel like this could end up being an endless debate of whatabout-ism. There is no right place to start, other than to just start.

Aside from memory safety though, I feel like our implementation has a strong focus on security in general. We try and make choices that make our implementation more robust than what was out there previously. Aside from that, I think the NTP space has had an undersupply of implementations, with there only being a few major open source implementations (like ntpd, ntpsec and chrony). Meanwhile, NTP is one of those pieces of technology at the core of many of the things we do on the modern internet. Knowing the current time is one of these things you just need in order to trust many of the things we take for granted (without knowledge of the current time, your TLS connection could never be trusted). I think NTP definitely deserves this attention and could use a bunch more attention.

[+] astrobe_|1 year ago|reply

Agreed. It is a "good old" binary protocol, so the many gotchas of text protocols are not there.

[+] mre|1 year ago|reply

I spoke with Folkert, one of the developers on this project, on the 'Rust in Production' podcast. Some of you might find it interesting: https://corrode.dev/podcast/s01e05-tweede-golf/

[+] akira2501|1 year ago|reply

Why does your ntpd have a json dependency?

[+] danudey|1 year ago|reply

This is a good question to ask, especially in the age of everything pulling in every possible dependency just to get one library function or an `isNumeric()` convenience function.

The answer is that there is observability functionality which provides its results as JSON output via a UNIX socket[0]. As far as I can see, there's no other JSON functionality anywhere else in the code, so this is just to allow for easily querying (and parsing) the daemon's internal state.

(I'm not convinced that JSON is the way to go here, but that's the answer to the question)

[0] https://docs.ntpd-rs.pendulum-project.org/development/code-s...

[+] rnijveld|1 year ago|reply

I don’t think our dependency tree is perfect, but I think our dependencies are reasonable overall. We use JSON for transferring metrics data from our NTP daemon to our prometheus metrics daemon. We’ve made this split for security reasons: why have all the attack surface of an HTTP server in your NTP daemon? That didn’t make sense to us, which is why we added a read-only unix socket to our NTP daemon that on connecting dumps a JSON blob and then closes the connection (i.e. doing as little as possible), which is then usable by our client tool and by our prometheus metrics daemon. That data transfer uses JSON, but could have used any data format. We’d be happy to accept pull requests to replace this data format with something else, but given budget and time constraints, I think what we came up with is pretty reasonable.
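
The pattern described above (connect, receive one JSON blob, connection closes) can be sketched roughly as follows. This is not ntpd-rs code: the socket path, the field names in the JSON, and all function names are hypothetical, chosen only to show the shape of the exchange.

```python
import json
import os
import socket
import tempfile
import threading


def serve_once(server: socket.socket, state: dict) -> None:
    """Accept one connection, write the state as JSON, then close it."""
    conn, _ = server.accept()
    with conn:
        conn.sendall(json.dumps(state).encode("utf-8"))
    # Leaving the `with` block closes the connection, which marks end of data.


def read_state(path: str) -> dict:
    """Connect to the socket and read until the server closes the connection."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        chunks = []
        while True:
            chunk = client.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return json.loads(b"".join(chunks))


def demo() -> dict:
    """Run a one-shot server and client against a temporary socket path."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "observe.sock")  # hypothetical path
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)
            server.listen(1)
            state = {"offset_seconds": 0.00012, "stratum": 2}  # made-up fields
            worker = threading.Thread(target=serve_once, args=(server, state))
            worker.start()
            result = read_state(path)
            worker.join()
    return result


if __name__ == "__main__":
    print(demo())
```

The server never reads anything from the client, which keeps the daemon-side attack surface small; the client simply treats end-of-stream as the end of the message.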

[+] orf|1 year ago|reply

Would you rather it had a JSON dependency to parse a config file, or yet another poorly thought out, ad-hoc homegrown config file format?

[+] hcfman|1 year ago|reply

If you want to set up a chrony time server that maintains accuracy to within a microsecond and doesn’t do this with a network connection, then you could try my sbts-aru project and just not use the audio recorder parts of it.

https://github.com/hcfman/sbts-aru

It installs with a single command on all Raspberry Pi versions and takes care of all the dependencies, configuration and startup order details to install and start working with one command.

It’s a sound localizing audio recorder platform and that’s why it also sets up accurate time.

It’s using GPS to get its time from.

[+] _joel|1 year ago|reply

Reading this reminded me of ntpsec, anyone actually use that?

[+] move-on-by|1 year ago|reply

Yes, Debian transitioned to NTPsec with bookworm. The NTP package is just a dummy transitional package that installs NTPsec.

https://packages.debian.org/bookworm/net/ntp

[+] xvilka|1 year ago|reply

BGP probably should be the next.

[+] nubinetwork|1 year ago|reply

The problem with ntp isn't the client, it's the servers having to deal with forged UDP packets. Will ntpd ever become TCP-only? Sadly I'm not holding my breath. I stopped running a public stratum 3 server ~10 years ago.

[+] Faaak|1 year ago|reply

On the contrary, I'm hosting one stratum 1 and two stratum 2 servers (at my previous company we offered three stratum 1s) on the NTP pool. It's useful, used, and still needed :-)

[+] brohee|1 year ago|reply

When one can make a stratum 1 server for $100, there is very little reason for the continued existence of public NTP servers. ISPs can offer the service to their customers, and any company with a semblance of an IT dept can have its own stratum 1.

[+] akaletF|1 year ago|reply

[deleted]

[+] skilled|1 year ago|reply

[deleted]

[+] mianosm|1 year ago|reply

The Jonestown massacre was actually grape flavor-aid:

https://www.vox.com/2015/5/23/8647095/kool-aid-jonestown-fla...

They really do appear to be all in on avoiding memory safety bugs from C/C++:

> Over the next few years we plan to continue replacing C or C++ software with memory safe alternatives in the Let’s Encrypt infrastructure: OpenSSL and its derivatives with Rustls, our DNS software with Hickory, Nginx with River, and sudo with sudo-rs. Memory safety is just part of the overall security equation, but it’s an important part and we’re glad to be able to make these improvements.

It seems like a really challenging endeavor, but I appreciate their desire to maintain uptime and a public service like they do.

[+] tialaramex|1 year ago|reply

Correctness matters, in their particular game that's especially true although I'm doubtful of common insistence that it's better for this or that software to be fast than correct.

Rust is really good for correctness. Take "Hello, World", the obvious toy program. Someone tried giving it various error states instead of (as would be usual) a normal happy terminal environment. In C or C++ the canonical "Hello, World" program terminates successfully despite any amount of errors, it just doesn't care about correctness.

The default Rust Hello World, the one you get out of the box when you make a new project, or you'd show people on a "My First Rust Program" course, will complain about the errors when they happen. Because doing so is correct.
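
The contrast described above can be sketched in a few lines. This is an analogy written in Python, not the C or Rust programs themselves; `FullDevice` is a hypothetical stand-in for an output that cannot be written to (similar to `/dev/full` on Linux), and both function names are made up for illustration.

```python
import errno
import io


class FullDevice(io.TextIOBase):
    """Hypothetical output stream that fails every write, like a full disk."""

    def writable(self) -> bool:
        return True

    def write(self, text: str) -> int:
        raise OSError(errno.ENOSPC, "No space left on device")


def hello_ignoring_errors(out) -> int:
    """Mimics a program that never checks whether the write succeeded."""
    try:
        out.write("Hello, World!\n")
        out.flush()
    except OSError:
        pass  # the failure is silently dropped
    return 0  # reports success regardless


def hello_reporting_errors(out) -> int:
    """Mimics a program that treats a failed write as a failure."""
    try:
        out.write("Hello, World!\n")
        out.flush()
    except OSError:
        return 1  # non-zero exit status signals the error
    return 0


if __name__ == "__main__":
    print(hello_ignoring_errors(FullDevice()))   # claims success
    print(hello_reporting_errors(FullDevice()))  # reports failure
```

The only difference between the two functions is whether the write error reaches the exit status, which is the behavior the comment is contrasting.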

It's the New Jersey style. The priority for these languages was simplicity of implementation. It's more important that you can cobble together a C compiler easily than that the results are useful or worthwhile. This contributed to C's survival, but we pay the price until we give it up.

[+] _flux|1 year ago|reply

Is it completely unwarranted, though? It seems most of the issues listed here are indeed memory safety bugs that are more difficult to pull off in memory-safe languages such as Rust: https://www.cvedetails.com/vulnerability-list/vendor_id-2153...

[+] itishappy|1 year ago|reply

> I struggle to understand why they would say that as the opening statement in such a matter-of-fact manner.

TFA's second sentence explains the facts of the matter along with the flavor of Kool-Aid they stock:

> The CA software itself is written in memory safe Golang, but from our server operating systems to our network equipment, lack of memory safety routinely leads to vulnerabilities that need patching.

[+] kelnos|1 year ago|reply

> I struggle to understand why they would say that as the opening statement in such a matter-of-fact manner.

Because it is a fact. An obvious, obvious fact to anyone who has been working in this ecosystem for any amount of time.

> Drinking the Rust kool-aid by the sound of it.

Would you say the same thing if they'd instead decided to use golang, zig, nim, ocaml, etc.? If not, maybe consider that your emotional stance around Rust is coloring your judgment.

Our modern systems are built on a house of cards where security is concerned. I agree that the "RiiR" meme is tiresome and dumb, but I'm tired of seeing report after report of new vulnerabilities found in foundational libraries and programs. The majority of those vulnerabilities are of the type that Rust won't even let you compile. Languages like C and C++ have their place, but for most applications, there are safer alternatives that don't require harsh compromises or significant trade offs.

[+] pjmlp|1 year ago|reply

Because since the Morris Worm in 1988, there are still plenty of network-facing services that keep being written in C and C++, and without the necessary sanitary precautions.

True, there are plenty of alternatives for many of those networking services, not necessarily Rust.

[+] hot_gril|1 year ago|reply

What's incorrect about that statement?

[+] bakugo|1 year ago|reply

[deleted]

[+] johnklos|1 year ago|reply

[deleted]

[+] dequan|1 year ago|reply

I agree that it would be great if the ecosystem was a bit slower to use every new version and it does seem like things are beginning to tend in that direction as many foundational crates have begun declaring MSRVs of !LATEST.

However I don't think the pace of updates really changes anything in terms of tool chain security. If Rust decided to go to a 36 week release cycle, each release would just have 6x as much stuff in it. If you can't keep up reviewing N changes in a 6 week release cycle, moving to a 6*X release cycle will not help you review N*X changes.

[+] hoseja|1 year ago|reply

Free pair of knee-high socks with every cert.
