I highly recommend looking up how PTP works and how it compares to NTP. Clock sync is very interesting. When I joined an HFT company, the first thing I did was understand this stuff. We care about it a lot[1].
If you want a specific question to answer, answer this: why does PTP need hardware timestamping to achieve high precision (where the network card itself assigns timestamps to packets, rather than having the kernel do it as part of TCP/IP processing)? If we use software timestamps, why is microsecond precision the best we can do? If you understand this, it goes a very long way toward understanding the core ideas behind precise clock sync.
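As a hint for working through that question, here is the two-way time-transfer arithmetic that both NTP and PTP build on, sketched in Python (the function name is mine). Any jitter in where the four timestamps are captured, e.g. kernel scheduling or interrupt coalescing, feeds directly into the offset estimate, which is exactly why taking them in NIC hardware helps so much.

```python
def offset_and_delay(t1, t2, t3, t4):
    """Classic two-way time transfer, as used by both NTP and PTP.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)

    Assumes the path delay is symmetric; any asymmetry, or jitter in
    where the timestamps are captured, shows up directly as offset error.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2  # server clock minus client clock
    delay = (t4 - t1) - (t3 - t2)         # round-trip time on the wire
    return offset, delay
```

For example, a server 5 units ahead across a symmetric 2-unit path gives `offset_and_delay(0, 7, 8, 5)` = `(5.0, 4)`.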
Once you have a solid understanding of PTP, look into White Rabbit. They’re able to sync two clocks with sub-ns precision. In case that isn’t obvious, that is absolutely insane.
[1] So do a lot of people. For example, audio engineers. Once, an audio engineer absolutely talked my ear off about PTP. I had no idea that audio people understood clock sync so well, but they do!
> So do a lot of people. For example audio engineers.
Indeed. PTP (various, not-necessarily compatible, versions) is at the core of modern ethernet-based audio networking: Dante (proprietary, PTP: IEEE 1588 v1), AVB (IEEE standard, PTP: 802.1AS), AES67 (AES standard, PTP: IEEE 1588 v2). And now the scope of the AVB protocol stack has been expanded to TSN for industrial and automotive time sensitive network applications.
I find time accuracy to be ridiculously interesting, and I have had to talk myself out of buying a used atomic clock to play with [1]. I think precision time is very cool, and a small part of me wants to create the most over-engineered wall clock using a Raspberry Pi or something, just to have sub-microsecond accuracy.
Sadly, they're generally just a bit too expensive for me to justify it as a toy.
I don't work in trading (though not for lack of trying on my end), so most of the stuff I work on has been a lot more about "logical clocks", which are cool in their own right, but I have always wondered how much more efficient we could be if we had nanosecond-level precision to guarantee that locks are almost always uncontested.
[1] I'm not talking about those clocks that radio to Colorado or Greenwich, I mean the relatively small ones that you can buy that run locally.
> When two transactions happen at nearly the same time on different nodes, the database must determine which happened first. If clocks are out of sync, the database might order them incorrectly, violating consistency guarantees.
This is only true if you use wall clock time as part of your database’s consistency algorithm. Generally I think this is a huge mistake. It’s almost always much easier to swap to a logical clock - which doesn’t care about wall time. And then you don’t have to worry about ntp.
The basic idea is this: event A happened before event B iff A (or something that happened after A) was observed by the node that generated B before B was generated. As a result, you end up with a DAG of events, kind of like git. Some events aren't ordered relative to one another (we say they happened concurrently). If you ever need a global order for all events, you can deterministically pick an arbitrary order for concurrent events by comparing IDs or something. And this will give you a total order that will be the same on all peers.
If you make database events work like this, time is a little more complex. (It’s a graph traversal rather than simple numbers). But as a result the system clock doesn’t matter. No need to worry about atomic clocks, skew, drift, monotonicity, and all of that junk. It massively simplifies your system design.
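A minimal sketch of the logical-clock idea described above, in Python (the class and method names are mine, not from any particular database): each node keeps a counter, stamps events with (counter, node_id), and bumps its counter past anything it observes. Comparing the tuples gives a deterministic total order, with node IDs breaking ties between concurrent events.

```python
class LogicalClock:
    """Lamport-style logical clock; no wall time involved."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counter = 0

    def local_event(self):
        """Stamp an event generated on this node."""
        self.counter += 1
        return (self.counter, self.node_id)

    def observe(self, stamp):
        """Merge a stamp received from another node, then stamp our own event."""
        self.counter = max(self.counter, stamp[0]) + 1
        return (self.counter, self.node_id)
```

If node B observes A's event before generating its own, A's stamp always compares smaller, regardless of what either machine's system clock says.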
Also I still remember having fun with the "Determine the order of events by saving a tuple containing monotonic time and a strictly monotonically increasing integer as follows" part.
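That tuple trick can be sketched in a few lines of Python (names are mine): the monotonic clock never goes backwards, and the counter breaks ties when two events land in the same clock tick, so the IDs are strictly increasing even if the wall clock jumps around.

```python
import itertools
import time

_seq = itertools.count()

def event_id():
    """(monotonic_ns, seq): strictly increasing, immune to wall-clock jumps."""
    return (time.monotonic_ns(), next(_seq))
```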
Unfortunately, some of us have to deal with things like billing, transaction timing to validate what a client's logs might have on their systems, and so on.
My take on this is that second-level timing is close enough for this, and that all my internal systems need to agree on the time. So if I'm off by 200ms or some blather from the rest of the world, I'm not overly concerned. I am concerned, however, if a random internal system is not synced to my own NTP servers.
This doesn't mean I don't keep our servers synced, just that being off by some manner of ms doesn't bother me inordinately. And when it comes to timing of events, yes, auto-increment IDs or some such are easier to deal with.
I wouldn't say it's a mistake. Distributed algorithms that depend on wall clock time generally give better guarantees. Usually you want these guarantees. The downside is of course you need to keep accurate time. In the cases you don't need them (eg. for the case you described), sure, but as an engineer you don't always get to choose your constraints.
On the flipside, clock sync for civilians has never been easier. Thanks to NTP any device with an Internet connection can pretty easily get time accurate to 1 second, often as little as 10 ms. All major consumer computers are preconfigured to sync time to one of several reliable NTP pools.
This post is about more complicated synchronization for more demanding applications. And it's very good. I'm just marveling at how, in my lifetime, I went from "no clock is ever set right" to assuming most anything is within a second of true time.
I was doing something at work that involved calculating round trip times from/to Android devices, and learned that although it should be possible for NTP to sync clocks with below-second precision, in practice many of the Android devices I was working with (mostly Pixels 2-7) were off from my server and each other by up to 5 seconds, which blew my mind.
I don't think civilian clock synchronization has been an issue for a long time.
DCF77 and WWVB have been around for more than 50 years. You could use some cheap electronics and get well below millisecond accuracy. GPS has been fully operational for 30 years, but it needs a more expensive device.
I suspect you could even get below 1 sec accuracy using a watch with a hacking movement and listening to radio broadcast of time beeps / pips.
At this point the only clock in my life that doesn't auto set is the one on my stove, and that's because I abhor internet connected kitchen appliances.
The article doesn't cover the inane stupid that is:
* NTP pool server usage requires using DNS
* people have DNSSEC setup, which requires accurate time or it fails
So if your clock is off, you cannot lookup NTP pool servers via DNS, and therefore cannot set your clock.
This sheer stupid has been discussed with package maintainers of major distros, with ntpsec, and the result is a mere shrug. Often, the answer is "but doesn't your device have a battery backed clock?", which is quite unhelpful. Many devices (routers, IOT devices, small boards, or older machines, etc) don't have a battery backed clock, or alternatively the battery may just have died.
Beyond that, the ntpsec codebase has a horrible bug where if DNS is not available when ntpsec starts, pool server addresses are never, ever retried. So if you have a complete power-fail in a datacentre rack, and your firewalls take a little longer to boot than your machines, you'll have to manually restart ntpsec to even get it to ever sync.
When discussing this bug the ntpsec lads were confused that DNS might not exist at times.
Long story short, make sure you aren't using DNS in any capacity, in NTP configs, and most especially in ntpsec configs.
One good source is just using the IPs provided by NIST. Pool servers may seem fine, but I'd trust IPs assigned to NIST to exist longer than any DNS anyhow. E.g., for decades.
I wouldn't say it's a 'nightmare'. It's just more complicated than how regular folks imagine computers work when it comes to time sync. There's nothing nightmarish or scary about this; it's just using the best solution for your scenario, understanding limitations, and adjusting expectations/requirements accordingly, perhaps relaxing consistency requirements.
I worked on the NTP infra for a very large organization some time ago, and the scariest thing I found was just how bad some of the clocks were on 'commodity hardware', but this just added a new parameter for triaging hardware for manufacturer replacement.
This is an OK article, but it's just so very superficial. It goes too wide for such a deep subject.
Maybe. But I remember one game developer telling me that they face an even more challenging problem: synchronization between players in real-time multiplayer games. Just imagine different users having significantly different network latencies in a multiplayer shooter where a couple of milliseconds can be decisive. Someone makes a headshot when the game state is already outdated. If you think about this, you can appreciate how complicated it is just to make the gameplay not awful...
I took to distributed systems like a duck to water. It was only much later that I figured out that while there are things I can figure out in one minute that take other people five, there are a lot of others where you have to walk people through step by step or they would never get there. That really explained some interactions I'd had when I was younger.
In particular, I don't think the intuitions necessary to do distributed computing well would come to someone who snoozed through physics, or who never took intro to computer engineering.
Back when I was studying computer science, I was taking the OS exam and the part about Lamport timestamps [0] was optional, but I had studied it because I loved it. When I mentioned it to my professor, he was so happy to hear something new that day that he asked me to describe it in detail. This was the year 2001.
Many years later, in 2020, I ended up living in San Francisco, and I had the fortune to meet Leslie Lamport after I sent him a cold email. Lovely and smart guy. This is the text of the first part of that email, just for your curiosity:
Hey Leslie!
You have accompanied me for more than 20 years. I first met your name when studying Lamport timestamps.
And then on, and on, and on, up to a few minutes ago, when I realized that you are also behind the paper and the title of the "Byzantine Generals Problem", renamed from the "Albanian" generals at the suggestion of Jack Goldberg. Who is he? [1]
Ok, so people use NTP to "synchronize" their clocks and then write applications that assume the clocks are in exact sync and can use timestamps for synchronization, even though NTP can see the clocks aren't always in sync. Do I have that right?
If you are an engineer at Google dealing with Spanner, then you can in fact assume clocks are well synchronized and can use timestamps for synchronization. If you get commit timestamps from Spanner you can compare them to determine exactly which commit happened first. That’s a stronger guarantee than the typical Serializable database like postgresql: https://www.postgresql.org/docs/current/transaction-iso.html...
That’s the radical developer simplicity promised by TrueTime mentioned in the article.
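The mechanism behind that guarantee is commit wait, as described in the Spanner paper: TrueTime returns an uncertainty interval rather than a point, and a commit isn't acknowledged until its timestamp is guaranteed to be in the past on every node. A toy Python sketch of the idea (EPSILON_NS is a made-up uncertainty bound for illustration, not a Spanner number, and the names are mine):

```python
import time

EPSILON_NS = 2_000_000  # assumed worst-case clock uncertainty (2 ms); illustrative only

def tt_now():
    """TrueTime-style interval: the true time lies within [earliest, latest]."""
    t = time.time_ns()
    return (t - EPSILON_NS, t + EPSILON_NS)

def commit_wait(commit_ts):
    """Block until commit_ts is definitely in the past on every well-synced node."""
    while tt_now()[0] <= commit_ts:
        time.sleep(0.0001)
```

The wait is bounded by twice the uncertainty, which is why Spanner invests in GPS and atomic clocks to keep that uncertainty small.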
Depending on the application you would generally use PTP to get sub-microsecond accuracy. The real trick is that architecture should tolerate various clocks starting or jumping out of sync and self correct.
This is a great breakdown, and it’s worth noting that we are hitting a "microsecond wall" in modern GPU clusters that makes standard NTP effectively obsolete.
In distributed training (LLMs), the bottleneck is no longer just disk I/O or CPU cycles—it’s the "straggler problem" during collective communication (like All-Reduce). When you’re running on 400Gbps+ RoCE (RDMA over Converged Ethernet) networks, the network "wire time" is often lower than the clock jitter on a standard Linux kernel.
If your clocks are skewed by even 2-3 milliseconds, your telemetry becomes essentially useless. It looks like packets are arriving before they were sent, or worse, your profiling tools can't accurately pinpoint which GPU is stalling the rest of the 16,384-node fleet. We've reached a point where microsecond-accurate clocks aren't just a requirement for HFT firms; they're becoming the baseline for anyone trying to keep hundreds of millions of dollars of NVIDIA GPUs from idling while they wait for a collective sync.
If you have network infrastructure that supports 400G I'm pretty sure it has solid PTP built in. And as far as I remember from my networking days setting it up is almost as simple as setting up NTP, you just need a single machine with a GPS lock.
Unfortunate that the author doesn’t bring up FoundationDB version stamps, which to me feel like the right solution to the problem. Essentially, you can write a value you can’t read until after the transaction is committed and the synchronization infrastructure guarantees that value ends up being monotonically increasing per transaction. They use similar “write only” operations for atomic operations like increment.
Yes. A consistent total ordering is what you need (want) in distributed computing. Ultimately, causality is what is important, but consistent ordering of concurrent operations makes things much easier to work with.
The key here is a singleton sequencer component that stamps the new versions. There was a great article shared here on similar techniques used in trading order books (https://news.ycombinator.com/item?id=46192181).
Agree this is the best solution, I’d rather have a tiny failover period than risk serialization issues. Working with FDB has been such a joy because it’s serializable it takes away an entire class of error to consider, leading to simpler implementation.
One thing missing in the blogpost is in practice you see many large orgs, especially in finance, living with multiple time domains. For example, on-prem trading systems almost always use PTP or PPS for sub-microsecond timestamping, often on dedicated networks to reduce jitter (for meeting regulatory requirements like MiFID II and CAT) while the rest of their infra (in on-prem and cloud) just runs NTP for millisecond-class sync. Both protocols are fundamentally sensitive to network conditions — the mean offset may look fine, but outliers due to congestion/jitter can be very poor.
The consequence of having multiple time domains is pretty painful when you need to reconcile logs or transaction histories across systems with different sync accuracy. Millisecond NTP logs and sub-microsecond PTP logs don’t line up cleanly, so correlating events end-to-end can become guesswork rather than deterministic ordering.
If you want reliable cross-system telemetry and audit trails, you'll need a single, high-accuracy time sync approach across your whole stack.
The comments about HFT needing tightly synchronized clocks got me thinking.
Back in the day, way back in the '80s, IBM replaced VM with VMXA. VM could trap and emulate all the important instructions, since they were privileged, except one: the STCK (store clock) instruction. So virtual machines couldn't set their virtual clocks, which meant they were always in sync. VMXA used new hardware features that let you set the virtual clock: you could specify an offset to the system clock. But some of IBM's biggest customers depended on all the virtual machines' clocks always being in sync, so VMXA had to add an option to disallow setting the clock for specified virtual machines.
Except all of development knew how trivial it was to trap or modify the STCKs to produce a timestamp of your choosing. This was before it was common knowledge that client code should never be trusted. But nobody enlightened IBM corporate management; it was a serious career-limiting move at IBM. It didn't matter if you were right. So I'm pretty sure some serious fortunes were made as a result.
So the question for HFT is; are they using and trusting client timestamps, or are the timestamps being generated on the market maker's servers? If the latter, how would the customer know?
For an article written about time, I would have thought there'd be a timestamp on the blog post. Just something to think about if someone stumbles upon this in a few years.
> The good news is that the International Bureau of Weights and Measures has decided to stop adding leap seconds by 2035.
This is not entirely correct. What has been agreed is to allow deviations of more than one second after 2035, so that clocks have to be adjusted less frequently (on the order of every 50-100 years is the intention). However, the allowable deviation, and how to adjust clocks when it is exceeded, has yet to be decided.
A very clever part of the HUYGENS algorithm is that it doesn’t just sync clocks pair-wise, it leverages a natural network effect where a group of pair-wise synchronized clocks becomes transitively synchronized, helping reduce errors further without requiring specialized hardware. That’s one of the key reasons it can achieve ~100 nanoseconds of software-based sync on commodity networks.
The authors’ work forms the basis of what the team at Clockwork.io is building, enabling accurate one-way delay measurements (rather than just RTT/2) that improve latency visibility and telemetry across CPU and GPU infrastructure.
AWS has the Google TrueTime equivalent precision clock available for public use[1], which makes this problem much easier to solve now. Aurora DSQL uses it. Even third-party DBs like YugabyteDB make use of it.
Timesync isn’t a nightmare at all. But it is a deep rabbit hole.
The best approach, imho, is to abandon the concept of a global time. All timestamps are wrt a specific clock. That clock will skew at a rate that varies with time. You can, hopefully, rely on any particular clock being monotonic!
My mental model is that you form a connected graph of clocks and this allows you to convert arbitrary timestamps from any clock to any clock. This is a lossy conversion that has jitter and can change with time. The fewer stops the better.
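That graph-of-clocks model can be sketched as chaining affine conversions: each edge carries an estimated (skew, offset) pair that gets re-fitted over time, and every hop adds its own jitter, hence "the fewer stops the better". A minimal Python sketch (names and the linear clock model are my assumptions):

```python
def convert(ts, path):
    """Convert a timestamp along a path of clock-graph edges.

    path: list of (skew, offset) estimates, one per hop from the source
    clock toward the target clock. The conversion is lossy: each estimate
    carries jitter and drifts, so the fewer hops, the better.
    """
    for skew, offset in path:
        ts = skew * ts + offset
    return ts
```

For example, converting 100.0 through edges (1.0, 5.0) then (2.0, -1.0) yields 209.0.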
I kinda don’t like PTP. Too complicated and requires specialized hardware.
This article only touches on one class of timesync. An entirely separate class is timesync within a device. Your phone is a highly distributed compute system with many chips each of which has their own independent clock source. It’s a pain in the ass.
You also have local timesync across devices such as wearables or robotics. Connecting to a PTP system with GPS and atomic clocks is not ideal (or necessary).
> I kinda don’t like PTP. Too complicated and requires specialized hardware.
At this stage, it's difficult to find a half-decent ethernet MAC that doesn't have PTP timestamping. It's not a particularly complicated protocol, either.
I needed to distribute PPS and 10MHz into a GNSS-denied environment, so last summer I designed a board to do this using 802.1AS gPTP with a uBlox LEA-M8T GNSS timing receiver, a 10MHz OCXO and an STM32F767 MCU. This took me about four weeks. Software is written in C, and the PTP implementation accounts for 1500 LOC.
> I kinda don’t like PTP. Too complicated and requires specialized hardware.
In my view the specialised hardware is just a way to get more accurate transmission and arrival timestamps. That's useful whether or not you use PTP.
> My mental model is that you form a connected graph of clocks and this allows you to convert arbitrary timestamps from any clock to any clock. This is a lossy conversion that has jitter and can change with time.
This sounds like the "peer to peer" equivalent of PTP. It would require every node to maintain state about its estimate (skew, slew, variance) of every other clock. I like the concept, but obviously it adds complexity to end-stations beyond what PTP requires (i.e. it increases the hardware cost of embedded implementations). Such a system would also need to model the network topology, or control routing (as PTP does), because packets traversing different routes to the same host will experience different delay and jitter statistics.
> TicSync is cool
I hadn't seen this before, but I have implemented similar convex-hull based methods for clock recovery. I agree this is obviously a good approach. Thanks for sharing.
As a user of WhiteRabbit, I can confirm a sub-10ps sync (two clocks phase lock) over 50km fiber connection for variable temperature of fiber (biggest problem of clock sync over fibers is temperature induced length change of the fiber itself, which needs to be measured and compensated).
Back in the early 2000s I was programming on an IBM AIX server. Multicore, maybe multiprocessor, and within the same machine the clocks were skewed between the processors. If you'd dispatch a process and then check its outstanding running time, it would differ depending upon which processor you'd check from. And of course it was a signed type, so we would get negative values, which sent our code down the wrong path.
Clock sync is such a nightmare in robotics. Most OSes happily will skew/jump to get the time correct. Time jumps (especially backwards) will crash most robotics stacks. You might decide to ensure that you have synced time before starting the stack. Great, now your timestamps are mostly accurate, except what happens when you've used GPS as your time source, and you start indoors? Robot hangs forever.
Hot take: I've seen this and enough other badly configured time sync settings that I want to ban system time from robotics systems - time from startup only! If you want to know what the real world time was for a piece of data after, write what your epoch is once you have a time sync, and add epoch+start time.
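A sketch of that "time from startup only" scheme in Python (the class and its API are hypothetical, not from any robotics framework): all internal timestamps come from the monotonic clock, and wall-clock time is only ever derived afterwards, once an epoch has been recorded, so a late or jumpy time sync can never move timestamps backwards.

```python
import time

class RobotClock:
    """Monotonic-only timestamps; wall time derived after the fact."""

    def __init__(self):
        self._start = time.monotonic_ns()
        self.epoch_ns = None  # wall-clock time of startup, recorded once

    def now(self):
        """Nanoseconds since startup; never jumps, even if system time does."""
        return time.monotonic_ns() - self._start

    def record_epoch(self, wall_ns):
        """Call once, whenever time sync finally arrives (GPS, NTP, ...)."""
        if self.epoch_ns is None:
            self.epoch_ns = wall_ns - self.now()

    def to_wall(self, stamp_ns):
        """Convert a stored stamp to wall time, if an epoch is known yet."""
        return None if self.epoch_ns is None else self.epoch_ns + stamp_ns
```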
If your requirements are “must have accurate time, must start with an inaccurate time, must not step time during operation, no atomic clocks, must not require a network connection, or a WWVB signal, must work without a GPS signal” then yes, you need to relax your requirements.
But it doesn’t have to be the first requirement you relax.
Normally I would nod at the title. Having lived it.
But I just watched/listened to a Richard Feynman talk on the nature of time and clocks and the futility of "synchronizing" clocks. So I'm chuckling a bit. In the general sense, I mean. Yes, yes, for practical purposes in the same reference frame on Earth, it's difficult but there's hope. Now, in general ... synchronizing two clocks is ... meaningless?
Feynman was not entirely sincere. The implosion of a nuclear device requires precise synchronization of multiple detonations: basically, the more precisely you can trigger, the less fissile material you need for the sphere. To this day, high-accuracy bridgewire/foil bridge designs remain on ITAR.
> But I just watched/listened to a Richard Feynman talk on the nature of time
I hate to break it to you, but you were fooled by an AI dupe. It also took me a while to realise this. It's sad we live in this tiring world where we have to fact-check every single piece of content for authenticity. It's just tiring. I'm sure many will reply that it doesn't matter, which of course will be funny to consider given someone went to the trouble of voice-cloning Feynman to make a channel of content (copyrighted, of course) while claiming "no disrespect intended".
Reminds me of the old saying: 'If you have just one watch/clock, then you always know what time it is; but if you have two of them, then you are never sure!'
Nature (the laws of physics) is against you on this: it is in fact impossible for everyone. What is in sync for some observers can be out of sync for others (depending on where they are, i.e. gravity, and how they move relative to each other). See general and special relativity and the relativity of simultaneity [1].
PTP requires support not only on your network, but also on your peripheral bus and inside your CPU. It can't achieve better-than-NTP results without disabling PCI power saving features and deep CPU sleep states.
You can if you just run PTP (almost) entirely on your NIC. The best PTP implementations take their packet timestamps at the MAC on the NIC and keep time based on that. Nothing about CPU processing is time-critical in that case.
PTP does not require support on your network beyond standard ethernet packet forwarding when used in ethernet mode.
In multicast IP mode, with multiple switches, it requires what anything running multicast between switches/etc would require (i.e. some form of IGMP snooping or multicast routing or .....)
In unicast IP mode, it requires nothing from your network.
Therefore, I have no idea what it means to "require support on the network".
I have used both ethernet and multicast PTP across a complete mishmash of brands and types and medias of switches, computers, etc, with no issues.
The only thing that "support" might improve is more accurate path delay data through transparent clocks. If both master and slave do accurate hardware timestamping already, and the path between them is constant, it is easily possible to get +-50 nanoseconds without any transparent clock support.
Here are the stats from a random embedded device running PTP that I accessed just a second ago:
Reference ID : 50545030 (PTP0)
Stratum : 1
Ref time (UTC) : Sun Dec 28 02:47:25 2025
System time : 0.000000029 seconds slow of NTP time
Last offset : -0.000000042 seconds
RMS offset : 0.000000034 seconds
Frequency : 8.110 ppm slow
Residual freq : -0.000 ppm
Skew : 0.003 ppm
So this embedded ARM device, which is not special in any way, is maintaining time +-35ns of the grandmaster, and currently 30ns of GPS time.
The card does not have an embedded hardware PTP clock, but it does do hardware timestamp and filtering.
This grandmaster is an RPI with an intel chipset on it and the PPS input pin being used to discipline the chipset's clock. It stays within +-2ns (usually +-1ns) of GPS time.
Obviously, holdover sucks, but not the point :)
This qualifies as better-than-NTP for sure, and this setup has no network support. No transparent clocks, etc. These machines have multiple media transitions involved (fiber->ethernet), etc.
The main thing transparent clock support provides in practice is dealing with highly variable delay. Either from mode of transport, number of packet processors in between your nodes, etc. Something that causes the delay to be hard to account for.
The ethernet packet processing in ethernet mode is being handled in hardware by the switches and basically all network cards. IP variants would probably be hardware assisted but not fully offloaded on all cards, and just ignored on switches (assuming they are not really routers in disguise).
The hardware timestamping is being done in the card (and the vast majority of ethernet cards have supported PTP hardware timestamping for over a decade at this point), and works perfectly fine with deep CPU sleep states.
Some don't do hardware filtering, so they essentially end up processing more packets than necessary, but .....
In physics, time is local and relative; independent events don't need a global ordering. Distributed databases shouldn't require one either. The idea of a single global time comes from 1980s single-node database semantics, where serializability implied one universal execution order. When that model was lifted into distributed systems, researchers introduced global clocks and timestamp coordination to preserve those guarantees, not because distributed systems fundamentally need them. It's time we rethought this. Only operations that touch the same piece of data require ordering. Everything else should follow causality: as in the physical universe, independent events don't need to agree on sequence, only dependent ones do. Global clocks exist because some databases forced serializable cross-object transactions onto distributed systems, not because nature requires it.
Edit: I welcome a discussion with people who disagree and downvote.
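The "only dependent operations need ordering" position maps naturally onto vector clocks, which make concurrency explicit: two stamps are ordered only if one dominates the other component-wise, and otherwise the events are concurrent and no ordering decision is forced. A minimal sketch (function names are mine):

```python
def merge(a, b):
    """Component-wise max of two vector clocks (dicts of node -> counter)."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

def happened_before(a, b):
    """True iff the event stamped a causally precedes the event stamped b."""
    return a != b and all(a.get(n, 0) <= b.get(n, 0) for n in a.keys() | b.keys())

def concurrent(a, b):
    """True iff neither event causally precedes the other."""
    return not happened_before(a, b) and not happened_before(b, a)
```

Here `concurrent({"x": 1}, {"y": 1})` is True: the two events never observed each other, so nothing forces the system to pick an order between them.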
You can’t be certain that any given mutating operation you perform now won’t be relied upon for some future operation, unless the two operations are performed in entirely different domains of data. Even “not touching (by which I assume you mean mutating) the same data” isn’t enough. If I update A in thread 0 from 1 to 2, then I update B in thread 1 to the value of A+1, then the value of B could end up being 2 or 3, depending on whether the update of A reached thread 1.
> Google faced the clock synchronization problem at an unprecedented scale with Spanner, its globally distributed database. They needed strong consistency guarantees across data centers spanning continents, which requires knowing the order of transactions.
> Here’s a video of me explaining this.
Do you need a video? Do we need a 42 minute video to explain this?
I generally agree with Feynman on this stuff. We let explanations be far more complex than they need to be for most things, and it makes the hunt for accidental complexity harder because everything looks almost as complex as the problems that need more study to divine what is actually going on there.
For Spanner to be useful they needed a high transaction rate and in a distributed system that requires very tight grace periods for First Writer Wins. Tighter than you can achieve with NTP or system clocks. That’s it. That’s why they invented a new clock.
Google puts it this way:
Under external consistency, the system behaves as if all transactions run sequentially, even though Spanner actually runs them across multiple servers (and possibly in multiple datacenters) for higher performance and availability.
But that’s a bit thick for people who don’t spend weeks or years thinking about distributed systems.
dmazin|2 months ago
If you want a specific question to answer, answer this: why does PTP need hardware timestamping to achieve high precision (where the network card itself assigns timestamps to packets, rather than having the kernel do it as part of TCP/IP processing)? If we use software timestamps, why can we do microsecond precision at best? If you understand this, it goes a very long way to understanding the core ideas behind precise clock sync.
Once you have a solid understanding of PTP, look into White Rabbit. They’re able to sync two clocks with sub-ns precision. In case that isn’t obvious, that is absolutely insane.
[1] So do a lot of people. For example audio engineers. Once, an audio engineer absolutely talked my ear off about ptp. I had no idea that audio people understood clock sync so well but they do!
RossBencina|2 months ago
Indeed. PTP (various, not-necessarily compatible, versions) is at the core of modern ethernet-based audio networking: Dante (proprietary, PTP: IEEE 1588 v1), AVB (IEEE standard, PTP: 802.1AS), AES67 (AES standard, PTP: IEEE 1588 v2). And now the scope of the AVB protocol stack has been expanded to TSN for industrial and automotive time sensitive network applications.
baby_souffle|2 months ago
tombert|2 months ago
Sadly, they're generally just a bit too expensive for me to justify it as a toy.
I don't work in trading (though not for lack of trying on my end), so most of the stuff I work on has been a lot more about "logical clocks", which are cool in their own right, but I have always wondered how much more efficient we could be if we had nanosecond-level precision to guarantee that locks are almost always uncontested.
[1] I'm not talking about those clocks that radio to Colorado or Greenwich, I mean the relatively small ones that you can buy that run locally.
nickpsecurity|2 months ago
https://en.wikipedia.org/wiki/White_Rabbit_Project
josephg|2 months ago
This is only true if you use wall clock time as part of your database’s consistency algorithm. Generally I think this is a huge mistake. It’s almost always much easier to swap to a logical clock - which doesn’t care about wall time. And then you don’t have to worry about ntp.
The basic idea is this: event A happened before event B iff A (or something that happened after A) was observed by the node that generated B before B was generated. As a result, you end up with a dag of events - kind of like git. Some events aren’t ordered relative to one another. (We say, they happened concurrently). If you ever need a global order for all events, you can deterministically pick an arbitrary order for concurrent events by comparing ids or something. And this will give you a total order that will be the same on all peers.
If you make database events work like this, time is a little more complex. (It’s a graph traversal rather than simple numbers). But as a result the system clock doesn’t matter. No need to worry about atomic clocks, skew, drift, monotonicity, and all of that junk. It massively simplifies your system design.
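A minimal sketch of the event-DAG idea described above, in Python. The names and the tie-breaking rule (comparing random ids) are illustrative, not taken from any particular system: events record the ids of their direct predecessors, happened-before is graph reachability, and concurrent events get a deterministic order so every peer computes the same total order.

```python
import uuid

class Event:
    """An event in a causal DAG: ids of direct predecessors, plus a payload."""
    def __init__(self, parents, payload):
        self.id = uuid.uuid4().hex   # random id used only for tie-breaking
        self.parents = set(parents)  # ids of events known when this was created
        self.payload = payload

def happened_before(a, b, events_by_id):
    """True iff a is an ancestor of b in the event DAG."""
    seen, frontier = set(), set(b.parents)
    while frontier:
        eid = frontier.pop()
        if eid == a.id:
            return True
        if eid not in seen:
            seen.add(eid)
            frontier |= events_by_id[eid].parents
    return False

def total_order(events, events_by_id):
    """Topological sort; concurrent events are tie-broken by id, so every
    peer that sees the same DAG computes the same total order."""
    ordered, placed = [], set()
    pending = sorted(events, key=lambda e: e.id)
    while pending:
        for e in pending:
            if e.parents <= placed:  # all causal predecessors already placed
                ordered.append(e)
                placed.add(e.id)
                pending.remove(e)
                break
    return ordered
```

Note that nothing here reads the system clock: ordering comes purely from observation, which is the point of the comment above.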
johnisgood|2 months ago
Also I still remember having fun with the "Determine the order of events by saving a tuple containing monotonic time and a strictly monotonically increasing integer as follows" part.
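The scheme quoted there can be sketched in a few lines (a guess at what the article means, not its actual code): pair a monotonic clock reading with a strictly increasing counter, so that even when the clock reading repeats, tuple comparison still yields a strict per-process total order.

```python
import itertools
import time

_counter = itertools.count()

def next_event_id():
    """Tuple of (monotonic nanoseconds, strictly increasing integer).
    The counter breaks ties when the clock reading repeats, so comparing
    tuples always gives a strict total order within this process."""
    return (time.monotonic_ns(), next(_counter))
```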
b112|2 months ago
My take on this is that second timing is close enough for this. And that all my internal systems need to agree on the time. So if I'm off by 200ms or some blather from the rest of the world, I'm not overly concerned. I am concerned, however, if a random internal system is not synced to my own ntp servers.
This doesn't mean I don't keep our servers synced, just that being off by some manner of ms doesn't bother me inordinately. And when it comes to timing of events, yes, auto-increment IDs or some such are easier to deal with.
hnfong|2 months ago
NelsonMinar|2 months ago
This post is about more complicated synchronization for more demanding applications. And it's very good. I'm just marveling at how in my lifetime I went from "no clock is ever set right" to assuming most anything was within a second of true time.
Uehreka|2 months ago
raron|2 months ago
I don't think civilian clock synchronization has been an issue for a long time now.
DCF77 and WWVB have been around for more than 50 years. You could use some cheap electronics and get well below millisecond accuracy. GPS has been fully operational for 30 years, but it needs a more expensive device.
I suspect you could even get below 1 sec accuracy using a watch with a hacking movement and listening to radio broadcast of time beeps / pips.
jasonwatkinspdx|2 months ago
b112|2 months ago
* NTP pool server usage requires using DNS
* people have DNSSEC setup, which requires accurate time or it fails
So if your clock is off, you cannot lookup NTP pool servers via DNS, and therefore cannot set your clock.
This sheer stupidity has been discussed with package maintainers of major distros, with ntpsec, and the result is a mere shrug. Often, the answer is "but doesn't your device have a battery backed clock?", which is quite unhelpful. Many devices (routers, IOT devices, small boards, or older machines, etc) don't have a battery backed clock, or alternatively the battery may just have died.
Beyond that, the ntpsec codebase has a horrible bug where if DNS is not available when ntpsec starts, pool server addresses are never, ever retried. So if you have a complete power-fail in a datacentre rack, and your firewalls take a little longer to boot than your machines, you'll have to manually restart ntpsec to even get it to ever sync.
When discussing this bug the ntpsec lads were confused that DNS might not exist at times.
Long story short, make sure you aren't using DNS in any capacity, in NTP configs, and most especially in ntpsec configs.
One good source is just using the IPs provided by NIST. Pool servers may seem fine, but I'd trust IPs assigned to NIST to exist longer than any DNS name anyhow, e.g. for decades.
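A sketch of what such a DNS-free config might look like for ntpsec. The addresses below are placeholders from the documentation range, not real NIST servers; substitute the literal IPs that NIST publishes.

```
# /etc/ntpsec/ntp.conf -- avoid DNS so the daemon can sync even when
# name resolution is broken (e.g. DNSSEC failing because of a bad clock).
# 192.0.2.10 / 192.0.2.11 are placeholders from the documentation range;
# substitute the literal server IPs published by NIST.
server 192.0.2.10 iburst
server 192.0.2.11 iburst
# No "pool ..." directives: those require working DNS at startup.
```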
ectospheno|2 months ago
maximinus_thrax|2 months ago
I worked on the NTP infra for a very large organization some time ago and the scariest thing I found was just how bad some of the clocks were on 'commodity hardware', but this just added a new parameter for triaging hardware for manufacturer replacement.
This is an ok article but it's just so very superficial. It goes too wide for such a deep subject matter.
senfiaj|2 months ago
hinkley|2 months ago
In particular I don’t think the intuitions necessary to do distributed computing well would come to someone who snoozed through physics or never took intro to computer engineering.
blibble|2 months ago
you buy the hardware, plug it all in, and it works
simonebrunozzi|2 months ago
Many years later, in 2020, I ended up living in San Francisco, and I had the fortune to meet Leslie Lamport after I sent him a cold email. Lovely and smart guy. This is the text of the first part of that email, just for your curiosity:
Hey Leslie!
You have accompanied me for more than 20 years. I first met your name when studying Lamport timestamps.
And then on, and on, and on, up to a few minutes ago, when I realized that you are also behind the paper and the title of "Byzantine Generals problem", renamed from the "Albanian" generals at the suggestion of Jack Goldberg. Who is he? [1]
[0]: https://en.wikipedia.org/wiki/Lamport_timestamp
[1]: Jack Goldberg (now retired) was a computer scientist and Lamport's manager at SRI.
j_seigh|2 months ago
kccqzy|2 months ago
That’s the radical developer simplicity promised by TrueTime mentioned in the article.
TrainedMonkey|2 months ago
goodpoint|2 months ago
sureshvoz|2 months ago
In distributed training (LLMs), the bottleneck is no longer just disk I/O or CPU cycles—it’s the "straggler problem" during collective communication (like All-Reduce). When you’re running on 400Gbps+ RoCE (RDMA over Converged Ethernet) networks, the network "wire time" is often lower than the clock jitter on a standard Linux kernel.
If your clocks are skewed by even 2-3 milliseconds, your telemetry becomes essentially useless. It looks like packets are arriving before they were sent, or worse, your profiling tools can’t accurately pinpoint which GPU is stalling the rest of the 16,384-node fleet. We’ve reached a point where microsecond-accurate clocks aren't just a requirement for HFT firms; they're becoming the baseline for anyone trying to keep hundreds of millions of dollars' worth of NVIDIA GPUs from idling while they wait for a collective sync.
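A toy calculation of the "packets arriving before they were sent" effect (all numbers made up for illustration): with 2 ms of skew and sub-millisecond wire time, the naive one-way delay goes negative.

```python
def one_way_delay_us(send_ts_us, recv_ts_us):
    """Naive one-way delay: receiver timestamp minus sender timestamp.
    Only meaningful if the two clocks are synced much better than the
    actual wire time."""
    return recv_ts_us - send_ts_us

# True wire time: 500 us. Receiver's clock runs 2000 us behind the sender's.
send_ts = 1_000_000                 # sender clock, microseconds
recv_ts = send_ts + 500 - 2000      # receiver clock reading on arrival

delay = one_way_delay_us(send_ts, recv_ts)
# Negative: the packet appears to arrive before it was sent.
```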
perryizgr8|2 months ago
georgelyon|2 months ago
lll-o-lll|2 months ago
Fizzadar|2 months ago
Agree this is the best solution, I’d rather have a tiny failover period than risk serialization issues. Working with FDB has been such a joy: because it’s serializable, it takes away an entire class of errors to consider, leading to a simpler implementation.
danzheng|2 months ago
The consequence of having multiple time domains is pretty painful when you need to reconcile logs or transaction histories across systems with different sync accuracy. Millisecond NTP logs and sub-microsecond PTP logs don’t line up cleanly, so correlating events end-to-end can become guesswork rather than deterministic ordering.
If you want reliable cross-system telemetry and audit trails, you'll need a single, high-accuracy time sync approach across your whole stack.
kobieps|2 months ago
Dylan16807|2 months ago
j_seigh|2 months ago
Back in the day, way back in the 80's, IBM replaced VM with VMXA. VM could trap and emulate all the important instructions, since they were privileged, except one: the STCK (store clock) instruction. So virtual machines couldn't set their virtual clocks, and they were always in sync. VMXA used new hardware features that let you set the virtual clock; you could specify an offset to the system clock. But some of IBM's biggest customers depended on all the virtual machines' clocks always being in sync. So VMXA had to add an option to disallow setting the clock for specified virtual machines.
Except all of development knew how trivial it was to trap or modify the STCK's to produce a timestamp of your choosing. This was before it was common knowledge that client code should never be trusted. But nobody enlightened IBM corporate management. It was a serious career-limiting move at IBM. It didn't matter if you were right. So I'm pretty sure some serious fortunes were made as a result.
So the question for HFT is: are they using and trusting client timestamps, or are the timestamps being generated on the market maker's servers? If the latter, how would the customer know?
eatsome|2 months ago
layer8|2 months ago
This is not entirely correct. What has been agreed is to allow deviations of more than one second after 2035, so that clocks have to be adjusted less frequently (on the order of every 50-100 years is the intention). However, the allowable deviation, and how to adjust clocks when it is exceeded, has yet to be decided.
koudelka|2 months ago
https://www.usenix.org/system/files/conference/nsdi18/nsdi18...
danzheng|2 months ago
The authors’ work forms the basis of what the team at Clockwork.io is building, enabling accurate one-way delay measurements (rather than just RTT/2) that improve latency visibility and telemetry across CPU and GPU infrastructure.
pdeva1|2 months ago
[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time...
forrestthewoods|2 months ago
The best approach, imho, is to abandon the concept of a global time. All timestamps are wrt a specific clock. That clock will skew at a rate that varies with time. You can, hopefully, rely on any particular clock being monotonic!
My mental model is that you form a connected graph of clocks and this allows you to convert arbitrary timestamps from any clock to any clock. This is a lossy conversion that has jitter and can change with time. The fewer stops the better.
I kinda don’t like PTP. Too complicated and requires specialized hardware.
This article only touches on one class of timesync. An entirely separate class is timesync within a device. Your phone is a highly distributed compute system with many chips each of which has their own independent clock source. It’s a pain in the ass.
You also have local timesync across devices such as wearables or robotics. Connecting to a PTP system with GPS and atomic clocks is not ideal (or necessary).
TicSync is cool and useful. https://sci-hub.se/10.1109/icra.2011.5980112
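One way to sketch that graph-of-clocks model (entirely illustrative; a real implementation would re-estimate skew and offset continuously from exchanged timestamps, e.g. with least squares or the convex-hull methods TicSync uses): model each edge as a linear map from one clock to another, and compose maps along a path, accepting that every hop adds error.

```python
class ClockEdge:
    """Linear model mapping timestamps of clock A to clock B:
    t_b ~= skew * t_a + offset. Here skew and offset are given;
    in practice they would be estimated and updated over time."""
    def __init__(self, skew, offset):
        self.skew = skew
        self.offset = offset

    def convert(self, t_a):
        """Convert a clock-A timestamp to clock B's timescale."""
        return self.skew * t_a + self.offset

    def compose(self, other):
        """Mapping A->B composed with B->C gives A->C directly.
        Each hop adds its own estimation error and jitter, which is
        why 'the fewer stops the better'."""
        return ClockEdge(other.skew * self.skew,
                         other.skew * self.offset + other.offset)
```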
bigfatkitten|2 months ago
At this stage, it's difficult to find a half-decent Ethernet MAC that doesn't have PTP timestamping. It's not a particularly complicated protocol, either.
I needed to distribute PPS and 10MHz into a GNSS-denied environment, so last summer I designed a board to do this using 802.1AS gPTP with a uBlox LEA-M8T GNSS timing receiver, a 10MHz OCXO and an STM32F767 MCU. This took me about four weeks. Software is written in C, and the PTP implementation accounts for 1500 LOC.
DannyBee|2 months ago
?????
I run PTP on everything from RPI's to you name it, over fiber, ethernet, etc.
The main thing hardware gives is filtering of PTP packets or hardware timestamping.
Neither is actually required, though some software has decided to require it.
Additionally, something like 99% of gigabit-or-better chipsets sold since 2012 support it (I210 et al.)
RossBencina|2 months ago
In my view the specialised hardware is just a way to get more accurate transmission and arrival timestamps. That's useful whether or not you use PTP.
> My mental model is that you form a connected graph of clocks and this allows you to convert arbitrary timestamps from any clock to any clock. This is a lossy conversion that has jitter and can change with time.
This sounds like the "peer to peer" equivalent to PTP. It would require every node to maintain state about its estimate (skew, slew, variance) of every other clock. I like the concept, but obviously it adds complexity to end-stations beyond what PTP requires (i.e. increases the hardware cost of embedded implementations). Such a system would also need to model the network topology, or control routing (as PTP does), because packets traversing different routes to the same host will experience different delay and jitter statistics.
> TicSync is cool
I hadn't seen this before, but I have implemented similar convex-hull based methods for clock recovery. I agree this is obviously a good approach. Thanks for sharing.
mgaunard|2 months ago
A regular pulse is emitted from a specialized high-precision device, possibly over a specialized high-precision network.
Enables picosecond accuracy (or at least sub-nano).
amiune|2 months ago
As a teacher I love the way Judah Levine explains
a_t48|2 months ago
Hot take: I've seen this and enough other badly configured time sync settings that I want to ban system time from robotics systems - time from startup only! If you want to know what the real world time was for a piece of data after, write what your epoch is once you have a time sync, and add epoch+start time.
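That pattern might look something like this (a sketch, not code from any robotics framework): stamp all data with monotonic time since startup only, pin the wall-clock epoch once after time sync is trusted, and reconstruct wall times offline as epoch + stamp.

```python
import time

class BootClock:
    """Stamp data with monotonic time only; pin the wall-clock epoch once
    (after time sync settles) so wall times can be reconstructed offline
    without ever trusting the system clock at startup."""
    def __init__(self):
        self._t0 = time.monotonic_ns()
        self.epoch_ns = None  # wall-clock time corresponding to _t0

    def stamp(self):
        """Nanoseconds since process start; immune to clock steps."""
        return time.monotonic_ns() - self._t0

    def pin_epoch(self):
        """Call once, after time sync is trusted."""
        self.epoch_ns = time.time_ns() - self.stamp()

    def to_wall(self, stamp_ns):
        """Reconstruct wall-clock time for a recorded stamp."""
        assert self.epoch_ns is not None, "epoch not pinned yet"
        return self.epoch_ns + stamp_ns
```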
michaelt|2 months ago
But it doesn’t have to be the first requirement you relax.
RossBencina|2 months ago
awesome_dude|2 months ago
emptybits|2 months ago
But I just watched/listened to a Richard Feynman talk on the nature of time and clocks and the futility of "synchronizing" clocks. So I'm chuckling a bit. In the general sense, I mean. Yes yes, for practical purposes in the same reference frame on earth, it's difficult but there's hope. Now, in general ... synchronizing two clocks is ... meaningless?
https://www.youtube.com/watch?v=zUHtlXA1f-w
hinkley|2 months ago
varjag|2 months ago
m463|2 months ago
glopesdev|2 months ago
I hate to break it to you, but you were fooled by an AI dupe. Also took me a while to realise this. It’s sad we live in this tiring world where we have to fact check every single piece of content for authenticity. It’s just tiring. I’m sure many will reply it doesn’t matter, which of course will be funny to consider given someone went to the work of vocal cloning Feynman to make a channel of content (copyrighted of course) while claiming “no disrespect intended”.
hinkley|2 months ago
nuccy|2 months ago
1. https://en.wikipedia.org/wiki/Relativity_of_simultaneity
DannyBee|2 months ago
In multicast IP mode, with multiple switches, it requires what anything running multicast between switches/etc would require (IE some form of IGMP snooping or multicast routing or .....)
In unicast IP mode, it requires nothing from your network.
Therefore, i have no idea what it means to "require support on the network".
I have used both ethernet and multicast PTP across a complete mishmash of brands and types and medias of switches, computers, etc, with no issues.
The only thing that "support" might improve is more accurate path delay data through transparent clocks. If both master and slave do accurate hardware timestamping already, and the path between them is constant, it is easily possible to get +-50 nanoseconds without any transparent clock support.
Here are the stats from a random embedded device running PTP I just accessed a second ago:
So this embedded ARM device, which is not special in any way, is maintaining time within +-35ns of the grandmaster, and currently within 30ns of GPS time. The card does not have an embedded hardware PTP clock, but it does do hardware timestamping and filtering.
This grandmaster is an RPI with an intel chipset on it and the PPS input pin being used to discipline the chipset's clock. It stays within +-2ns (usually +-1ns) of GPS time.
Obviously, holdover sucks, but not the point :)
This qualifies as better-than-NTP for sure, and this setup has no network support. No transparent clocks, etc. These machines have multiple media transitions involved (fiber->ethernet), etc.
The main thing transparent clock support provides in practice is dealing with highly variable delay. Either from mode of transport, number of packet processors in between your nodes, etc. Something that causes the delay to be hard to account for.
The ethernet packet processing in ethernet mode is being handled in hardware by the switches and basically all network cards. IP variants would probably be hardware assisted but not fully offloaded on all cards, and just ignored on switches (assuming they are not really routers in disguise).
The hardware timestamping is being done in the card (and the vast majority of ethernet cards have supported PTP hardware timestamping for >1 decade at this point), and works perfectly fine with deep CPU sleep states.
Some don't do hardware filtering so they essentially are processing more packets than necessary but .....
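For reference, the arithmetic all those hardware timestamps feed into is the standard IEEE 1588 delay-request exchange. Note the symmetric-path assumption: any path asymmetry lands directly in the offset estimate, which is exactly what transparent clocks help correct for.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Standard PTP delay-request math.
    t1: master sends Sync          (master clock)
    t2: slave receives Sync        (slave clock)
    t3: slave sends Delay_Req      (slave clock)
    t4: master receives Delay_Req  (master clock)
    Assumes a symmetric path: asymmetry shows up directly as offset
    error, which is why variable queuing delay matters so much."""
    offset = ((t2 - t1) - (t4 - t3)) / 2
    delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, delay
```

For example, with a slave 100 ns ahead of the master and a 50 ns one-way path, the four timestamps might be 0, 150, 200, 150 (in ns), and the formulas recover exactly those values. Hardware timestamping matters because t1..t4 must be taken at the wire, not after unbounded kernel scheduling delay.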
sreekanth850|2 months ago
otterley|2 months ago
hinkley|2 months ago
> Here’s a video of me explaining this.
Do you need a video? Do we need a 42-minute video to explain this?
I generally agree with Feynman on this stuff. We let explanations be far more complex than they need to be for most things, and it makes the hunt for accidental complexity harder because everything looks almost as complex as the problems that need more study to divine what is actually going on there.
For Spanner to be useful they needed a high transaction rate and in a distributed system that requires very tight grace periods for First Writer Wins. Tighter than you can achieve with NTP or system clocks. That’s it. That’s why they invented a new clock.
Google puts it this way:
> Under external consistency, the system behaves as if all transactions run sequentially, even though Spanner actually runs them across multiple servers (and possibly in multiple datacenters) for higher performance and availability.
But that’s a bit thick for people who don’t spend weeks or years thinking about distributed systems.
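The "tight grace period" idea above is essentially TrueTime's commit wait: hold a commit until its timestamp is guaranteed to be in the past on every clock in the system. A sketch under an assumed uncertainty bound (the 7 ms value is a placeholder; Spanner's actual epsilon was on the order of milliseconds and varied):

```python
import time

# Assumed clock-uncertainty bound, in seconds. Placeholder value:
# Spanner's TrueTime epsilon was on the order of milliseconds.
EPSILON_S = 0.007

def commit_wait(commit_ts):
    """Spanner-style commit wait: block until commit_ts is guaranteed to
    be in the past on every clock, i.e. until local_time - EPSILON_S has
    passed commit_ts. Tighter clock sync means a shorter wait, which is
    why clock quality directly caps transaction throughput."""
    while time.time() - EPSILON_S <= commit_ts:
        time.sleep(0.001)
```

Calling `commit_wait(time.time())` blocks for roughly EPSILON_S before returning, which makes the clock-quality/throughput trade-off concrete: halve epsilon and you halve the wait.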