Splitting the Ping

[+] great_wubwub|5 years ago|reply

Everybody is talking about clock accuracy and totally missing that devices in the middle of the network path do not care about responding quickly to pings. Middle devices are generally routers or firewalls, and their job is to route and firewall, not to respond to a packet as quickly as it comes in. Transit traffic is far more important than processing control plane packets. Devices can add several msec or more in latency by sticking ICMP echo requests and the like in a low-priority queue and getting around to responding eventually. This will dwarf any gains produced by "the best NTP server".

And no, setting QoS bits on the packet will not help.

[+] walrus01|5 years ago|reply

Absolutely this. One of the first things an ISP's NOC will tell a business customer who is complaining about ICMP loss or high latency to some intermediate hop seen in a traceroute is to test the ICMP loss and latency/jitter to an endpoint destination instead.

Preferably to an endpoint destination that is some sort of server which isn't firewalling or rate-limiting its ICMP replies, and not a firewall or router.

You can see what looks like terrible loss and jitter to intermediate hops in a traceroute, but traffic to your VoIP server that is 12.3ms away, on the far side of those hops and several AS-to-AS adjacencies, might average 0.00% loss over multi-week periods, with less than 0.3ms of jitter range.

Customers will also be advised to use other tools that can measure success/failure and RTT answer time of other services and daemons running on the endpoint they're testing against (at OSI layers 4-7), such as the various ways of measuring DNS lookup query time, or time to curl a sample file from an httpd over TLS1.2/1.3, and plot that on a multi day/week chart.
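For anyone who wants to try this kind of layer 4-7 measurement, here is a rough Python sketch. It times DNS lookups and TCP connects against an endpoint and reports loss, median time and jitter range. The function names (time_probe, tcp_connect_probe, dns_probe, summarize) and the example.com host are my own placeholders, not anything a NOC would hand you, and a real setup would log the results over days and plot them.

```python
import socket
import statistics
import time


def time_probe(probe, repeats=5):
    """Call probe() several times; return (samples_ms, failures)."""
    samples, failures = [], 0
    for _ in range(repeats):
        start = time.perf_counter()
        try:
            probe()
        except OSError:  # covers timeouts, refused connections, DNS errors
            failures += 1
            continue
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples, failures


def tcp_connect_probe(host, port, timeout=2.0):
    """Build a probe that opens and closes one TCP connection."""
    def probe():
        with socket.create_connection((host, port), timeout=timeout):
            pass
    return probe


def dns_probe(host):
    """Build a probe that resolves a hostname via the system resolver."""
    def probe():
        socket.getaddrinfo(host, None)
    return probe


def summarize(samples, failures):
    """Loss percentage, median time and jitter range (max - min) in ms."""
    total = len(samples) + failures
    return {
        "loss_pct": 100.0 * failures / total if total else 0.0,
        "median_ms": statistics.median(samples) if samples else None,
        "jitter_ms": (max(samples) - min(samples)) if samples else None,
    }


# Example use (needs network access; example.com is a placeholder):
#   print(summarize(*time_probe(dns_probe("example.com"))))
#   print(summarize(*time_probe(tcp_connect_probe("example.com", 443))))
```

Note that the system resolver may cache answers, so repeated DNS timings can look better than a cold lookup would.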

For their own protection, routers and other network equipment with packets flowing through them deprioritize answering ICMP.

[+] bogomipz|5 years ago|reply

>"Everybody is talking about clock accuracy and totally missing that devices in the middle of the network path do not care about responding quickly to pings. Middle devices are generally routers or firewalls, and their job is to route and firewall, not to respond to a packet as quickly as it comes in."

That's not correct. A box in the middle of the network path by definition doesn't respond at all to the ping request, since it's not addressed to it. It simply forwards the IP packet that encapsulates the ICMP echo request towards its destination. As forwarding happens in the data plane, this doesn't involve the control plane at all. A router with no QoS will forward IP datagrams at the same rate whether they encapsulate TCP, UDP or ICMP.

[+] tyingq|5 years ago|reply

Wouldn't some ICMP messages be important to send back quickly? Like "Fragmentation Needed"?

[+] rwmj|5 years ago|reply

Do they actually do this? It sounds like it would be more effort for a device to deeply inspect the packet and sort them into queues (and to what end exactly?), rather than simply forward as fast as possible based on the destination address.

[+] yencabulator|5 years ago|reply

Oh, it's even worse. I've observed bottlenecked ISPs prioritizing ping, where ICMP Echo latency and packet loss are lower than they are for TCP & UDP packets (and I mean individual packets, not ones inside a TCP stream). I guess the squeaky wheel got the grease, and that stopped people using ping results to complain about their slow internet connection.

[+] pxx|5 years ago|reply

Not entirely related, but a fun and interesting tangent: There's actually no way that we know of to measure the "one-way" speed of light, as the specific synchronization that you use (and this post uses to do its calculation) assumes that the speed of light is the same in both directions. For all we know, light travels infinitely quickly in one direction and at c/2 in the return direction.

https://en.wikipedia.org/wiki/One-way_speed_of_light

recent-ish video about this: https://www.youtube.com/watch?v=pTn6Ewhb27k
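A short worked version of why a round-trip measurement can't tell these cases apart, using the Reichenbach parameter ε that the Wikipedia page above describes (L is the one-way distance; the layout is my own):

```latex
c_{+} = \frac{c}{2\varepsilon}, \qquad
c_{-} = \frac{c}{2(1-\varepsilon)}, \qquad 0 < \varepsilon < 1

t_{\text{round trip}}
  = \frac{L}{c_{+}} + \frac{L}{c_{-}}
  = \frac{2\varepsilon L}{c} + \frac{2(1-\varepsilon) L}{c}
  = \frac{2L}{c}
```

The ε cancels, so every choice gives the same round-trip time. ε = 1/2 is the usual Einstein convention (c both ways), and the limit ε → 0 is the "infinitely fast one way, c/2 on the way back" case.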

[+] lmm|5 years ago|reply

Isn't that a distinction without a difference? You might as well say that light travels infinitely quickly in the other direction and at c/2 in the first direction - there's no measurable difference between the two cases.

There's no observable anisotropy to spacetime, so I think it makes the most sense to treat the speed of light as the same in all directions.

[+] bitsen|5 years ago|reply

That’s the stupidest argument I’ve ever heard from someone attempting to be a physicist.

The gear experiment is the simplest answer. With a gear whose diameter spans the distance between the light source and receiver, and where the speed of rotation is controlled by an atomic clock, the gear could have a small hole through it. If light sent from one side to the other while the gear is spinning is too slow, it will not make it through the gear.

The size of the hole through the gear will dictate the speed, because the gear itself measures duration and distance in a single direction.

[+] Ono-Sendai|5 years ago|reply

It's basically impossible to measure one-way latency without external information, i.e. clock synchronisation done out of band. See http://twistedoakstudios.com/blog/Post2353_when-one-way-late...

This fact is also the basis for special relativity (different observers may choose different simultaneity conventions - for example Einstein synchronisation)

[+] datastoat|5 years ago|reply

To be precise, it's impossible to measure one-way latency between a pair of nodes without external information. But if you have a mesh of nodes, the story is different [1]. If you have N nodes and hence N unknown clock offsets, and if you have ping data from N(N-1) pairs, you can do better.

[1] https://www.usenix.org/conference/nsdi18/presentation/geng
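To give a feel for why a mesh helps, here is a small Python simulation. To be clear, this is not the algorithm from [1], just an illustration of one constraint a mesh gives you. A pairwise offset estimate that assumes symmetric delay is wrong by half the asymmetry. True clock offsets must sum to zero around any closed loop of nodes, so whatever is left over when you sum the estimates around a loop comes from asymmetry. The node names, delays and function names are all made up for the example.

```python
def pairwise_offset_estimate(theta, delay, i, j):
    """Simulated 'RTT/2' estimate of (clock j - clock i).

    theta[n] is node n's hidden clock offset, delay[(a, b)] is the hidden
    one-way delay from a to b. Assuming symmetric paths, the estimate is
    off by (delay[i->j] - delay[j->i]) / 2.
    """
    return (theta[j] - theta[i]) + (delay[(i, j)] - delay[(j, i)]) / 2


def loop_residual(theta, delay, loop):
    """Sum the pairwise estimates around a closed loop of nodes.

    The true offsets cancel, so the result is the sum of the asymmetry
    errors on the loop's links. Zero means no net asymmetry on the loop.
    """
    total = 0.0
    for i, j in zip(loop, loop[1:] + loop[:1]):
        total += pairwise_offset_estimate(theta, delay, i, j)
    return total


# Hidden ground truth, in ms (hypothetical values).
theta = {"A": 0.0, "B": 500.0, "C": -200.0}
delay = {(a, b): 10.0 for a in theta for b in theta if a != b}

symmetric_residual = loop_residual(theta, delay, ["A", "B", "C"])

delay[("A", "B")] = 30.0  # make A -> B slower than B -> A
asymmetric_residual = loop_residual(theta, delay, ["A", "B", "C"])
```

With two nodes there is no such loop to check, which is one way to see why the pair-only case is stuck. A residual on its own does not say which link in the loop is the asymmetric one.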

[+] bentcorner|5 years ago|reply

You should be able to tell when one side of the link degrades though, correct? You'd have known time deltas for the link each way (whether they're accurate doesn't really matter), and when the round-trip time changes you should be able to tell which side that occurred on. (I'm interested in this problem because my cable internet reliably (!) degrades during the workday.)
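A minimal Python sketch of that idea, assuming both ends log send and receive timestamps and that the unknown clock offset between them stays constant between the baseline and the later measurement (so no clock drift or step in between). The function names and the threshold are placeholders of mine.

```python
def one_way_deltas(t_send_a, t_recv_b, t_send_b, t_recv_a):
    """Apparent per-direction deltas in ms.

    t_send_a / t_recv_a are read from host A's clock, t_recv_b / t_send_b
    from host B's clock. Each delta includes the unknown clock offset, so
    neither is a true one-way latency on its own.
    """
    forward = t_recv_b - t_send_a  # true A->B delay + offset
    reverse = t_recv_a - t_send_b  # true B->A delay - offset
    return forward, reverse


def compare_to_baseline(baseline, current, threshold_ms=1.0):
    """Report how each direction moved; the constant offset cancels out."""
    fwd_change = current[0] - baseline[0]
    rev_change = current[1] - baseline[1]
    degraded = []
    if fwd_change > threshold_ms:
        degraded.append("forward")
    if rev_change > threshold_ms:
        degraded.append("reverse")
    return fwd_change, rev_change, degraded


# Simulated example: B's clock is 5000 ms ahead of A's (unknown to us).
offset = 5000
# Quiet period: 10 ms each way.
baseline = one_way_deltas(0, 10 + offset, 100 + offset, 110)
# Workday: forward still 10 ms, reverse now 35 ms.
current = one_way_deltas(1000, 1010 + offset, 1100 + offset, 1135)
result = compare_to_baseline(baseline, current)
```

The absolute deltas here are nonsense as latencies (5010 and -4990 ms), but the change in each one is meaningful as long as the offset holds still.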

[+] nullserver|5 years ago|reply

https://web.mit.edu/jemorris/humor/500-miles

Email can’t go farther than 500 miles

[+] datastoat|5 years ago|reply

As the article explains, latency and clock sync go hand in hand. Here's a blog post [1] that goes further into clock sync, contrasting NTP and hardware-based systems. The company behind the blog post says that their solution is available as a managed service on Azure and GCP, but I've never looked out for it.

[1] https://www.ticktocknetworks.com/tick-tock-the-clock-runs-wi...

[+] jlgaddis|5 years ago|reply

If you've got clock synchronization between two hosts, there's OWAMP, a.k.a. RFC4656 [0].

The Minimum-Pairs Protocol [1] eliminates the need for clock synchronization but requires (at least) 3 hosts under your control, plus the "uncooperative" fourth node:

> The minimum-pairs (or MP) is an active measurement protocol to estimate in real-time the smaller of the forward and reverse one-way network delays (OWDs). It is designed to work in hostile environments, where a set of three network nodes can estimate an upper-bound OWDs between themselves and a fourth untrusted node. All four nodes must cooperate, though honest cooperation from the fourth node is not required. The objective is to conduct such estimates without involving the untrusted nodes in clock synchronization, and in a manner more accurate than simply half the Round-Trip Time (RTT).

--

[0]: https://tools.ietf.org/html/rfc4656

[1]: https://en.wikipedia.org/wiki/Minimum-Pairs_Protocol

[+] jedimastert|5 years ago|reply

An excellent video about why it's impossible to measure the speed of light in one direction:

https://www.youtube.com/watch?v=pTn6Ewhb27k

[+] IgorPartola|5 years ago|reply

I thought NTP specifically did account for transit latency. Not sure why I made that assumption, but if it’s not true, how can I ever trust my clock to be correct?

[+] detaro|5 years ago|reply

As the article says, it does account for it. It just can't account for unknown constant asymmetric latency.

> how can I ever trust my clock to be correct?

What does it mean for your clock to be "correct"? If you need your time to be precise to better than a few hundred ms, you probably shouldn't be getting it from random NTP servers over unknown connections, but for most people that's an acceptable error.
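To make the "accounts for it, but only if it's symmetric" point concrete, here is the standard NTP on-wire calculation from four timestamps, plus a tiny simulation. The simulate() helper and its numbers are made up for illustration.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Standard NTP calculation from the four timestamps.

    t0: client send   (client clock)
    t1: server receive (server clock)
    t2: server send    (server clock)
    t3: client receive (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)  # round trip minus server processing
    return offset, delay


def simulate(true_offset, fwd, rev, server_proc=0.0, t0=0.0):
    """Generate timestamps for a known true offset and one-way delays."""
    t1 = t0 + fwd + true_offset          # server clock at arrival
    t2 = t1 + server_proc                # server clock at reply
    t3 = t0 + fwd + server_proc + rev    # client clock at arrival
    return ntp_offset_and_delay(t0, t1, t2, t3)


# Server is really 100 ms ahead of the client.
symmetric = simulate(true_offset=100.0, fwd=20.0, rev=20.0)
asymmetric = simulate(true_offset=100.0, fwd=30.0, rev=10.0)
```

Both cases see the same 40 ms round trip. The symmetric one recovers the true 100 ms offset, while the asymmetric one is off by (fwd - rev) / 2 = 10 ms, and nothing in the four timestamps reveals that.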

[+] swinglock|5 years ago|reply

This protocol and software exists, it's called OWAMP.

[+] chlbny|5 years ago|reply

OWAMP is even a standardized protocol (RFC 4656).

For the record, there is another tool that can perform this kind of one-way measurement, called irtt: https://github.com/heistp/irtt

[+] statstutor|5 years ago|reply

The article says "Now that we have a synced clock", but surely the clock synchronisation would also be unable to resolve any asymmetry in the ping and the synchronisation would inevitably be impacted by this. (Or, does NTP have a solution to this?)

[Edited to add: from https://en.wikipedia.org/wiki/Network_Time_Protocol: "Asymmetric routes... can cause errors of 100 ms or more."]

[+] jtsiskin|5 years ago|reply

Yes, NTP clock sync assumes symmetric delays, so you have the exact same problem.

However, they use the closest NTP time server, so assuming the NTP server owners are able to sync their own servers correctly, the offset is probably close.

Unless the asymmetric delay is on the path which both the ping and the NTP synchronization take.

[+] benjojo12|5 years ago|reply

The point was to say that if you are in a DC environment and have ~1ms access to a Stratum 1 NTP server, then you can use that. If you are on DSL/DOCSIS etc., then it's likely required to use GNSS/PPS sources.

[+] dec0dedab0de|5 years ago|reply

This reminded me of the buffer bloat[0] problem esr was talking about a few years ago.

I thought I read a paper where he was getting hardware made for the purpose of using GPS for time so it could be accurate enough to measure latency. I just googled and realized it wasn't a paper, it was a talk I went to in 2012 that I forgot about [1].

I wonder if anything came of it, does anyone know?

[0] https://hn.algolia.com/?q=bufferbloat (it made the front page more than once)

[1] https://youtu.be/1b17ggwkR60

[+] drothlis|5 years ago|reply

This "not quite 5 minute guide to making an NTP server" is from 2014: https://ava.upuaut.net/?p=726

So GPS boards for Raspberry Pi were cheap off-the-shelf hardware by 2014, at least.

[+] gerdesj|5 years ago|reply

"However there is a common assumption on this latency number. Is that you can divide it in half to get the time it takes to send data in one direction"

So you ssh the other end and ping in the other direction. I tend to use mtr instead.

Nowadays latencies to internet services are within the sort of latencies we used to require on site. OK, I do understand that not all internet connections are equal. I live in a fairly rural part of the UK, in a small town of roughly 25,000 people. The UK is a fairly small country, fairly densely populated and quite rich, so in general internets are fairly reasonable in comparison to the rest of the world. Not world beating, but overall pretty decent.

If I ping say www.google.com (yes I know it's a funky address with random "distance") from home on my FTTC connection I see something like 15 to 20ms returns. I can ssh the office and use a 1Gb/s link and get 8ms returns. My office PC is on the end of three 1Gb/s hops in the building itself and it's a crappy old ex-customer hand me down (I run Arch Linux, I don't need whatever W10 requires). Both of those links are via the same ISP. My home PC test is via wifi!

I'm not going to bother going too much further with this now but if I was bothered (and I will be one day!), I would start my pinging n stuff at my internet facing router.

It wasn't that long ago that we (my company, about 10 years ago) accepted a condition of contract that required a site latency of 30ms end to end (ignoring hosts - the latency was directly measured without normal IT involvement.) 30ms!

51 comments