Elixir6419 | 1 year ago
An argument could be made for a device configured such that it shows loss on ping but not on mtr: just set the rate limits so that the ICMP echo-reply rate is lower than the TTL-exceeded rate. Which tool would be wrong then? Would you blame ping for producing misleading results?
The running counters, and the ability to pick out obvious rate limiting when the loss doesn't cascade into the later hops, are to me akin to traceroute's `* * *` output. It doesn't always mean that packets are blackholed or that connectivity is broken; it just means the tool is producing an artifact of network configuration or network characteristics. Further investigation is needed to figure out what's going on.
MTR, imho, gives you much more insight into the network than traceroute or ping separately. It doesn't resolve the usual firewall/rate-limiting artifacts, but it gives you far more information about paths if you know how to interpret it.
commandersaki|1 year ago
I'm not sure I understand what you're saying, but in this case the control-plane packet rates for generating TTL Exceeded vs. Echo Reply are different: one gives 80% loss and the other gives 0% loss at similar probe rates. Gripe #1: why are we even testing the control plane in the first place? It's a useless metric with no utility for measuring end-to-end latency/loss.
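To make the distinction concrete, here's a sketch of how you'd separate the two measurements (hypothetical target address; the flags are standard mtr/ping options):

```shell
# Per-hop stats: intermediate hops showing loss while the final hop shows
# 0% is the control-plane rate-limiting artifact, not real forwarding loss.
mtr --report --report-cycles 100 203.0.113.10

# End-to-end check at a similar probe rate: 0% loss here means the data
# plane is forwarding fine regardless of what intermediate hops reported.
ping -c 100 -i 0.2 203.0.113.10
```

If the mtr report shows loss at hop 4 that doesn't carry through to the destination, and the ping shows 0% loss, you're looking at the rate-limited control plane, not a forwarding problem.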
> An argument could be made for a device configured such that it shows loss on ping but not on mtr: just set the rate limits so that the ICMP echo-reply rate is lower than the TTL-exceeded rate. Which tool would be wrong then? Would you blame ping for producing misleading results?
Sure, that would be a problem, but any combination can be misleading if the data path is yielding 0% loss for high rates of end-to-end ICMP. This is why it's not a particularly helpful metric and can be downright misleading (usually not to me, but I've seen plenty of people make incorrect inferences from bunk MTR results because the tool isn't intuitive).
> The running counters, and the ability to pick out obvious rate limiting when the loss doesn't cascade into the later hops, are to me akin to traceroute's `* * *` output. It doesn't always mean that packets are blackholed or that connectivity is broken; it just means the tool is producing an artifact of network configuration or network characteristics. Further investigation is needed to figure out what's going on.
Sure, that's great, but not particularly helpful to the masses who misunderstand the tool. I worked as a network engineer for a decade, receiving bunk MTR reports where people freaked out because they were seeing "packet loss" that was nonexistent on the data forwarding plane (you know, the one that actually matters).
> MTR, imho, gives you much more insight into the network than traceroute or ping separately. It doesn't resolve the usual firewall/rate-limiting artifacts, but it gives you way more information about paths if you know how to interpret it.
Time shouldn't be wasted measuring the control path and then investigating to confirm it is the control path and not the data path. You can't make these mistakes using traceroute and ping separately, because traceroute doesn't have a notion of a "per-hop" loss indicator and ping doesn't involve intermediate hops (unless an intermediate hop generates an ICMP diagnostic for an echo request).
Elixir6419|1 year ago
Understanding can be improved. Bunk MTRs are easy to spot. You tell them this is not an issue because .... . Then they learn, and usually that customer stops sending you bunk MTRs.
I'm pretty sure the number of people opening tickets with providers/network teams because they have nothing better to do is near zero. The fact that they ran an MTR shows they were doing some troubleshooting, and at the end of the day a problem needs to be solved. It may not be on your end, but it still needs to be investigated, and the same would apply to a crappy iperf throughput test. IMHO any clue/information about where that problem is, is helpful. You may need to filter the relevant from the irrelevant.
But if I get to pick one out of two problems, one with crappy iperf results and the other with an MTR showing loss that carries over, I would probably pick the second, because that at least gives me an indication of whereabouts I should start looking.
> Time shouldn't be wasted measuring the control path and then investigating to confirm it is the control path and not the data path. You can't make these mistakes using traceroute and ping separately, because traceroute doesn't have a notion of a "per-hop" loss indicator.
traceroute does have a per-hop indicator: the `*` in its output. It's just so often off that nobody pays much attention. You can't really catch issues related to route flaps or reroutes with traceroute; with MTR it becomes pretty clear if a reroute happens in the middle of your test. I guess you could keep re-running traceroute, but I'll leave it to you to sift through that nightmare of output, and then it has effectively become MTR, with worse output.
There are also many options available in MTR that aren't in traceroute (triggering the probes with TCP or UDP packets, fixing the local or remote port, etc.). Even if you just run it with 3 packets per hop, you'll have far more options. You don't have to use it as a continuous monitor of packet loss; it can give you traceroute-level information in a much cleaner format, with more options to choose from.
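A sketch of a few of the options mentioned above (hypothetical target; check your mtr version's man page, as flag support varies by build):

```shell
# Probe with TCP SYN to port 443 instead of ICMP, so probes get the same
# treatment as HTTPS traffic along the path.
mtr --tcp --port 443 example.com

# Probe with UDP instead of ICMP.
mtr --udp example.com

# One-shot report mode with 3 probes per hop: traceroute-level information
# in mtr's cleaner table format.
mtr --report --report-cycles 3 example.com
```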
> ping doesn't involve intermediate hops (unless an intermediate hop generates an ICMP diagnostic for an echo request).
ICMP echo requests and replies can be subject to different QoS treatment than TCP/UDP traffic, so ping also doesn't necessarily give you the right idea when testing an end-to-end connectivity issue. Iperf imho is the best bet, and if you want to be really accurate you pick the src/dst ports for client/server, just to be sure you land in the same class as your problematic traffic.
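A sketch of that iperf approach with pinned ports, assuming iperf3 and a hypothetical server hostname:

```shell
# On the server: listen on a fixed port.
iperf3 -s -p 5201

# On the client: fixed destination port (-p) and fixed source port
# (--cport), so the test traffic matches the QoS classifier that your
# problematic traffic hits; run for 30 seconds.
iperf3 -c server.example.com -p 5201 --cport 5202 -t 30
```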
As a sidenote, MTR packets also ride the data plane until they reach the hop where their TTL expires.