
perfsonar-user - Re: [perfsonar-user] Negative Latency Times on MaDDash

Subject: perfSONAR User Q&A and Other Discussion


  • From: Mark Feit
  • To: Michael Reece
  • Subject: Re: [perfsonar-user] Negative Latency Times on MaDDash
  • Date: Thu, 31 Mar 2016 13:40:01 +0000

(Andy kind of beat me to it while I was writing this…)

Michael Reece writes:
I know that this issue has been raised before, but perhaps someone has a solution to it. For some reason the latency times between several of the nodes in my mesh are negative. I've been told before that this is likely due to the close proximity of the nodes in the mesh (most are less than a mile apart).

Negative latency happens when the actual latency falls within the clock error and the receiving end’s clock is behind the one at the sending end.
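To make the arithmetic concrete, here is a minimal sketch of how a one-way delay computed from two imperfect clocks can go negative. The function name and the offset values are made up for illustration; this is not how perfSONAR itself computes the figure.

```python
def measured_one_way_delay(true_delay_s, sender_offset_s, receiver_offset_s):
    """Delay as computed from timestamps taken on two imperfect clocks.

    Each offset is that clock's error relative to true time
    (positive = clock runs ahead, negative = clock runs behind).
    """
    send_stamp = 0.0 + sender_offset_s             # sender stamps at true time 0
    recv_stamp = true_delay_s + receiver_offset_s  # receiver stamps on arrival
    return recv_stamp - send_stamp


# Hypothetical numbers: 10 µs of real transit time, sender clock 500 µs ahead,
# receiver clock 300 µs behind. The clock difference swamps the transit time.
example = measured_one_way_delay(0.000010, 0.000500, -0.000300)
print(f"{example * 1e6:.0f} µs")
```

With the receiver's clock behind the sender's by more than the real transit time, the result comes out below zero even though the packet obviously arrived after it was sent.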

The 1 ms of error you get from using NTP across the Internet won’t cause problems when the hosts are more than 100 miles apart.  Using local, stratum-1 NTP servers cuts that down to 250 µs and 25 miles, which still doesn’t help in your case.  It is possible to discipline clocks to tighter tolerances than that, but it gets very expensive very quickly.
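The two figures above work out to roughly 100 miles of separation per millisecond of clock error. A small sketch of that rule of thumb follows; the ratio is taken from the numbers in this message, not from a measured propagation speed, and the function names are hypothetical.

```python
MILES_PER_MS = 100.0  # rule of thumb implied above: 1 ms -> 100 miles, 250 µs -> 25 miles


def min_reliable_distance_miles(clock_error_ms, miles_per_ms=MILES_PER_MS):
    """Separation below which clock error can exceed the transit time."""
    return clock_error_ms * miles_per_ms


def within_clock_error_radius(distance_miles, clock_error_ms):
    """True when two hosts are too close for the measurement to be trusted."""
    return distance_miles < min_reliable_distance_miles(clock_error_ms)


print(min_reliable_distance_miles(1.0))      # NTP across the Internet
print(min_reliable_distance_miles(0.25))     # local stratum-1 servers
print(within_clock_error_radius(1.0, 0.25))  # hosts about a mile apart
```

For nodes less than a mile apart, either level of clock discipline leaves them well inside the radius.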

The development team has discussed this problem previously, and short of better clock discipline, none of the options available to us is a good solution:
  • Take the absolute value of the result.  This is probably as close to accurate as we can get without reporting a negative number.  Inside of the radius where clock accuracy plays a role, any figure you get will be influenced more by the difference between the clocks than the actual transit time.
  • Truncate or round.  Report anything less than 1 ms as “< 1 ms,” “effectively zero” or “too short to measure.”  This is probably the most practical thing to do because it doesn’t imply that the result is accurate.
  • Leave it as it is.  Negative numbers at least give you some indication that the value is bogus, but positive ones don’t.
For purposes of error reporting, I’d treat anything shorter than the limits of your ability to keep time (< 1 ms) as being okay.  Rade’s suggestion of adjusting the lower threshold of where Nagios throws an alarm would probably help.
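As a rough illustration of the three reporting options and the "treat it as okay" rule, here is a sketch. All of the names, the 1 ms default and the handling of large negative values are assumptions for the example; none of this is existing perfSONAR, MaDDash or Nagios code.

```python
CLOCK_ERROR_MS = 1.0  # assumed limit of timekeeping ability ("< 1 ms" above)


def report_latency(latency_ms, mode="truncate", clock_error_ms=CLOCK_ERROR_MS):
    """Format a latency figure using one of the three options discussed."""
    if mode == "absolute":   # take the absolute value
        return f"{abs(latency_ms):.3f} ms"
    if mode == "truncate":   # anything inside the clock error is "< N ms"
        if abs(latency_ms) < clock_error_ms:
            return f"< {clock_error_ms:g} ms"
        return f"{latency_ms:.3f} ms"
    if mode == "raw":        # leave it as it is, negative sign and all
        return f"{latency_ms:.3f} ms"
    raise ValueError(f"unknown mode: {mode!r}")


def latency_is_ok(latency_ms, warn_above_ms, clock_error_ms=CLOCK_ERROR_MS):
    """Treat anything inside the clock error as okay for alarm purposes."""
    if abs(latency_ms) < clock_error_ms:
        return True
    # Assumption: a negative value larger than the clock error is not okay,
    # since it suggests the clocks are further apart than expected.
    return 0 <= latency_ms <= warn_above_ms


print(report_latency(-0.4))              # truncate
print(report_latency(-0.4, "absolute"))
print(latency_is_ok(-0.4, warn_above_ms=5.0))
```

In practice the same effect comes from lowering the check's lower threshold, as Rade suggested, so that small negative values stop raising alarms.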

Dan Magorian at the Johns Hopkins Applied Physics Lab is trying to start up a subgroup within the Performance Working Group that focuses on tools for making measurements in low-latency environments such as LANs.  It’s an interesting topic, but not something perfSONAR has the cycles to chase at the moment.  If you’re interested, drop him a line and ask to be included.

—Mark



