perfsonar-user - Re: [perfsonar-user] Now BWCTL issue....

Subject: perfSONAR User Q&A and Other Discussion

  • From: Jason Zurawski
  • To: Amit
  • Subject: Re: [perfsonar-user] Now BWCTL issue....
  • Date: Wed, 2 Apr 2014 10:56:31 -0400

Amit;

I see two problems:

1) You have selected 4 servers that are located in the United States, and the
latency between you and them will make it difficult for the NTP algorithms to
keep accurate time.

2) Your server is about 10 seconds off from true time (your ntpdate output
below shows an offset of 10.19 seconds).
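
To make both points concrete, here is a minimal Python sketch of the standard
NTP on-wire calculation (RFC 5905). The four timestamps are made-up values
chosen only to resemble the ntpdate output quoted below (offset near 10.19 s,
delay near 0.31 s); they are not taken from it, and the function name is just
illustrative.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation.

    t1: client transmit time (client clock)
    t2: server receive time (server clock)
    t3: server transmit time (server clock)
    t4: client receive time (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Illustrative timestamps in seconds (assumed, not measured).
t1, t2, t3, t4 = 0.000, 10.347, 10.348, 0.310
offset, delay = ntp_offset_and_delay(t1, t2, t3, t4)

# If the forward and return paths are asymmetric, the offset estimate
# can be wrong by up to half the round-trip delay.
max_error = delay / 2.0
print(f"offset={offset:.4f}s delay={delay:.4f}s max_error={max_error:.4f}s")
```

With a round trip of roughly 0.3 seconds, path asymmetry alone can account
for up to about 0.15 seconds of error, which is why closer servers help.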

With regard to the first issue, reconsider which servers are listed in your
/etc/ntp.conf file. For example, we know of some servers located in Japan
and Taiwan that may be closer to you latency-wise:

> ns.twgrid.org
> 140.109.7.5
> ns2.jp.apan.net
> clock.nc.fukuoka-u.ac.jp
> ntp.nict.jp

Failing this, you can look at the public list of servers from the NTP project
(note that we don't maintain this, and I can't vouch for how accurate it may
be):

http://support.ntp.org/bin/view/Servers/StratumTwoTimeServers

Consider loading in 3-4 other servers (instead of the 4 from the US that you
have now) from India or other close geographical locations. This will give
NTP a better chance of working.

With regard to the second issue, the algorithms in ntpd will try to eliminate
'false tickers'. I do not remember the specifics, but it may be the case
that if clocks appear to be seconds off, they will no longer be polled for
consideration and may fall into the state that you listed in the output from
ntpq. I would suggest the following to test this out:

1) Stop ntpd (sudo /etc/init.d/ntpd stop)

2) Configure 3-4 new servers from the first part above, and remove the
US-based servers

3) Run this command (sudo ntpd -qgd). Note that ntpdate is deprecated, and
this command will force a time jump on your local clock based on the servers
that are listed in your /etc/ntp.conf file. The output will also be useful for
debugging what is going on with the daemon:

> [zurawski@localhost ~]$ sudo ntpd -qgd
> ntpd
>
> Sat Nov 23 18:21:48 UTC 2013 (1)
> 2 Apr 07:42:00 ntpd[19434]: proto: precision = 0.062 usec
> 2 Apr 07:42:00 ntpd[19434]: 0.0.0.0 c01d 0d kern kernel time sync enabled
> event at 0 0.0.0.0 c01d 0d kern kernel time sync enabled
> Finished Parsing!!
> 2 Apr 07:42:00 ntpd[19434]: ntp_io: estimated max descriptors: 1024,
> initial socket boundary: 16
> 2 Apr 07:42:00 ntpd[19434]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
> 2 Apr 07:42:00 ntpd[19434]: Listen and drop on 1 v6wildcard :: UDP 123
> 2 Apr 07:42:00 ntpd[19434]: Listen normally on 2 lo 127.0.0.1 UDP 123
> restrict: op 1 addr 127.0.0.1 mask 255.255.255.255 mflags 00003000 flags
> 00000001
> 2 Apr 07:42:00 ntpd[19434]: Listen normally on 3 eth0 192.12.15.23 UDP 123
> restrict: op 1 addr 192.12.15.23 mask 255.255.255.255 mflags 00003000 flags
> 00000001
> 2 Apr 07:42:00 ntpd[19434]: Listen normally on 4 lo ::1 UDP 123
> restrict: op 1 addr ::1 mask ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff mflags
> 00003000 flags 00000001
> 2 Apr 07:42:00 ntpd[19434]: Listen normally on 5 eth0
> fe80::260:ddff:fe45:81b8 UDP 123
> restrict: op 1 addr fe80::260:ddff:fe45:81b8 mask
> ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff mflags 00003000 flags 00000001
> 2 Apr 07:42:00 ntpd[19434]: Listen normally on 6 eth1
> fe80::260:ddff:fe45:81b9 UDP 123
> restrict: op 1 addr fe80::260:ddff:fe45:81b9 mask
> ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff mflags 00003000 flags 00000001
> 2 Apr 07:42:00 ntpd[19434]: peers refreshed
> 2 Apr 07:42:00 ntpd[19434]: Listening on routing socket on fd #23 for
> interface updates
> restrict: op 1 addr 0.0.0.0 mask 0.0.0.0 mflags 00000000 flags 000001d0
> restrict: op 1 addr :: mask :: mflags 00000000 flags 000001d0
> restrict: op 1 addr 127.0.0.1 mask 255.255.255.255 mflags 00000000 flags
> 00000000
> restrict: op 1 addr ::1 mask ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff mflags
> 00000000 flags 00000000
> key_expire: at 0 associd 43016
> peer_clear: at 0 next 1 associd 43016 refid INIT
> event at 0 64.57.17.34 8011 81 mobilize assoc 43016
> newpeer: 192.12.15.23->64.57.17.34 mode 3 vers 4 poll 6 10 flags 0x101 0x1
> ttl 0 key 00000000
> key_expire: at 0 associd 43017
> peer_clear: at 0 next 2 associd 43017 refid INIT
> event at 0 64.57.16.162 8011 81 mobilize assoc 43017
> newpeer: 192.12.15.23->64.57.16.162 mode 3 vers 4 poll 6 10 flags 0x101 0x1
> ttl 0 key 00000000
> key_expire: at 0 associd 43018
> peer_clear: at 0 next 3 associd 43018 refid INIT
> event at 0 64.57.17.162 8011 81 mobilize assoc 43018
> newpeer: 192.12.15.23->64.57.17.162 mode 3 vers 4 poll 6 10 flags 0x101 0x1
> ttl 0 key 00000000
> key_expire: at 0 associd 43019
> peer_clear: at 0 next 4 associd 43019 refid INIT
> event at 0 64.57.17.98 8011 81 mobilize assoc 43019
> newpeer: 192.12.15.23->64.57.17.98 mode 3 vers 4 poll 6 10 flags 0x101 0x1
> ttl 0 key 00000000
> key_expire: at 0 associd 43020
> peer_clear: at 0 next 5 associd 43020 refid INIT
> event at 0 198.124.252.90 8011 81 mobilize assoc 43020
> newpeer: 192.12.15.23->198.124.252.90 mode 3 vers 4 poll 6 10 flags 0x101
> 0x1 ttl 0 key 00000000
> key_expire: at 0 associd 43021
> peer_clear: at 0 next 6 associd 43021 refid INIT
> event at 0 198.129.252.38 8011 81 mobilize assoc 43021
> newpeer: 192.12.15.23->198.129.252.38 mode 3 vers 4 poll 6 10 flags 0x101
> 0x1 ttl 0 key 00000000
> 2 Apr 07:42:00 ntpd[19434]: 0.0.0.0 c016 06 restart
> event at 0 0.0.0.0 c016 06 restart
> 2 Apr 07:42:00 ntpd[19434]: 0.0.0.0 c012 02 freq_set kernel 37.049 PPM
> event at 0 0.0.0.0 c012 02 freq_set kernel 37.049 PPM
> transmit: at 1 192.12.15.23->64.57.17.34 mode 3 len 48
> auth_agekeys: at 1 keys 1 expired 0
> receive: at 1 192.12.15.23<-64.57.17.34 mode 4 len 48
> event at 1 64.57.17.34 8024 84 reachable
> filegen 2 3605438521 0 3605385600
> clock_filter: n 1 off 0.000017 del 0.028897 dsp 7.937501 jit 0.000000
> transmit: at 2 192.12.15.23->64.57.16.162 mode 3 len 48
> receive: at 2 192.12.15.23<-64.57.16.162 mode 4 len 48
> event at 2 64.57.16.162 8024 84 reachable
> clock_filter: n 1 off -0.002630 del 0.049552 dsp 7.937501 jit 0.000000
> transmit: at 3 192.12.15.23->64.57.17.162 mode 3 len 48
> transmit: at 3 192.12.15.23->64.57.17.34 mode 3 len 48
> receive: at 3 192.12.15.23<-64.57.17.34 mode 4 len 48
> clock_filter: n 2 off 0.000016 del 0.028883 dsp 3.937509 jit 0.000001
> receive: at 3 192.12.15.23<-64.57.17.162 mode 4 len 48
> event at 3 64.57.17.162 8024 84 reachable
> clock_filter: n 1 off -0.001142 del 0.079054 dsp 7.937501 jit 0.000000
> transmit: at 4 192.12.15.23->64.57.17.98 mode 3 len 48
> transmit: at 4 192.12.15.23->64.57.16.162 mode 3 len 48
> receive: at 4 192.12.15.23<-64.57.17.98 mode 4 len 48
> event at 4 64.57.17.98 8024 84 reachable
> clock_filter: n 1 off 0.000462 del 0.002234 dsp 7.937501 jit 0.000000
> receive: at 4 192.12.15.23<-64.57.16.162 mode 4 len 48
> clock_filter: n 2 off -0.002611 del 0.049590 dsp 3.937509 jit 0.000019
> transmit: at 5 192.12.15.23->64.57.17.162 mode 3 len 48
> transmit: at 5 192.12.15.23->64.57.17.34 mode 3 len 48
> transmit: at 5 192.12.15.23->198.124.252.90 mode 3 len 48
> receive: at 5 192.12.15.23<-198.124.252.90 mode 4 len 48
> event at 5 198.124.252.90 8024 84 reachable
> clock_filter: n 1 off -0.000275 del 0.000733 dsp 7.937500 jit 0.000000
> receive: at 5 192.12.15.23<-64.57.17.34 mode 4 len 48
> clock_filter: n 3 off -0.000007 del 0.028945 dsp 1.937516 jit 0.000023
> receive: at 5 192.12.15.23<-64.57.17.162 mode 4 len 48
> clock_filter: n 2 off -0.001134 del 0.079163 dsp 3.937509 jit 0.000008
> transmit: at 6 192.12.15.23->198.129.252.38 mode 3 len 48
> transmit: at 6 192.12.15.23->64.57.17.98 mode 3 len 48
> transmit: at 6 192.12.15.23->64.57.16.162 mode 3 len 48
> receive: at 6 192.12.15.23<-64.57.17.98 mode 4 len 48
> clock_filter: n 2 off 0.000393 del 0.002254 dsp 3.937508 jit 0.000069
> receive: at 6 192.12.15.23<-64.57.16.162 mode 4 len 48
> clock_filter: n 3 off -0.002675 del 0.049812 dsp 1.937517 jit 0.000055
> receive: at 6 192.12.15.23<-198.129.252.38 mode 4 len 48
> event at 6 198.129.252.38 8024 84 reachable
> clock_filter: n 1 off -0.000388 del 0.075274 dsp 7.937501 jit 0.000000
> transmit: at 7 192.12.15.23->64.57.17.162 mode 3 len 48
> transmit: at 7 192.12.15.23->64.57.17.34 mode 3 len 48
> transmit: at 7 192.12.15.23->198.124.252.90 mode 3 len 48
> receive: at 7 192.12.15.23<-198.124.252.90 mode 4 len 48
> clock_filter: n 2 off -0.000291 del 0.000575 dsp 3.937508 jit 0.000016
> receive: at 7 192.12.15.23<-64.57.17.34 mode 4 len 48
> clock_filter: n 4 off -0.000017 del 0.028789 dsp 0.937522 jit 0.000028
> select: combine offset -0.000017273 jitter 0.000000000
> event at 7 64.57.17.34 903a 8a sys_peer
> clock_update: at 7 sample 7 associd 43016
> 2 Apr 07:42:07 ntpd[19434]: ntpd: time slew -0.000017 s
> ntpd: time slew -0.000017s
> filegen 2 3605438527 0 3605385600


4) If you see that the clock has skipped 10 seconds, restart the daemon (sudo
/etc/init.d/ntpd start)
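
Once the daemon is back up, ntpq -p should show a nonzero reach for the new
servers and a "*" in front of the selected peer. A rough Python sketch for
checking that follows. It assumes the raw ntpq -p layout, where the first
character of each peer line is the tally code (a space when the peer is not
selected) and the remaining fields are whitespace-separated. The function
names are my own, and the sample lines are adapted from the output quoted
further down.

```python
def decode_reach(reach_octal):
    """Decode ntpq's octal reach field into 8 booleans, newest poll first."""
    value = int(reach_octal, 8)
    return [bool((value >> bit) & 1) for bit in range(8)]


def parse_peer_line(line):
    """Split one ntpq -p peer line into the fields of interest."""
    # Columns after the tally character:
    # remote refid st t when poll reach delay offset jitter
    tally, fields = line[0], line[1:].split()
    return {
        "tally": tally,
        "remote": fields[0],
        "stratum": int(fields[2]),
        "reach": decode_reach(fields[6]),
        "offset_ms": float(fields[8]),
    }


sample = [
    "*10.255.255.3    10.255.255.35    2 u   85  128  377    1.345   -3.608  27.818",
    " chronos.es.net  .INIT.          16 u    - 1024    0    0.000    0.000   0.000",
]
peers = [parse_peer_line(line) for line in sample]
for peer in peers:
    status = "sys peer" if peer["tally"] == "*" else "not selected"
    answered = sum(peer["reach"])
    print(f'{peer["remote"]}: {status}, {answered}/8 recent polls answered')
```

A reach of 352, as in the earlier output, decodes to polls 2, 4, 6, 7 and 8
answered (counting from the newest), which matches Skye's reading.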

Good luck;

-jason

On Apr 2, 2014, at 12:06 AM, Amit wrote:

> Hi Skye,
>
> I have already checked everything, and none of those is the case. I have 5
> more servers with the same configuration and settings.
> Please find the configuration file below.
>
> ----------------------------------------------------------------------------
> # /etc/ntp.conf, configuration for NTP
> # by default act only as a basic NTP client
> restrict -4 default nomodify nopeer noquery notrap
> restrict -6 default nomodify nopeer noquery notrap
> # allow NTP messages from the loopback address, useful for debugging
> restrict 127.0.0.1
> restrict ::1
> logfile /var/log/ntpd
> driftfile /var/lib/ntp/ntp.drift
> statsdir /var/lib/ntp/
> statistics loopstats peerstats clockstats
> filegen loopstats file loopstats type day enable
> filegen peerstats file peerstats type day enable
> filegen clockstats file clockstats type day enable
>
> # You should have at least 4 NTP servers
> server 127.127.1.0 # local clock
>
> server 10.255.255.3 iburst
>
> server chronos.es.net iburst # ESnet - New York, NY USA
>
> server owamp.chic.net.internet2.edu iburst # Internet2 - Chicago, IL USA
>
> server owamp.hous.net.internet2.edu iburst # Internet2 - Houston, TX USA
>
> server owamp.losa.net.internet2.edu iburst # Internet2 - Los Angeles, CA USA
>
> server owamp.newy.net.internet2.edu iburst # Internet2 - New York, NY USA
>
> server saturn.es.net iburst # ESnet - Sunnyvale, CA USA
> ----------------------------------------------------------------------------
>
> Secondly, I can traceroute to the above NTP servers. Please also find
> below the "ntpdate" output for one particular server.
>
> ----------------------------------------------------------------------------
> [root@perfdel ~]# ntpdate -ud saturn.es.net
> 2 Apr 09:28:04 ntpdate[18041]: ntpdate
>
> Sat Nov 23 18:21:48 UTC 2013 (1)
> Looking for host saturn.es.net and service ntp
> host found : saturn.es.net
> transmit(198.129.252.38)
> receive(198.129.252.38)
> transmit(198.129.252.38)
> receive(198.129.252.38)
> transmit(198.129.252.38)
> receive(198.129.252.38)
> transmit(198.129.252.38)
> receive(198.129.252.38)
> server 198.129.252.38, port 123
> stratum 1, precision -22, leap 00, trust 000
> refid [CDMA], delay 0.30948, dispersion 0.00000
> transmitted 4, in filter 4
> reference time: d6e60951.a0b860f1 Wed, Apr 2 2014 9:28:09.627
> originate timestamp: d6e60958.883e60d2 Wed, Apr 2 2014 9:28:16.532
> transmit timestamp: d6e6094e.32824d9a Wed, Apr 2 2014 9:28:06.197
> filter delay: 0.30952 0.30948 0.30952 0.30952
> 0.00000 0.00000 0.00000 0.00000
> filter offset: 10.19292 10.19292 10.19292 10.19291
> 0.000000 0.000000 0.000000 0.000000
> delay 0.30948, dispersion 0.00000
> offset 10.192929
>
> 2 Apr 09:28:06 ntpdate[18041]: step time server 198.129.252.38 offset 10.192929 sec
> ----------------------------------------------------------------------------
>
>
> --
> Thanks & Regards
>
> Amit Kumar
> Scientific Officer
> Operation and Routing Group
> M/O Communication and IT, NIC, A- Block, CGO Complex, New Delhi
> Ph. +911122900332, NKN VoIP:7332
> Mob. +919910611621
>
>
>
>
>
> -----Original Message-----
> From: On Behalf Of Hagen, Skye
> Sent: Tuesday, April 01, 2014 11:47 PM
> To: Amit; 'Bruce A. Mah'; 'John Mann'
> Cc: 'Aaron Brown'
> Subject: Re: [perfsonar-user] Now BWCTL issue....
>
> There are a number of things that this could be.
>
> iptables is not the only access control for NTP running on the system.
> ntpd itself has access controls (see the 'restrict' directive) in the
> configuration file.
>
> You should verify that you can traceroute to the other servers, do you have
> the correct routing?
>
> There could be firewalls, such as a border firewall, somewhere in the
> network path that is blocking you.
>
> The remote end could be blocking you. You should check the access policy for
> the sites that you are using.
>
> While it doesn't appear to be the case here, NTP can also be authenticated.
>
> You may want to check out the NTP Debugging Techniques page at
> http://www.eecis.udel.edu/~mills/ntp/html/debug.html
>
> Skye.
>
> On 3/31/14 11:45 PM, "Amit" wrote:
>
>> Hi,
>>
>> One of my servers is not able to sync with public NTP servers. Please
>> check the ntpq output below.
>>
>> [root@perfdel ~]# ntpq -p
>>      remote           refid      st t when poll reach   delay   offset  jitter
>> ==============================================================================
>> *10.255.255.3    10.255.255.35    2 u   85  128  377    1.345   -3.608  27.818
>>  chronos.es.net  .INIT.          16 u    - 1024    0    0.000    0.000   0.000
>>  nms-rlat.chic.n .INIT.          16 u    - 1024    0    0.000    0.000   0.000
>>  nms-rlat.hous.n .INIT.          16 u    - 1024    0    0.000    0.000   0.000
>>  nms-rlat.losa.n .INIT.          16 u    - 1024    0    0.000    0.000   0.000
>>  nms-rlat.newy32 .INIT.          16 u    - 1024    0    0.000    0.000   0.000
>>  saturn.es.net   .INIT.          16 u    - 1024    0    0.000    0.000   0.000
>>
>> ntpq> as
>>
>> ind assid status  conf reach auth condition  last_event cnt
>> ===========================================================
>>   1 31515  966a   yes   yes  none  sys.peer    sys_peer  6
>>   2 31516  8011   yes    no  none    reject    mobilize  1
>>   3 31517  8011   yes    no  none    reject    mobilize  1
>>   4 31518  8011   yes    no  none    reject    mobilize  1
>>   5 31519  8011   yes    no  none    reject    mobilize  1
>>   6 31520  8011   yes    no  none    reject    mobilize  1
>>   7 31521  8011   yes    no  none    reject    mobilize  1
>>
>> I tried with turning iptables off. What could be the issue?
>>
>> --
>> Thanks & Regards
>>
>> Amit Kumar
>> Scientific Officer
>> Operation and Routing Group
>> M/O Communication and IT, NIC, A- Block, CGO Complex, New Delhi Ph.
>> +911122900332, NKN VoIP:7332 Mob. +919910611621
>>
>>
>>
>>
>> -----Original Message-----
>> From: On Behalf Of Hagen, Skye
>> Sent: Tuesday, April 01, 2014 9:57 AM
>> To: Bruce A. Mah; John Mann
>> Cc: Amit Kumar; Aaron Brown
>> Subject: RE: [perfsonar-user] Now BWCTL issue....
>>
>> With NTP, one of the better setups is to call ntpdate during startup of
>> the system. This will set the clock. Then, run ntpd to keep the clock
>> in sync.
>> If the clock is widely out of sync, ntpd will not correct it.
>>
>> I use 5 servers, this will protect against one false chimer, and allow
>> for one to be off-line at the same time. Two is worse than one, unless
>> you setup ntp to prefer one server.
>>
>> The interesting thing on his first server is the value of 'reach'. This
>> is a bit map of the last 8 contact attempts, displayed in octal. So,
>> 352 means that, working from the oldest attempt to the newest, attempts
>> 8, 7, 6, 4 and 2 got a response. Attempts 5, 3 and the last attempt did
>> not get a response. (That is, assuming I am interpreting my octal
>> correctly. Remember, there are three kinds of people in the world. Those
>> that are good at math, and those that are not. :-) ) This would seem to
>> indicate a congested link, or discards on the path.
>>
>> Skye Hagen
>> Network Engineer
>> University of Idaho
>>
>>
>> ________________________________________
>> From: on behalf of Bruce A. Mah
>> Sent: Monday, March 31, 2014 5:29 PM
>> To: John Mann
>> Cc: Amit Kumar; Aaron Brown
>> Subject: Re: [perfsonar-user] Now BWCTL issue....
>>
>> If memory serves me right, John Mann wrote:
>>> Hi,
>>>
>>> [ CC: list trimmed ]
>>>
>>> If memory serves me ... ntp likes to sync to a group of servers that
>>> are giving about the same time.
>>> If it can only see 1 source, it can't decide whether that is a
>>> truetimer or an outlying falseticker.
>>
>> Well...if there's only one source, and it's valid, ntpd has to use that
>> one. (One of the hazards of having only one or two time servers.) I would
>> expect that perfSONAR host to (eventually) sync with that first server.
>>
>> Also it's not clear why he couldn't sync with the public timeservers.
>> Firewall rules / network ACLs maybe?
>>
>>> https://tools.ietf.org/html/rfc5905#section-11.1
>>> NMIN, CMIN ...
>>>
>>> Suggestions:
>>> - Wait. Sometimes ntp comes good after 20 mins / several hours.
>>
>> Yes, depending on how the local ntpd is configured.
>>
>>> - Add another ntp "server" (that has sync'd time) to the setup
>>> - e.g. use a router
>>
>> I'm trying to resist the temptation to dive into NTP configuration
>> trivia, but having two servers isn't a whole lot better than one,
>> because if one of them misbehaves, the client can't tell which one to
>> trust. My usual practice for generic (i.e. non-perfSONAR) hosts, which
>> mirrors what I understand to be best practice, is to pick either 3 or 5
>> servers, with the usual considerations for diversity.
>>
>>> - "peer" the ntp clients together so that they can have confidence in
>>> each other and the primary source
>>
>> Hrm, a bunch of clients that all peer with each other and get time from
>> a single server isn't really any better than just going to the single
>> server. If that server loses sync or goes down, the clients are all going
>> to lose sync too, eventually, unless some of them are configured to use
>> their local clocks as high-stratum NTP servers (I am not recommending
>> that step).
>>
>> I'm pretty sure the original poster didn't want to set up a local NTP
>> infrastructure, he just wants to use what's available.
>>
>>> It is a bit of a black art.
>>
>> Oh it's not *that* bad. I haven't had to sacrifice any goats for
>> several years now. :-)
>>
>> Bruce.
>>
>>> You might end up with a ntp cloud that regains sync if you reboot one
>>> node, but if you reboot everything all at once it won't re-sync.
>>>
>>> Thanks,
>>> John
>>>
>>>
>>> On 1 April 2014 07:49, Bruce A. Mah wrote:
>>>
>>> If memory serves me right, Amit Kumar wrote:
>>>> Yes Aaron
>>>
>>> If I'm reading the ntpq -p output correctly...
>>>
>>>>>>      remote           refid      st t when poll reach   delay   offset  jitter
>>>>>> ==============================================================================
>>>>>>  10.255.255.3    10.255.255.35    2 u  519 1024  352    1.396   -4.134  12.624
>>>>>>  chronos.es.net  .INIT.          16 u    - 1024    0    0.000    0.000   0.000
>>>>>>  nms-rlat.chic.n .INIT.          16 u    - 1024    0    0.000    0.000   0.000
>>>>>>  nms-rlat.hous.n .INIT.          16 u    - 1024    0    0.000    0.000   0.000
>>>>>>  nms-rlat.losa.n .INIT.          16 u    - 1024    0    0.000    0.000   0.000
>>>>>>  nms-rlat.newy32 .INIT.          16 u    - 1024    0    0.000    0.000   0.000
>>>>>>  saturn.es.net   .INIT.          16 u    - 1024    0    0.000    0.000   0.000
>>>
>>> ...it looks like the host in question isn't synched against 10.255.255.3
>>> (or anything else for that matter) because there's no "*" in front of
>>> that line...that indicates a host that is the current time source.
>>>
>>> Bruce.



