perfsonar-user - RE: [perfsonar-user] Now BWCTL issue....

Subject: perfSONAR User Q&A and Other Discussion

  • From: Amit
  • To: "Hagen, Skye", "Bruce A. Mah", John Mann
  • Cc: Aaron Brown
  • Subject: RE: [perfsonar-user] Now BWCTL issue....
  • Date: Tue, 01 Apr 2014 12:15:21 +0530

Hi,

One of my servers is unable to sync with the public NTP servers. Please see
the ntpq output below.

[root@perfdel ~]# ntpq -p
     remote           refid      st t when poll reach    delay   offset  jitter
===============================================================================
*10.255.255.3    10.255.255.35    2 u   85  128   377    1.345   -3.608  27.818
 chronos.es.net  .INIT.          16 u    - 1024     0    0.000    0.000   0.000
 nms-rlat.chic.n .INIT.          16 u    - 1024     0    0.000    0.000   0.000
 nms-rlat.hous.n .INIT.          16 u    - 1024     0    0.000    0.000   0.000
 nms-rlat.losa.n .INIT.          16 u    - 1024     0    0.000    0.000   0.000
 nms-rlat.newy32 .INIT.          16 u    - 1024     0    0.000    0.000   0.000
 saturn.es.net   .INIT.          16 u    - 1024     0    0.000    0.000   0.000

ntpq> as

ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 31515  966a   yes   yes  none  sys.peer    sys_peer  6
  2 31516  8011   yes    no  none    reject    mobilize  1
  3 31517  8011   yes    no  none    reject    mobilize  1
  4 31518  8011   yes    no  none    reject    mobilize  1
  5 31519  8011   yes    no  none    reject    mobilize  1
  6 31520  8011   yes    no  none    reject    mobilize  1
  7 31521  8011   yes    no  none    reject    mobilize  1

I also tried turning iptables off, with the same result. What could be the issue?

--
Thanks & Regards

Amit Kumar
Scientific Officer
Operation and Routing Group
M/O Communication and IT, NIC, A- Block, CGO Complex, New Delhi
Ph. +911122900332, NKN VoIP:7332
Mob. +919910611621




-----Original Message-----
From: [address removed] On Behalf Of Hagen, Skye
Sent: Tuesday, April 01, 2014 9:57 AM
To: Bruce A. Mah; John Mann
Cc: Amit Kumar; Aaron Brown
Subject: RE: [perfsonar-user] Now BWCTL issue....

With NTP, one of the better setups is to call ntpdate during system startup.
This will set the clock. Then, run ntpd to keep the clock in sync.
If the clock is wildly out of sync, ntpd will not correct it.

I use 5 servers; this protects against one false chimer and allows for
one to be off-line at the same time. Two is worse than one, unless you set up
ntp to prefer one server.

The interesting thing on his first server is the value of 'reach'. This is a
bit map of the last 8 contact attempts, displayed in octal. So, 352 means
that, working from the oldest attempt to the newest, attempts 8, 7, 6, 4 and
2 got a response. Attempts 5, 3 and the last attempt did not get a response.
(That is, assuming I am interpreting my octal correctly. Remember, there are
three kinds of people in the world. Those that are good at math, and those
that are not. :-) ) This would seem to indicate a congested link, or
discards on the path.
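
For anyone who wants to check the octal reading above, here is a minimal
Python sketch. It assumes only what is described above: 'reach' is an 8-bit
history printed in octal, with the newest attempt in the lowest bit. The
function name decode_reach is made up for this example and is not part of
ntpq.

```python
def decode_reach(reach_octal: str) -> list[bool]:
    """Return the last 8 poll results, oldest first (True = reply received)."""
    value = int(reach_octal, 8) & 0xFF  # keep only the 8-bit shift register
    # Bit 7 is the oldest attempt, bit 0 is the most recent one.
    return [bool((value >> bit) & 1) for bit in range(7, -1, -1)]


if __name__ == "__main__":
    for reach in ("352", "377", "0"):
        history = decode_reach(reach)
        # Y = reply received, - = no reply; leftmost is the oldest attempt.
        print(reach.rjust(3), "".join("Y" if ok else "-" for ok in history))
```

Read left to right, the 352 line shows the pattern from oldest to newest
attempt, 377 is a peer that answered all eight polls, and 0 is a peer that
has never answered.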

Skye Hagen
Network Engineer
University of Idaho


________________________________________
From: [address removed] on behalf of Bruce A. Mah
Sent: Monday, March 31, 2014 5:29 PM
To: John Mann
Cc: Amit Kumar; Aaron Brown
Subject: Re: [perfsonar-user] Now BWCTL issue....

If memory serves me right, John Mann wrote:
> Hi,
>
> [ CC: list trimmed ]
>
> If memory serves me ... ntp likes to sync to a group of servers that
> are giving about the same time.
> If it can only see 1 source, it can't decide whether that is a
> truetimer or an outlying falseticker.

Well...if there's only one source, and it's valid, ntpd has to use that one.
(One of the hazards of having only one or two time servers.) I would expect
that perfSONAR host to (eventually) sync with that first server.

Also it's not clear why he couldn't sync with the public timeservers.
Firewall rules / network ACLs maybe?

> https://tools.ietf.org/html/rfc5905#section-11.1
> NMIN, CMIN ...
>
> Suggestions:
> - Wait. Sometimes ntp comes good after 20 mins / several hours.

Yes, depending on how the local ntpd is configured.

> - Add another ntp "server" (that has sync'd time) to the setup
> - e.g. use a router

I'm trying to resist the temptation to dive into NTP configuration trivia,
but having two servers isn't a whole lot better than one, because if one of
them misbehaves, the client can't tell which one to trust. My usual
practice for generic (i.e. non-perfSONAR) hosts, which mirrors what I
understand to be best practice, is to pick either 3 or 5 servers, with the
usual considerations for diversity.

> - "peer" the ntp clients together so that they can have confidence in
> each other and the primary source

Hrm, a bunch of clients that all peer with each other and get time from a
single server isn't really any better than just going to the single server.
If that server loses sync or goes down, the clients are all going to lose
sync too, eventually, unless some of them are configured to use their local
clocks as high-stratum NTP servers (I am not recommending that step).

I'm pretty sure the original poster didn't want to set up a local NTP
infrastructure; he just wants to use what's available.

> It is a bit of a black art.

Oh it's not *that* bad. I haven't had to sacrifice any goats for several
years now. :-)

Bruce.

> You might end up with a ntp cloud that regains sync if you reboot one
> node, but if you reboot everything all at once it won't re-sync.
>
> Thanks,
> John
>
>
> On 1 April 2014 07:49, Bruce A. Mah wrote:
>
> If memory serves me right, Amit Kumar wrote:
> > Yes Aaron
>
> If I'm reading the ntpq -p output correctly...
>
> >>>      remote           refid      st t when poll reach    delay   offset  jitter
> >>> ===============================================================================
> >>>  10.255.255.3    10.255.255.35    2 u  519 1024   352    1.396   -4.134  12.624
> >>>  chronos.es.net  .INIT.          16 u    - 1024     0    0.000    0.000   0.000
> >>>  nms-rlat.chic.n .INIT.          16 u    - 1024     0    0.000    0.000   0.000
> >>>  nms-rlat.hous.n .INIT.          16 u    - 1024     0    0.000    0.000   0.000
> >>>  nms-rlat.losa.n .INIT.          16 u    - 1024     0    0.000    0.000   0.000
> >>>  nms-rlat.newy32 .INIT.          16 u    - 1024     0    0.000    0.000   0.000
> >>>  saturn.es.net   .INIT.          16 u    - 1024     0    0.000    0.000   0.000
>
> ...it looks like the host in question isn't synched against 10.255.255.3
> (or anything else for that matter) because there's no "*" in front of
> that line...that indicates a host that is the current time source.
>
> Bruce.
>
>
>




