perfsonar-user - RE: [perf-node-users] [perfsonar-user] LSRegistrationDaemon duplicate checksum lookup/host/ error
Subject: perfSONAR User Q&A and Other Discussion
- From: "Lixin Liu" <>
- To: <>, "'Performance Node Users'" <>
- Subject: RE: [perf-node-users] [perfsonar-user] LSRegistrationDaemon duplicate checksum lookup/host/ error
- Date: Mon, 30 Sep 2013 14:00:52 -0700 (PDT)
Hi Jason,
Looking at ls_registration_daemon.log on a number of our hosts,
it appears we are having difficulty connecting to your directory
servers. All of these hosts report "400 Bad Request" errors
every hour.
Thanks,
Lixin.
2013/09/30 14:06:00 (15279) INFO> Interface.pm:127
perfSONAR_PS::LSRegistrationDaemon::Interface::build_checksum - Checksum is
ca4JrKF9a0bW+9fMVcgcSQ
2013/09/30 14:06:00 (15279) INFO> Base.pm:178
perfSONAR_PS::LSRegistrationDaemon::Base::refresh - Record 'p2p1' is up,
registering
2013/09/30 14:06:00 (15279) ERROR> Base.pm:226
perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering
service. Will retry full registration next time: 403 Forbidden
2013/09/30 14:06:00 (15279) WARN> daemon.pl:251 main::__ANON__ - Warned: Use
of uninitialized value in sort at
/opt/perfsonar_ps/ls_registration_daemon/bin/../lib/perfSONAR_PS/LSRegistrationDaemon/Host.pm
line 334.
2013/09/30 14:06:00 (15279) WARN> daemon.pl:251 main::__ANON__ - Warned: Use
of uninitialized value in join or string at
/opt/perfsonar_ps/ls_registration_daemon/bin/../lib/perfSONAR_PS/LSRegistrationDaemon/Host.pm
line 334.
2013/09/30 14:06:00 (15279) INFO> Host.pm:309
perfSONAR_PS::LSRegistrationDaemon::Host::build_checksum - Checksum is
0TtxBBUueOQpqJhpT/fdBA
2013/09/30 14:06:00 (15279) INFO> Base.pm:178
perfSONAR_PS::LSRegistrationDaemon::Base::refresh - Record
'bdw-ucalgary.westgrid.ca' is up, registering
2013/09/30 14:06:00 (15279) ERROR> Base.pm:226
perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering
service. Will retry full registration next time: 400 Bad Request
2013/09/30 14:06:00 (15279) INFO> Service.pm:223
perfSONAR_PS::LSRegistrationDaemon::Service::build_checksum - Checksum is
d7gIByEaY/6Ic3BkjzOyJg
2013/09/30 14:06:00 (15279) INFO> Base.pm:178
perfSONAR_PS::LSRegistrationDaemon::Base::refresh - Record 'University of
Calgary Ping Responder' is up, registering
2013/09/30 14:06:01 (15279) ERROR> Base.pm:226
perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering
service. Will retry full registration next time: 400 Bad Request
2013/09/30 14:06:01 (15279) INFO> Service.pm:223
perfSONAR_PS::LSRegistrationDaemon::Service::build_checksum - Checksum is
vu6Wo2l/GUk7CbRvFFovow
2013/09/30 14:06:01 (15279) INFO> Base.pm:178
perfSONAR_PS::LSRegistrationDaemon::Base::refresh - Record 'University of
Calgary Traceroute Responder' is up, registering
2013/09/30 14:06:01 (15279) ERROR> Base.pm:226
perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering
service. Will retry full registration next time: 400 Bad Request
2013/09/30 14:06:01 (15279) INFO> Base.pm:194
perfSONAR_PS::LSRegistrationDaemon::Base::refresh - Record 'University of
Calgary OWAMP Server' is down
2013/09/30 14:06:01 (15279) INFO> Service.pm:223
perfSONAR_PS::LSRegistrationDaemon::Service::build_checksum - Checksum is
TBtP9qqBGm8nKPruWIySXQ
2013/09/30 14:06:01 (15279) INFO> Base.pm:178
perfSONAR_PS::LSRegistrationDaemon::Base::refresh - Record 'University of
Calgary BWCTL Server' is up, registering
2013/09/30 14:06:01 (15279) ERROR> Base.pm:226
perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering
service. Will retry full registration next time: 400 Bad Request
2013/09/30 14:06:01 (15279) INFO> Service.pm:223
perfSONAR_PS::LSRegistrationDaemon::Service::build_checksum - Checksum is
JZMxKg3PkoEbBRTKtu8uog
2013/09/30 14:06:01 (15279) INFO> Base.pm:178
perfSONAR_PS::LSRegistrationDaemon::Base::refresh - Record 'University of
Calgary NDT Server' is up, registering
2013/09/30 14:06:01 (15279) ERROR> Base.pm:226
perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering
service. Will retry full registration next time: 400 Bad Request
2013/09/30 14:06:01 (15279) INFO> Service.pm:223
perfSONAR_PS::LSRegistrationDaemon::Service::build_checksum - Checksum is
XxvLx+/jXY4OUM7up8PJLA
2013/09/30 14:06:01 (15279) INFO> Base.pm:178
perfSONAR_PS::LSRegistrationDaemon::Base::refresh - Record 'University of
Calgary NPAD Server' is up, registering
2013/09/30 14:06:01 (15279) ERROR> Base.pm:226
perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering
service. Will retry full registration next time: 400 Bad Request
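
To see at a glance which HTTP statuses the daemon is getting back, the failures in a log like the one above can be tallied. This is only a minimal sketch: the function name summarize_registration_errors is hypothetical, the marker text is copied from the excerpt above, and the log path is passed in as an argument because its location on disk is not given in this thread.

```shell
#!/bin/sh
# Hypothetical helper: count registration failures by HTTP status.
# Usage: summarize_registration_errors /path/to/ls_registration_daemon.log
summarize_registration_errors() {
    awk -v marker="Will retry full registration next time: " '
        {
            # Find the marker; whatever follows it on the line is the status.
            i = index($0, marker)
            if (i > 0) count[substr($0, i + length(marker))]++
        }
        # Print one "count<TAB>status" line per distinct status.
        END { for (s in count) printf "%d\t%s\n", count[s], s }
    ' "$1" | sort -rn
}
```

Run on the excerpt above, it should report six "400 Bad Request" failures and one "403 Forbidden" (the 'p2p1' interface record), so not every rejection is a 400.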
> -----Original Message-----
> From: Jason Zurawski
> [mailto:]
> Sent: September-30-13 9:33 AM
> To: Lixin Liu
> Cc:
> ;
> Performance Node Users
> Subject: Re: [perf-node-users] [perfsonar-user] LSRegistrationDaemon
> duplicate checksum lookup/host/ error
>
> Lixin;
>
> There are several directory servers that manage registrations, not just
> one. I am able to see some of the hosts you mention below. Note that a
> host sometimes registers with an IP address if it had no DNS name to
> start with:
>
> > bdw-usask.westgrid.ca
> http://ndb1.internet2.edu:8090/lookup/records?host-name=206.12.26.19
>
> > bdw-sfu.westgrid.ca
> http://antg.es.net:8090/lookup/records?host-name=bdw-sfu.westgrid.ca
>
> > bdw-ucalgary.westgrid.ca
> *cannot find*
>
> > bdw-ubc.westgrid.ca
> http://antg.es.net:8090/lookup/records?host-name=206.12.24.189
>
> > bdw-uvic.westgrid.ca
> *cannot find*
>
> > lat-sfu.westgrid.ca
> I see this:
> http://ps4.es.net:9095/lookup/records?host-name=ps-latency.sfu.westgrid.ca
> Did the host's name change recently?
>
> > lat-ucalgary.westgrid.ca
> *cannot find*
>
> > lat-ubc.westgrid.ca
> *cannot find*
>
> > lat-uvic.westgrid.ca
>
> http://antg.es.net:8090/lookup/records?host-name=lat-uvic.westgrid.ca
>
> You can try to remove the database on the hosts as you did before if the
> hosts had a recent DNS or IP address change. You can also check to be
> sure there are no firewalls blocking registration.
>
> Thanks;
>
> -jason
>
> On Sep 30, 2013, at 12:07 PM, Lixin Liu
> <>
> wrote:
>
> > Hi Jason,
> >
> > Thank you very much. Indeed these two are on the list now.
> >
> > However, a number of hosts that were previously in the list
> > disappeared from it over the weekend. They are:
> >
> > bdw-usask.westgrid.ca
> > bdw-sfu.westgrid.ca
> > bdw-ucalgary.westgrid.ca
> > bdw-ubc.westgrid.ca
> > bdw-uvic.westgrid.ca
> > lat-sfu.westgrid.ca
> > lat-ucalgary.westgrid.ca
> > lat-ubc.westgrid.ca
> > lat-uvic.westgrid.ca
> >
> > Your URL:
> >
> > http://ndb1.internet2.edu:8090/lookup/records?host-name=<hostname>
> >
> > displays only [].
> >
> > Should I try to restart ls_registration_daemon on them? Do I need
> > to remove the database?
> >
> > Thanks again.
> >
> > Lixin.
> >
> >
> >> -----Original Message-----
> >> From: Jason Zurawski
> >> [mailto:]
> >> Sent: September-30-13 6:35 AM
> >> To: Lixin Liu
> >> Cc:
> >> ;
> >> Performance Node Users
> >> Subject: Re: [perf-node-users] [perfsonar-user] LSRegistrationDaemon
> >> duplicate checksum lookup/host/ error
> >>
> >> Hi Lixin;
> >>
> >> We were able to track down an operational issue on our end with the
> >> caching service. As I previously noted, your two servers are showing up,
> >> so this is not a problem on your end:
> >>
> >> http://ndb1.internet2.edu:8090/lookup/records?host-name=bdw-umanitoba.westgrid.ca
> >> http://ndb1.internet2.edu:8090/lookup/records?host-name=lat-umanitoba.westgrid.ca
> >>
> >> They were not being placed into the cache, though, because the Internet2
> >> LS had a typo in the URL. If you do the following:
> >>
> >> sudo /etc/init.d/ls_cache_daemon restart
> >>
> >> It will re-download the cache file and your hosts should appear in the
> >> Global Listing.
> >>
> >> Thanks;
> >>
> >> -jason
> >>
> >> On Sep 29, 2013, at 12:07 PM, Lixin Liu
> >> <>
> >> wrote:
> >>
> >>> Hi Jason,
> >>>
> >>> I still do not see lat-umanitoba.westgrid.ca and
> >>> bdw-umanitoba.westgrid.ca in the Global Performance Services, and I
> >>> can't see them in the ComputeCanada community. I even did what you
> >>> suggested (stop registration, remove db and start registration) on
> >>> bdw-umanitoba yesterday and am still unable to see them.
> >>>
> >>> Is there anything else I can check?
> >>>
> >>> Thanks,
> >>>
> >>> Lixin.
> >>>
> >>> On 2013-09-27 10:58 AM, "Jason Zurawski"
> >>> <>
> >>> wrote:
> >>>
> >>>> Hi Lixin;
> >>>>
> >>>> Thanks for the clarification, I understand your specific question
> >>>> now.
> >>>> There are a couple of delays, let me try to explain them:
> >>>>
> >>>> - Delay between when the LS daemon runs on the local server and talks
> >>>>   to the remote server
> >>>> - Delay between when the remote server talks to a caching service (an
> >>>>   optimization we added, used to generate the GUI information for
> >>>>   'Global Performance Services') and creates a new file for everyone
> >>>>   to download
> >>>> - Delay between when the local server downloads the latest copy of the
> >>>>   cached content to populate the GUI
> >>>>
> >>>> In general, it takes a few hours for all of this to happen: as few
> >>>> as 2-3 if the timing is right, as many as 6-8 if it is wrong. Since we
> >>>> know that your host is in the directory via the REST query shown
> >>>> below, there is nothing to worry about. It will just take some time to
> >>>> get into the GUI displays, and there isn't much that can be done to
> >>>> speed that up.
> >>>>
> >>>> The SELinux issue shouldn't be the root cause, but I believe we set
> >>>> this to either permissive or disabled.
> >>>>
> >>>> Thanks;
> >>>>
> >>>> -jason
> >>>>
> >>>> On Sep 27, 2013, at 1:40 PM, "Lixin Liu"
> >>>> <>
> >>>> wrote:
> >>>>
> >>>>> Hi Jason,
> >>>>>
> >>>>> Sorry, I should have mentioned that the latency host is down right now.
> >>>>> Someone will take a look at the machine itself. I will let you
> >>>>> know when it comes up.
> >>>>>
> >>>>> So it looks like the bandwidth host is registered, but how long do I
> >>>>> need to wait to see its services listed in "Global Performance
> >>>>> Services"?
> >>>>>
> >>>>> I noticed these two sites have one thing in common: SELinux is
> >>>>> enabled. I disabled it, but I am not sure if that is the root cause of
> >>>>> our service registration issue.
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Lixin.
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Jason Zurawski
> >>>>>> [mailto:]
> >>>>>> Sent: September-27-13 10:26 AM
> >>>>>> To: Lixin Liu
> >>>>>> Cc:
> >>>>>> ;
> >>>>>> Performance Node Users
> >>>>>> Subject: Re: [perf-node-users] Re: [perfsonar-user]
> >>>>>> LSRegistrationDaemon
> >>>>>> duplicate checksum lookup/host/ error
> >>>>>>
> >>>>>> Hi Lixin;
> >>>>>>
> >>>>>> Looking in one of the global servers, I do see your bdw host:
> >>>>>>
> >>>>>> http://ndb1.internet2.edu:8090/lookup/records?host-name=bdw-umanitoba.westgrid.ca
> >>>>>>
> >>>>>> Unfortunately you are correct in that the latency host has not
> >>>>>> registered; I don't see that anywhere. Can you send the latest log
> >>>>>> messages for lat-umanitoba.westgrid.ca again? Everything after the
> >>>>>> point where you did the removal of the db file and the restart.
> >>>>>>
> >>>>>> Thanks;
> >>>>>>
> >>>>>> -jason
> >>>>>>
> >>>>>> On Sep 27, 2013, at 1:02 PM, "Lixin Liu"
> >>>>>> <>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi Jason,
> >>>>>>>
> >>>>>>> Thanks for your information.
> >>>>>>>
> >>>>>>> I rebooted two hosts (at the University of Saskatchewan) and they are now
> >>>>>>> showing up in the list of Global Services.
> >>>>>>>
> >>>>>>> I still have an issue with two other hosts (at the University of
> >>>>>>> Manitoba). I followed your suggestion. It has been more than two
> >>>>>>> hours, but I still do not see this host in the list. However, there
> >>>>>>> was a network (BGP) problem earlier today that may have affected the
> >>>>>>> registration.
> >>>>>>>
> >>>>>>> Here is the log file (hostname bdw-umanitoba.westgrid.ca).
> >>>>>>>
> >>>>>>> The NTP server on the host you mentioned was changed by the local
> >>>>>>> admin to use local NTP servers, but there were only two servers in
> >>>>>>> the config. I added two more and it should be fine now.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>> Lixin.
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Jason Zurawski
> >>>>>>>> [mailto:]
> >>>>>>>> Sent: September-27-13 6:19 AM
> >>>>>>>> To: Lixin Liu
> >>>>>>>> Cc:
> >>>>>>>> ;
> >>>>>>>> Performance Node Users
> >>>>>>>> Subject: Re: [perf-node-users] Re: [perfsonar-user]
> >>>>>>>> LSRegistrationDaemon
> >>>>>>>> duplicate checksum lookup/host/ error
> >>>>>>>>
> >>>>>>>> And naturally I meant "rm -f
> >>>>>>>> /var/lib/perfsonar/ls_registration_daemon/lsKey.db" for the cache
> >>>>>>>> file ...
> >>>>>>>>
> >>>>>>>> Thanks;
> >>>>>>>>
> >>>>>>>> -jason
> >>>>>>>>
> >>>>>>>> On Sep 27, 2013, at 9:17 AM, Jason Zurawski
> >>>>>>>> <>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Lixin;
> >>>>>>>>>
> >>>>>>>>> In looking at the logs, we don't see anything out of the ordinary.
> >>>>>>>>> The 'duplicate' message you see is actually just unfortunate
> >>>>>>>>> wording on our part: it means 'renew' the registration instead of
> >>>>>>>>> making a new one. Since you noted you changed DNS/IP, it may be an
> >>>>>>>>> issue of a stale cache. Try the following steps:
> >>>>>>>>>
> >>>>>>>>>> sudo /etc/init.d/ls_registration_daemon stop
> >>>>>>>>>> /var/lib/perfsonar/ls_registration_daemon/lsKey.db
> >>>>>>>>>> sudo /etc/init.d/ls_registration_daemon start
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> This will stop the service, delete the local db, and then start it
> >>>>>>>>> all over. After a couple of hours the information should show up,
> >>>>>>>>> or at a minimum the logs will let us know if there is something
> >>>>>>>>> else locally bad that is going on.
> >>>>>>>>>
> >>>>>>>>> As an aside, it appears that your NTP is not configured; some of
> >>>>>>>>> the tools may not work until that is fixed. You may want to run
> >>>>>>>>> the NTP configuration script again.
> >>>>>>>>>
> >>>>>>>>> Thanks;
> >>>>>>>>>
> >>>>>>>>> -jason
> >>>>>>>>>
> >>>>>>>>> On Sep 26, 2013, at 3:43 PM, "Lixin Liu"
> >>>>>>>>> <>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Sorry, I didn't realize that the link is password protected and
> >>>>>>>>>> that I was already logged in as admin.
> >>>>>>>>>>
> >>>>>>>>>> Here is the log.
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>>
> >>>>>>>>>> Lixin.
> >>>>>>>>>>
> >>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>> From:
> >>>>>>>>>>>
> >>>>>>>>>>> [
> >>>>>>>>>>> ]
> >>>>>>>>>>> On Behalf Of Lixin Liu
> >>>>>>>>>>> Sent: September-26-13 12:36 PM
> >>>>>>>>>>> To:
> >>>>>>>>>>>
> >>>>>>>>>>> Subject: [perfsonar-user] LSRegistrationDaemon duplicate
> >>>>>>>>>>> checksum
> >>>>>>>>>>> lookup/host/ error
> >>>>>>>>>>>
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> On a number of our perfSONAR hosts, we get
> >>>>>>>>>>>
> >>>>>>>>>>> perfSONAR_PS::LSRegistrationDaemon::Base::find_duplicate - Found
> >>>>>>>>>>> duplicate checksum lookup/host
> >>>>>>>>>>>
> >>>>>>>>>>> error and hosts do not show up in "Global Service and Data
> >>>>>>>>>>> View".
> >>>>>>>>>>> These cases may be related to the IP and DNS changes. But I
> >>>>>>>>>>> think
> >>>>>>>>>>> hosts are configured correctly.
> >>>>>>>>>>>
> >>>>>>>>>>> How do I correct the problem?
> >>>>>>>>>>>
> >>>>>>>>>>> You can see the log from this link:
> >>>>>>>>>>>
> >>>>>>>>>>> https://lat-umanitoba.westgrid.ca/toolkit/admin/logs/ls_registration_daemon.log
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>>
> >>>>>>>>>>> Lixin.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> =======================
> >>>>>>>>>>> Lixin Liu
> >>>>>>>>>>> IT Services
> >>>>>>>>>>> Simon Fraser University
> >>>>>>>>>>
> >>>>>>>>>> <ls_registration_daemon.log>
> >>>>>>> <ls_registration_daemon.log>
> >>>
> >>> <ls_registration_daemon.log>
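
Earlier in the thread Jason checks each host by opening the lookup URL on several directory servers, where a response of [] means no record was found. A minimal sketch of that check follows, assuming curl is installed. The server list contains only the base URLs quoted in this thread (the real set of directory servers may be larger), and the names LS_SERVERS, lookup_url, has_records and check_host are hypothetical.

```shell
#!/bin/sh
# Base URLs of directory servers quoted in this thread (assumed incomplete).
LS_SERVERS="http://ndb1.internet2.edu:8090 http://antg.es.net:8090 http://ps4.es.net:9095"

# Build the record-lookup URL for one server and one host name or IP address.
lookup_url() {
    printf '%s/lookup/records?host-name=%s\n' "$1" "$2"
}

# Succeed only if a response body is neither empty nor the empty list "[]".
has_records() {
    body=$(printf '%s' "$1" | tr -d '[:space:]')
    [ -n "$body" ] && [ "$body" != "[]" ]
}

# Query every listed server for a host and report where a record exists.
check_host() {
    for server in $LS_SERVERS; do
        url=$(lookup_url "$server" "$1")
        # -s: silent, -f: treat HTTP errors as failure, --max-time: give up after 10 s
        if response=$(curl -sf --max-time 10 "$url") && has_records "$response"; then
            echo "found:     $url"
        else
            echo "not found: $url"
        fi
    done
}
```

For example, check_host bdw-ucalgary.westgrid.ca would print one line per server. Because a host sometimes registers under its IP address rather than its DNS name, it is worth running the check with both.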