perfsonar-user - Re: [perf-node-users] [perfsonar-user] LSRegistrationDaemon duplicate checksum lookup/host/ error

Subject: perfSONAR User Q&A and Other Discussion

  • From: Jason Zurawski <>
  • To: Lixin Liu <>
  • Cc: , Performance Node Users <>
  • Subject: Re: [perf-node-users] [perfsonar-user] LSRegistrationDaemon duplicate checksum lookup/host/ error
  • Date: Mon, 30 Sep 2013 09:34:38 -0400

Hi Lixin;

We were able to track down an operational issue on our end with the caching
service - as I previously noted your two servers are showing up, so this is
not a problem on your end:


http://ndb1.internet2.edu:8090/lookup/records?host-name=bdw-umanitoba.westgrid.ca

http://ndb1.internet2.edu:8090/lookup/records?host-name=lat-umanitoba.westgrid.ca
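
[Editor's sketch] The two lookups above can also be run from a shell. This is a minimal sketch, assuming curl is installed and that the base URL shown in the links is the right lookup server for your site; the helper names ls_query_url and ls_query are hypothetical and not part of perfSONAR.

```shell
#!/bin/sh
# Base URL copied from the links above; other lookup servers would differ.
LS_BASE="http://ndb1.internet2.edu:8090/lookup/records"

# Build the query URL for one host name (hypothetical helper).
ls_query_url() {
    printf '%s?host-name=%s\n' "$LS_BASE" "$1"
}

# Fetch the records for one host and print the raw response body.
# -f turns HTTP errors into a non-zero exit status; -sS hides the
# progress meter but still reports errors.
ls_query() {
    curl -fsS "$(ls_query_url "$1")"
}

# Example (requires network access):
#   for h in bdw-umanitoba.westgrid.ca lat-umanitoba.westgrid.ca; do
#       ls_query "$h"
#   done
```

A response that contains a record for the host suggests the registration itself succeeded, even if the host has not yet reached the cached Global Listing.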

They were not being placed into the cache, though, because the Internet2 LS
had a typo in the URL. If you do the following:

sudo /etc/init.d/ls_cache_daemon restart

It will re-download the cache file and your hosts should appear in the Global
Listing.

Thanks;

-jason

On Sep 29, 2013, at 12:07 PM, Lixin Liu
<>
wrote:

> Hi Jason,
>
> I still do not see lat-umanitoba.westgrid.ca and bdw-umanitoba.westgrid.ca in
> the Global Performance Services, and I can't see them in the ComputeCanada
> community. I even did what you suggested (stop registration, remove db and
> start registration) on bdw-umanitoba yesterday and am still unable to see it.
>
> Is there anything else I can check?
>
> Thanks,
>
> Lixin.
>
> On 2013-09-27 10:58 AM, "Jason Zurawski"
> <>
> wrote:
>
>> Hi Lixin;
>>
>> Thanks for the clarification, I understand your specific question now.
>> There are a couple of delays, let me try to explain them:
>>
>> - Delay between when the LS daemon runs on the local server, and talks
>> to the remote server
>> - Delay between when the remote server talks to a caching service (an
>> optimization we added - used to generate the GUI information for 'Global
>> Performance Services') and creates a new file for everyone to download
>> - Delay between when the local server downloads the latest copy of the
>> cached content to populate the GUI
>>
>> In general the answer is that it takes a couple of hours for this all to
>> happen, as little as 2-3 if the timing is right, as many as 6-8 if it is
>> wrong. Since we know that your host is in the directory via that REST
>> query that was shown below, there is nothing to worry about. It will
>> just take some time to get into the GUI displays, and there isn't much
>> that can be done to speed that up.
>>
>> The SELinux issue shouldn't be the root cause, but I believe we set it to
>> either permissive or disabled.
>>
>> Thanks;
>>
>> -jason
>>
>> On Sep 27, 2013, at 1:40 PM, "Lixin Liu"
>> <>
>> wrote:
>>
>>> Hi Jason,
>>>
>>> Sorry I should mention that the latency host is down right now.
>>> Someone will take a look at the machine itself. Will let you
>>> know when it comes up.
>>>
>>> So it looks like the bandwidth host is registered, but how long do I
>>> need to wait to see its services listed in "Global Performance
>>> Services"?
>>>
>>> I noticed these two sites have one thing in common: SELinux is
>>> enabled. I disabled it, but I am not sure if that is the root cause of
>>> our service registration issue.
>>>
>>> Thanks,
>>>
>>> Lixin.
>>>
>>>
>>>> -----Original Message-----
>>>> From: Jason Zurawski
>>>> [mailto:]
>>>> Sent: September-27-13 10:26 AM
>>>> To: Lixin Liu
>>>> Cc:
>>>> ;
>>>> Performance Node Users
>>>> Subject: Re: [perf-node-users] Re: [perfsonar-user]
>>>> LSRegistrationDaemon
>>>> duplicate checksum lookup/host/ error
>>>>
>>>> Hi Lixin;
>>>>
>>>> Looking in one of the global servers, I do see your bdw host:
>>>>
>>>> http://ndb1.internet2.edu:8090/lookup/records?host-name=bdw-umanitoba.westgrid.ca
>>>>
>>>> Unfortunately you are correct in that the latency host has not registered;
>>>> I don't see that anywhere. Can you send the latest log message for
>>>> lat-umanitoba.westgrid.ca again? Everything after the point where you did
>>>> the removal of the db file and the restart.
>>>>
>>>> Thanks;
>>>>
>>>> -jason
>>>>
>>>> On Sep 27, 2013, at 1:02 PM, "Lixin Liu"
>>>> <>
>>>> wrote:
>>>>
>>>>> Hi Jason,
>>>>>
>>>>> Thanks for your information.
>>>>>
>>>>> I rebooted two hosts (in University of Saskatoon) and they are now
>>>>> showing up in the list of Global Services.
>>>>>
>>>>> I still have an issue with two other hosts (in University of Manitoba).
>>>>> I followed your suggestion. It has been more than two hours, but I
>>>>> still do not see this host in the list. However, there was a network
>>>>> (BGP) problem earlier today that may have affected the registration.
>>>>>
>>>>> Here is the log file (hostname bdw-umanitoba.westgrid.ca).
>>>>>
>>>>> The NTP server on the host you mentioned was changed by the local admin
>>>>> to use local NTP servers, but there were only two servers in the config.
>>>>> I added two more and it should be fine now.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Lixin.
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Jason Zurawski
>>>>>> [mailto:]
>>>>>> Sent: September-27-13 6:19 AM
>>>>>> To: Lixin Liu
>>>>>> Cc:
>>>>>> ;
>>>>>> Performance Node Users
>>>>>> Subject: Re: [perf-node-users] Re: [perfsonar-user]
>>>>>> LSRegistrationDaemon
>>>>>> duplicate checksum lookup/host/ error
>>>>>>
>>>>>> And naturally I meant "rm -f
>>>>>> /var/lib/perfsonar/ls_registration_daemon/lsKey.db" for the cache
>>>>>> file ...
>>>>>>
>>>>>> Thanks;
>>>>>>
>>>>>> -jason
>>>>>>
>>>>>> On Sep 27, 2013, at 9:17 AM, Jason Zurawski
>>>>>> <>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Lixin;
>>>>>>>
>>>>>>> In looking at the logs, we don't see anything out of the ordinary; the
>>>>>>> 'duplicate' message you see is actually just unfortunate wording on our
>>>>>>> part - it means 'renew' the registration instead of making a new one.
>>>>>>> Since you noted you changed DNS/IP, it may be an issue of a stale cache.
>>>>>>> Try the following steps:
>>>>>>>
>>>>>>>> sudo /etc/init.d/ls_registration_daemon stop
>>>>>>>> /var/lib/perfsonar/ls_registration_daemon/lsKey.db
>>>>>>>> sudo /etc/init.d/ls_registration_daemon start
>>>>>>>
>>>>>>>
>>>>>>> This will stop the service, delete the local db, and then start it all
>>>>>>> over. After a couple of hours the information should show up, or at a
>>>>>>> minimum the logs will let us know if there is something else locally bad
>>>>>>> that is going on.
>>>>>>>
>>>>>>> As an aside, it appears that your NTP is not configured; some of the
>>>>>>> tools may not work until that is fixed. You may want to run the NTP
>>>>>>> configuration script again.
>>>>>>>
>>>>>>> Thanks;
>>>>>>>
>>>>>>> -jason
>>>>>>>
>>>>>>> On Sep 26, 2013, at 3:43 PM, "Lixin Liu"
>>>>>>> <>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Sorry, I didn't realize that the link is password protected and I
>>>>>>>> was already logged in as admin.
>>>>>>>>
>>>>>>>> Here is the log.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Lixin.
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From:
>>>>>>>>>
>>>>>>>>> [
>>>>>>>>> ]
>>>>>>>>> On Behalf Of Lixin Liu
>>>>>>>>> Sent: September-26-13 12:36 PM
>>>>>>>>> To:
>>>>>>>>>
>>>>>>>>> Subject: [perfsonar-user] LSRegistrationDaemon duplicate checksum
>>>>>>>>> lookup/host/ error
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On a number of our perfSONAR hosts, we get the error
>>>>>>>>>
>>>>>>>>> perfSONAR_PS::LSRegistrationDaemon::Base::find_duplicate - Found duplicate checksum lookup/host
>>>>>>>>>
>>>>>>>>> and the hosts do not show up in "Global Service and Data View".
>>>>>>>>> These cases may be related to the IP and DNS changes, but I think the
>>>>>>>>> hosts are configured correctly.
>>>>>>>>>
>>>>>>>>> How do I correct the problem?
>>>>>>>>>
>>>>>>>>> You can see the log from this link:
>>>>>>>>>
>>>>>>>>> https://lat-umanitoba.westgrid.ca/toolkit/admin/logs/ls_registration_daemon.log
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Lixin.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> =======================
>>>>>>>>> Lixin Liu
>>>>>>>>> IT Services
>>>>>>>>> Simon Fraser University
>>>>>>>>
>>>>>>>> <ls_registration_daemon.log>
>>>>> <ls_registration_daemon.log>
>
> <ls_registration_daemon.log>
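
[Editor's sketch] The registration reset quoted earlier in the thread (stop the daemon, remove lsKey.db, start the daemon), with the "rm -f" correction applied, can be written as a small dry-run script. This is a sketch under assumptions: the init script and db path are the ones quoted in the thread, running rm under sudo is an assumption (the thread shows plain "rm -f"), and the helper name ls_reset_commands is hypothetical.

```shell
#!/bin/sh
# Init script and key db path as quoted in this thread.
LS_INIT="/etc/init.d/ls_registration_daemon"
LS_KEY_DB="/var/lib/perfsonar/ls_registration_daemon/lsKey.db"

# Print the reset steps in order without executing them (dry run).
# Assumption: the db file needs root to remove, hence sudo on rm.
ls_reset_commands() {
    printf 'sudo %s stop\n' "$LS_INIT"
    printf 'sudo rm -f %s\n' "$LS_KEY_DB"
    printf 'sudo %s start\n' "$LS_INIT"
}

# Show the commands; pipe this output to `sh` to actually run them.
ls_reset_commands
```

Printing the commands first makes it easy to review them before piping to sh. As noted in the thread, it can then take a few hours before the host appears in the GUI listings.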

-----

Jason Zurawski, Science Engagement Engineer
ESnet

office: [+1-510-486-6483]
mobile: [+1-703-981-2494]
http://www.es.net/zurawski

Supercomputing Conference (SC13)
November 17 - 22, 2013, Denver, CO
http://sc13.supercomputing.org




