
perfsonar-user - Re: [perfsonar-user] Re: Registration service problem

Subject: perfSONAR User Q&A and Other Discussion




  • From: Szymon Trocha <>
  • To:
  • Cc:
  • Subject: Re: [perfsonar-user] Re: Registration service problem
  • Date: Tue, 8 Sep 2015 11:35:21 +0200
  • Organization: PCSS

Hi Brian,

On 07.09.2015 at 17:55, Brian Candler wrote:
On 07/09/2015 16:24, Brian Candler wrote:
After KENET have installed some new nodes and filled in the admin info, they are finding the following:

* The list of known communities is not updating. It says in red: "The list of popular communities may be out of date. Ensure that the caching program is correctly running."
* The "local services" tab still says Globally Registered: No
The nodes have propagated in the registration database and also their coordinates, which is good (obviously we just have to wait a while - I now found in the doc it says to wait up to 24 hours)

There are a few problems remaining.

(1) There are actually 5 nodes. However there are two spurious IP address entries:



This is because the reverse DNS for 197.136.31.1 -> PTR pfsnr.maseno-town-pop.k.kenet.or.ke was originally missing.

So the node thought its own hostname was "197.136.31.1", and also showed this under "Host Status - Primary Address" as well - despite the fact that its configured hostname was "pfsnr.maseno-town-pop.k.kenet.or.ke".
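The missing PTR record described above can be checked with a forward-confirmed reverse DNS lookup. The following is only a rough sketch using Python's standard `socket` module: the function name `reverse_dns_status` is made up for illustration, and the two lookup parameters exist only so the check can be exercised without a live resolver. It is not how perfSONAR itself determines the hostname.

```python
import socket


def reverse_dns_status(ip, reverse_lookup=socket.gethostbyaddr,
                       forward_lookup=socket.gethostbyname_ex):
    """Return (ptr_name, forward_confirmed) for an IPv4 address.

    ptr_name is None when the reverse lookup fails (no PTR record).
    forward_confirmed is True only when the PTR name resolves back
    to the same address.
    """
    try:
        # gethostbyaddr returns (hostname, aliases, addresses)
        ptr_name, _aliases, _addresses = reverse_lookup(ip)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses
        return None, False
    try:
        _name, _aliases, addresses = forward_lookup(ptr_name)
    except OSError:
        return ptr_name, False
    return ptr_name, ip in addresses
```

Calling `reverse_dns_status("197.136.31.1")` on the node would show whether the PTR record is now visible to the local resolver. A result of `(None, False)` corresponds to the situation in which the node fell back to using its IP address as its name.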

Will these two additional hosts and their associated services eventually expire? Or would it require an administrator at ESNet to delete them from the central locator database?

They have now turned into the full hostname, and the duplicates have started to clear from the list. It usually takes some time after the expiry time passes for the LSes to exchange data.


(2) All nodes still show "Globally Registered: No"

(3) The "Popular communities" list still remains empty

If you are using 3.4, this may be caused by a bug we identified that produces this behaviour; it is fixed in 3.5.


(4) The ls_registration_daemon.log file is still reporting errors. The most recent is:

...
2015/09/07 18:48:41 (9293) INFO> Base.pm:500 perfSONAR_PS::LSRegistrationDaemon::Base::build_duplicate_checksum - Duplicate checksum is wy12v9mS/LfnlnLOFw031Q
2015/09/07 18:48:41 (9293) INFO> Base.pm:396 perfSONAR_PS::LSRegistrationDaemon::Base::find_duplicate - Found duplicate checksum lookup/person/540ff47f-3907-44f2-903c-79e4acbbaeb2 with wy12v9mS/LfnlnLOFw031Q for KENET

Let's wait until the LSes are consistent. This message does no harm; it is probably related to the hostname/IP change.

2015/09/07 18:48:42 (9293) ERROR> Base.pm:304 perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering service. Will retry full registration next time: 500 Internal Error

In fact, over the course of the day, a bunch of different errors have been seen:

[root@pfsnr ~]# grep "Problem regist" /var/log/perfsonar/ls_registration_daemon.log | cut -c28- | sort | uniq -c
    117 ERROR> Base.pm:304 perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering service. Will retry full registration next time: 403 Forbidden
      1 ERROR> Base.pm:304 perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering service. Will retry full registration next time: 500 Can't connect to nsw-brwy-sls1.aarnet.net.au:8090 (connect: timeout)
      2 ERROR> Base.pm:304 perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering service. Will retry full registration next time: 500 Can't connect to ps-sls.sanren.ac.za:8090 (Bad hostname 'ps-sls.sanren.ac.za')

Is your DNS service working properly?

     73 ERROR> Base.pm:304 perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering service. Will retry full registration next time: 500 Internal Error
      5 ERROR> Base.pm:304 perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering service. Will retry full registration next time: 500 read timeout
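The same tally can be produced without depending on the fixed column offset used by `cut -c28-`. The following is a minimal sketch, assuming the log lines keep the "Problem registering service. ... next time: <reason>" wording shown above; the function name `tally_registration_errors` is hypothetical.

```python
import re
from collections import Counter

# Captures the reason text that follows the daemon's fixed message prefix.
REASON = re.compile(r"Problem registering service\. .*?next time: (.+)$")


def tally_registration_errors(lines):
    """Count registration failures per reason string."""
    counts = Counter()
    for line in lines:
        match = REASON.search(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts
```

It could be used as `tally_registration_errors(open("/var/log/perfsonar/ls_registration_daemon.log"))`, after which `counts.most_common()` lists the reasons from most to least frequent.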

Having seen a mixture of 403 Forbidden, 500 read timeout and 500 Internal Error, I wonder if there is a problem at ps-sls.sanren.ac.za ?

Can you verify that the network connection to this host works properly? Is there any packet loss on the path?
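As a first check, name resolution and TCP reachability can be tested separately, which helps tell a "Bad hostname" failure apart from a connect timeout. The following is only a sketch using Python's standard `socket` module: the function name `check_ls_endpoint` is made up for illustration, and port 8090 is taken from the log lines above. It does not measure packet loss; that still needs a tool such as ping or traceroute.

```python
import socket


def check_ls_endpoint(host, port=8090, timeout=5.0):
    """Return 'dns-failure', 'connect-failure' or 'ok' for host:port."""
    try:
        addresses = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name could not be resolved at all.
        return "dns-failure"
    for family, socktype, proto, _canonname, sockaddr in addresses:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            # Refused, timed out or unreachable: try the next address.
            continue
    return "connect-failure"
```

Running `check_ls_endpoint("ps-sls.sanren.ac.za")` from the affected node several times would show whether the failures are on the DNS side or the connection side.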


Regards,

Brian.


Regards,
-- 
Szymon Trocha

Poznań Supercomputing & Netw. Center ::: NETWORK OPERATION CENTER
Tel. +48 618582022 ::: http://noc.man.poznan.pl



