perfsonar-user - Re: [perfsonar-user] Re: Registration service problem

Subject: perfSONAR User Q&A and Other Discussion

  • From: Brian Candler <>
  • To: Szymon Trocha <>
  • Cc:
  • Subject: Re: [perfsonar-user] Re: Registration service problem
  • Date: Tue, 8 Sep 2015 22:43:56 +0300

On 08/09/2015 21:47, Brian Candler wrote:
> I can try pointing to a different locator, but first is there anyone who can check logs on ps-sls.sanren.ac.za to see if the locator service is having problems? What might cause the 403 / 500 errors?

Actually, without changing anything on this side, I see at least one node is now trying to register to Australia - but is also getting various errors.

[root@pfsnr ~]# grep ": [0-9][0-9][0-9] " /var/log/perfsonar/ls_registration_daemon.log | tail
2015/09/08 21:46:19 (8675) ERROR> Base.pm:304 perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering service. Will retry full registration next time: 500 read timeout
2015/09/08 21:47:19 (8675) ERROR> Base.pm:304 perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering service. Will retry full registration next time: 500 Can't connect to nsw-brwy-sls1.aarnet.net.au:8090 (connect: timeout)
2015/09/08 21:48:21 (8675) ERROR> Base.pm:304 perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering service. Will retry full registration next time: 500 read timeout
2015/09/08 21:49:24 (8675) ERROR> Base.pm:304 perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering service. Will retry full registration next time: 500 read timeout
2015/09/08 21:50:26 (8675) ERROR> Base.pm:304 perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering service. Will retry full registration next time: 500 read timeout
2015/09/08 21:51:31 (8675) ERROR> Base.pm:304 perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering service. Will retry full registration next time: 500 read timeout
2015/09/08 21:52:36 (8675) ERROR> Base.pm:304 perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering service. Will retry full registration next time: 500 read timeout
2015/09/08 21:53:39 (8675) ERROR> Base.pm:304 perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering service. Will retry full registration next time: 500 read timeout
2015/09/08 21:54:46 (8675) ERROR> Base.pm:304 perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering service. Will retry full registration next time: 500 read timeout
2015/09/08 21:55:51 (8675) ERROR> Base.pm:304 perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering service. Will retry full registration next time: 500 Can't connect to nsw-brwy-sls1.aarnet.net.au:8090 (connect: timeout)

It's not clear to me whether "500 read timeout" is a locally-generated error, or an actual 500 HTTP error. But the "connect: timeout" looks like it could be locally generated.
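
To see how the two messages are distributed, the text after the status code can be tallied. In the excerpt above, 8 of the 10 lines are "read timeout" and 2 are "connect: timeout". If the daemon's HTTP client is Perl's LWP::UserAgent (an assumption, though the message format is consistent with it), both kinds would be client-side, since LWP reports its own connect and read failures as synthetic 500 responses. The following is a minimal sketch, assuming the log line format shown above; the regular expression and the name `tally_errors` are illustrative and not part of perfSONAR.

```python
import re
from collections import Counter

# Matches the trailing ": <3-digit code> <message>" that the grep above keys on.
STATUS_RE = re.compile(r": (\d{3}) (.+)$")


def tally_errors(lines):
    """Count (status code, message kind) pairs found in log lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line.rstrip("\n"))
        if not match:
            continue  # not a registration error line
        code, message = match.group(1), match.group(2)
        # Collapse host-specific connect failures into a single bucket.
        kind = "connect timeout" if "connect: timeout" in message else message
        counts[(code, kind)] += 1
    return counts


# Two lines copied from the log excerpt above.
sample = [
    "2015/09/08 21:46:19 (8675) ERROR> Base.pm:304 "
    "perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering "
    "service. Will retry full registration next time: 500 read timeout",
    "2015/09/08 21:47:19 (8675) ERROR> Base.pm:304 "
    "perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering "
    "service. Will retry full registration next time: 500 Can't connect to "
    "nsw-brwy-sls1.aarnet.net.au:8090 (connect: timeout)",
]

for (code, kind), n in sorted(tally_errors(sample).items()):
    print(code, kind, n)
```

On the real file, the same function could be fed `open("/var/log/perfsonar/ls_registration_daemon.log")` directly.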

However, the AU node (nsw-brwy-sls1.aarnet.net.au) does look like it's having problems.

[root@pfsnr ~]# ping -c10 nsw-brwy-sls1.aarnet.net.au
PING nsw-brwy-sls1.aarnet.net.au (182.255.120.9) 56(84) bytes of data.
64 bytes from nsw-brwy-sls1.aarnet.net.au (182.255.120.9): icmp_seq=1 ttl=48 time=445 ms
64 bytes from nsw-brwy-sls1.aarnet.net.au (182.255.120.9): icmp_seq=2 ttl=48 time=445 ms
64 bytes from nsw-brwy-sls1.aarnet.net.au (182.255.120.9): icmp_seq=3 ttl=48 time=445 ms
64 bytes from nsw-brwy-sls1.aarnet.net.au (182.255.120.9): icmp_seq=4 ttl=48 time=445 ms
64 bytes from nsw-brwy-sls1.aarnet.net.au (182.255.120.9): icmp_seq=5 ttl=48 time=445 ms
64 bytes from nsw-brwy-sls1.aarnet.net.au (182.255.120.9): icmp_seq=6 ttl=48 time=445 ms
64 bytes from nsw-brwy-sls1.aarnet.net.au (182.255.120.9): icmp_seq=7 ttl=48 time=445 ms
64 bytes from nsw-brwy-sls1.aarnet.net.au (182.255.120.9): icmp_seq=8 ttl=48 time=445 ms
64 bytes from nsw-brwy-sls1.aarnet.net.au (182.255.120.9): icmp_seq=9 ttl=48 time=445 ms
64 bytes from nsw-brwy-sls1.aarnet.net.au (182.255.120.9): icmp_seq=10 ttl=48 time=445 ms

--- nsw-brwy-sls1.aarnet.net.au ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9510ms
rtt min/avg/max/mdev = 445.548/445.694/445.938/0.438 ms
[root@pfsnr ~]# curl -v nsw-brwy-sls1.aarnet.net.au:8090
* About to connect() to nsw-brwy-sls1.aarnet.net.au port 8090 (#0)
* Trying 182.255.120.9...
<< hangs for 10-20 seconds >>
connected
* Connected to nsw-brwy-sls1.aarnet.net.au (182.255.120.9) port 8090 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.19.1 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: nsw-brwy-sls1.aarnet.net.au:8090
> Accept: */*
>
<< hangs here - indefinitely? >>

Ditto for curl -v http://nsw-brwy-sls1.aarnet.net.au:8090/lookup/records/
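
The curl run above stalls in two places: the TCP connect takes 10-20 seconds, and then no response arrives after the request is sent. Those two stages can be timed separately. This is a rough sketch using only the Python standard library; the function name `probe` and the default timeout values are assumptions for illustration, not anything perfSONAR or curl provides.

```python
import socket
import time


def probe(host, port, path="/", connect_timeout=30.0, read_timeout=10.0):
    """Send one HTTP GET and return (stage, seconds) for the stage that ended the probe."""
    start = time.monotonic()
    try:
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except OSError:
        # Covers connect timeouts, refusals, and name-resolution failures.
        return ("connect failed", time.monotonic() - start)

    with sock:
        sock.settimeout(read_timeout)
        request = (
            "GET " + path + " HTTP/1.1\r\n"
            "Host: " + host + ":" + str(port) + "\r\n"
            "Accept: */*\r\n"
            "Connection: close\r\n\r\n"
        )
        sock.sendall(request.encode("ascii"))
        start = time.monotonic()  # time only the wait for the first bytes
        try:
            data = sock.recv(4096)
        except socket.timeout:
            return ("read timeout", time.monotonic() - start)
        except OSError:
            return ("read failed", time.monotonic() - start)

    stage = "response" if data else "closed without response"
    return (stage, time.monotonic() - start)
```

Calling `probe("nsw-brwy-sls1.aarnet.net.au", 8090, "/lookup/records/")` from both the affected node and the UK host would show whether the slow connect is specific to one path while the missing response is common to both.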

If I try this from a host in the UK, I get something very strange which I've never seen before:

brian@deploy2:~$ time curl -v http://nsw-brwy-sls1.aarnet.net.au:8090/lookup/records/
* About to connect() to nsw-brwy-sls1.aarnet.net.au port 8090 (#0)
* Trying 182.255.120.9...
* connected
* Connected to nsw-brwy-sls1.aarnet.net.au (182.255.120.9) port 8090 (#0)
> GET /lookup/records/ HTTP/1.1
> User-Agent: curl/7.26.0
> Host: nsw-brwy-sls1.aarnet.net.au:8090
> Accept: */*
>
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
...

(about once per second)

But if I try "curl -v http://ps-west.es.net:8090/lookup/records/" instead, what I see is a few "additional stuff not fine" lines, followed by a big splurge of JSON, which I think is the entire locator database.

Back on the perfsonar node which is showing the problem:

[root@pfsnr ~]# grep http /var/log/perfsonar/ls_registration_daemon.log | grep -v kenet
2015/09/08 12:05:05 (8604) INFO> daemon.pl:170 main:: - Initial LS URL set to http://nsw-brwy-sls1.aarnet.net.au:8090/lookup/records/
2015/09/08 18:05:48 (8675) INFO> daemon.pl:349 main::handle_site - LS URL changed to http://ps-sls.sanren.ac.za:8090/lookup/records
[root@pfsnr ~]#

So it looks like it's learned the ZA URL again, but logs show it's still trying to connect to the AU one.
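
That mismatch can be checked mechanically by walking the log in order and comparing each connect error against the last LS URL the daemon logged. This is a sketch assuming the two INFO message formats shown above ("Initial LS URL set to" and "LS URL changed to") and the "Can't connect to host:port" error text; `mismatched_targets` is a hypothetical helper name.

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"(?:Initial LS URL set to|LS URL changed to) (\S+)")
CONNECT_RE = re.compile(r"Can't connect to ([^\s:]+):\d+")


def mismatched_targets(lines):
    """Return (error_host, configured_host) pairs where a connect error
    names a host other than the most recently logged LS URL."""
    configured = None
    mismatches = []
    for line in lines:
        match = URL_RE.search(line)
        if match:
            configured = urlparse(match.group(1)).hostname
            continue
        match = CONNECT_RE.search(line)
        if match and configured is not None and match.group(1) != configured:
            mismatches.append((match.group(1), configured))
    return mismatches


# Lines copied from the excerpts above, in chronological order.
sample = [
    "2015/09/08 12:05:05 (8604) INFO> daemon.pl:170 main:: - Initial LS URL "
    "set to http://nsw-brwy-sls1.aarnet.net.au:8090/lookup/records/",
    "2015/09/08 18:05:48 (8675) INFO> daemon.pl:349 main::handle_site - LS URL "
    "changed to http://ps-sls.sanren.ac.za:8090/lookup/records",
    "2015/09/08 21:47:19 (8675) ERROR> Base.pm:304 "
    "perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering "
    "service. Will retry full registration next time: 500 Can't connect to "
    "nsw-brwy-sls1.aarnet.net.au:8090 (connect: timeout)",
]

for error_host, configured_host in mismatched_targets(sample):
    print("error names", error_host, "but last logged LS URL is", configured_host)
```

Only the connect errors carry a hostname, so the "read timeout" lines cannot be attributed to a host this way.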

All very odd!

Regards,

Brian.



