perfsonar-user - Re: [perfsonar-user] Re: Registration service problem

Subject: perfSONAR User Q&A and Other Discussion

  • From: "Andrew Lake" <>
  • To: "Brian Candler" <>
  • Cc: "Szymon Trocha" <>,
  • Subject: Re: [perfsonar-user] Re: Registration service problem
  • Date: Tue, 08 Sep 2015 12:55:56 -0700 (PDT)

Hi Brian,

I think the "500 timeout" is just a normal client-side timeout, not a response from the server. I think the 500 just got stuck in there because an error code is expected in an object returned by that part of the code. Confusing for sure.

I have never seen that "* additional stuff not fine transfer.c" message before, and I don't get it when I run curl -v on my local host. Maybe it's a different version of curl? Or is there a web proxy somewhere in the path doing weird stuff to requests? If that's actually in the response body, I am guessing the client will get confused.

Also, I wouldn't stress too much over the 403s you mentioned a few emails ago. Those sometimes happen if things are changing and there are still old records in the way. They age out within the hour and things should bounce back to normal.

Hope that helps,
Andy

On Tue, Sep 8, 2015 at 3:44 PM, Brian Candler <> wrote:

On 08/09/2015 21:47, Brian Candler wrote:
> I can try pointing to a different locator, but first is there anyone
> who can check logs on ps-sls.sanren.ac.za to see if the locator
> service is having problems? What might cause the 403 / 500 errors?
Actually, without changing anything on this side, I see at least one
node is now trying to register to Australia - but is also getting
various errors.

[root@pfsnr ~]# grep ": [0-9][0-9][0-9] " /var/log/perfsonar/ls_registration_daemon.log | tail
2015/09/08 21:46:19 (8675) ERROR> Base.pm:304
perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering
service. Will retry full registration next time: 500 read timeout
2015/09/08 21:47:19 (8675) ERROR> Base.pm:304
perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering
service. Will retry full registration next time: 500 Can't connect to
nsw-brwy-sls1.aarnet.net.au:8090 (connect: timeout)
2015/09/08 21:48:21 (8675) ERROR> Base.pm:304
perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering
service. Will retry full registration next time: 500 read timeout
2015/09/08 21:49:24 (8675) ERROR> Base.pm:304
perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering
service. Will retry full registration next time: 500 read timeout
2015/09/08 21:50:26 (8675) ERROR> Base.pm:304
perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering
service. Will retry full registration next time: 500 read timeout
2015/09/08 21:51:31 (8675) ERROR> Base.pm:304
perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering
service. Will retry full registration next time: 500 read timeout
2015/09/08 21:52:36 (8675) ERROR> Base.pm:304
perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering
service. Will retry full registration next time: 500 read timeout
2015/09/08 21:53:39 (8675) ERROR> Base.pm:304
perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering
service. Will retry full registration next time: 500 read timeout
2015/09/08 21:54:46 (8675) ERROR> Base.pm:304
perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering
service. Will retry full registration next time: 500 read timeout
2015/09/08 21:55:51 (8675) ERROR> Base.pm:304
perfSONAR_PS::LSRegistrationDaemon::Base::register - Problem registering
service. Will retry full registration next time: 500 Can't connect to
nsw-brwy-sls1.aarnet.net.au:8090 (connect: timeout)
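
To see at a glance how the failures split between read timeouts and connect timeouts, the entries can be tallied by the text that follows "next time: ". This is only a rough sketch in Python: the function name count_registration_errors is made up for illustration, and it assumes each entry starts with a "YYYY/MM/DD HH:MM:SS" timestamp and may be wrapped over several lines, as happens when a log is pasted into mail.

```python
import re
from collections import Counter

# An entry begins with a timestamp such as "2015/09/08 21:46:19 ".
ENTRY_START = re.compile(r"^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} ")


def count_registration_errors(text):
    """Count error reasons in ls_registration_daemon.log style text.

    Hypothetical helper: continuation lines (those without a leading
    timestamp) are re-joined to the entry they belong to.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line

    counts = Counter()
    for entry in entries:
        # Everything after "next time: " is the reported reason.
        _, found, reason = entry.partition("next time: ")
        if found:
            counts[reason] += 1
    return counts
```

Run over the ten entries above, this kind of tally gives eight "500 read timeout" entries and two "500 Can't connect to ... (connect: timeout)" entries.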

It's not clear to me whether "500 read timeout" is a locally-generated
error, or an actual 500 HTTP error. But the "connect: timeout" looks
like it could be locally generated.
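
One way to tell the two cases apart from outside the daemon is to make the request with a client that reports them differently. The sketch below uses Python's urllib rather than the daemon's own code, so it only illustrates the distinction: classify_fetch is a hypothetical helper, and the 10-second default timeout is an arbitrary choice.

```python
import socket
import urllib.error
import urllib.request


def classify_fetch(url, timeout=10.0):
    """Return ("server", status) if an HTTP response arrived,
    otherwise ("local", reason) for a failure detected on this side."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return ("server", resp.status)
    except urllib.error.HTTPError as exc:
        # The server really answered, e.g. with a genuine 403 or 500.
        return ("server", exc.code)
    except urllib.error.URLError as exc:
        # No HTTP response at all: DNS failure, refused connection,
        # or a timeout while connecting.
        return ("local", str(exc.reason))
    except (socket.timeout, TimeoutError):
        # Connected and sent the request, but no response before the timeout.
        return ("local", "read timeout")
```

A ("local", ...) result for a URL such as http://nsw-brwy-sls1.aarnet.net.au:8090/lookup/records/ would suggest the error was generated on the client side rather than sent by the server.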

However this node does look like it's having problems.

[root@pfsnr ~]# ping -c10 nsw-brwy-sls1.aarnet.net.au
PING nsw-brwy-sls1.aarnet.net.au (182.255.120.9) 56(84) bytes of data.
64 bytes from nsw-brwy-sls1.aarnet.net.au (182.255.120.9): icmp_seq=1
ttl=48 time=445 ms
64 bytes from nsw-brwy-sls1.aarnet.net.au (182.255.120.9): icmp_seq=2
ttl=48 time=445 ms
64 bytes from nsw-brwy-sls1.aarnet.net.au (182.255.120.9): icmp_seq=3
ttl=48 time=445 ms
64 bytes from nsw-brwy-sls1.aarnet.net.au (182.255.120.9): icmp_seq=4
ttl=48 time=445 ms
64 bytes from nsw-brwy-sls1.aarnet.net.au (182.255.120.9): icmp_seq=5
ttl=48 time=445 ms
64 bytes from nsw-brwy-sls1.aarnet.net.au (182.255.120.9): icmp_seq=6
ttl=48 time=445 ms
64 bytes from nsw-brwy-sls1.aarnet.net.au (182.255.120.9): icmp_seq=7
ttl=48 time=445 ms
64 bytes from nsw-brwy-sls1.aarnet.net.au (182.255.120.9): icmp_seq=8
ttl=48 time=445 ms
64 bytes from nsw-brwy-sls1.aarnet.net.au (182.255.120.9): icmp_seq=9
ttl=48 time=445 ms
64 bytes from nsw-brwy-sls1.aarnet.net.au (182.255.120.9): icmp_seq=10
ttl=48 time=445 ms

--- nsw-brwy-sls1.aarnet.net.au ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9510ms
rtt min/avg/max/mdev = 445.548/445.694/445.938/0.438 ms
[root@pfsnr ~]# curl -v nsw-brwy-sls1.aarnet.net.au:8090
* About to connect() to nsw-brwy-sls1.aarnet.net.au port 8090 (#0)
* Trying 182.255.120.9...
<< hangs for 10-20 seconds >>
connected
* Connected to nsw-brwy-sls1.aarnet.net.au (182.255.120.9) port 8090 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7
NSS/3.19.1 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: nsw-brwy-sls1.aarnet.net.au:8090
> Accept: */*
>
<< hangs here - indefinitely? >>

Ditto for curl -v http://nsw-brwy-sls1.aarnet.net.au:8090/lookup/records/

If I try this from a host in the UK, I get something very strange which
I've never seen before:

brian@deploy2:~$ time curl -v http://nsw-brwy-sls1.aarnet.net.au:8090/lookup/records/
* About to connect() to nsw-brwy-sls1.aarnet.net.au port 8090 (#0)
* Trying 182.255.120.9...
* connected
* Connected to nsw-brwy-sls1.aarnet.net.au (182.255.120.9) port 8090 (#0)
> GET /lookup/records/ HTTP/1.1
> User-Agent: curl/7.26.0
> Host: nsw-brwy-sls1.aarnet.net.au:8090
> Accept: */*
>
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
...

(about once per second)

But if I try "curl -v http://ps-west.es.net:8090/lookup/records/"
instead, what I see is a few "additional stuff not fine" lines, followed
by a big splurge of JSON, which I think is the entire locator database.

Back on the perfsonar node which is showing the problem:

[root@pfsnr ~]# grep http /var/log/perfsonar/ls_registration_daemon.log | grep -v kenet
2015/09/08 12:05:05 (8604) INFO> daemon.pl:170 main:: - Initial LS URL
set to http://nsw-brwy-sls1.aarnet.net.au:8090/lookup/records/
2015/09/08 18:05:48 (8675) INFO> daemon.pl:349 main::handle_site - LS
URL changed to http://ps-sls.sanren.ac.za:8090/lookup/records
[root@pfsnr ~]#

So it looks like it's learned the ZA URL again, but logs show it's still
trying to connect to the AU one.
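
To check which LS URL the daemon last logged, the "LS URL set to" / "LS URL changed to" lines can be picked out of the log. A small Python sketch follows; latest_ls_url is a made-up name, and it assumes the two message forms shown in the grep output above are the only ones that matter.

```python
import re

# Matches both "Initial LS URL set to <url>" and "LS URL changed to <url>".
LS_URL_EVENT = re.compile(r"LS URL (?:set|changed) to (\S+)")


def latest_ls_url(log_text):
    """Return the most recently logged LS URL, or None if none is found."""
    # Collapse all whitespace first so entries wrapped across lines still match.
    flattened = " ".join(log_text.split())
    urls = LS_URL_EVENT.findall(flattened)
    return urls[-1] if urls else None
```

Comparing that value with the host named in the most recent "Can't connect to" error is a quick way to confirm the mismatch described here.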

All very odd!

Regards,

Brian.




