
perfsonar-user - Re: [perfsonar-user] Help to debug non-working perfsonar boxen

Subject: perfSONAR User Q&A and Other Discussion

  • From: Shawn McKee <>
  • To: , Soichi Hayashi <>, Thomas Lee <>
  • Cc: perfsonar-user <>, Alessandra Forti <>, "''" <>, Marian Babik <>
  • Subject: Re: [perfsonar-user] Help to debug non-working perfsonar boxen
  • Date: Mon, 15 Dec 2014 11:22:57 -0500

Hi Winnie,

These boxes are part of the WLCG install.  We have documentation available here https://twiki.opensciencegrid.org/bin/view/Documentation/DeployperfSONAR

I am CCing the people responsible for the UK cloud, Alessandra Forti <> and Duncan Rand <>.

Your site is registered and the "auto-mesh" URL works (try opening https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk in a browser; having a JSON parser makes it more readable).
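If no browser JSON viewer is handy, a few lines of Python can fetch and re-indent the mesh JSON. This is a minimal sketch, not part of the perfSONAR tooling: the helper names `fetch_text` and `pretty_json` are hypothetical, and the parsing step is kept separate so it can also be run on a saved copy of the output.

```python
import json
import urllib.request


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a URL and return its body as UTF-8 text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def pretty_json(text: str) -> str:
    """Re-indent a JSON document, with sorted keys, so it is easier to read."""
    return json.dumps(json.loads(text), indent=2, sort_keys=True)
```

For example, `print(pretty_json(fetch_text("https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk")))`, assuming that URL is still being served.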

The problem you are seeing is an issue with IPv6 access to the mesh-configuration information. The Perl client does not handle this correctly, and we have an issue open on it at https://code.google.com/p/perfsonar-ps/issues/detail?id=1013

I thought we had this temporarily fixed by decommissioning the AAAA record on myosg.grid.iu.edu.  CCing Soichi and Thomas in case the IPv6 removal for vip-myosg.grid.iu.edu was reverted.
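One quick way to see whether the AAAA record is back is to check which address families the server name resolves to. A minimal sketch using the standard `socket.getaddrinfo`; the helper names are hypothetical, and the check only shows whether an IPv6 address is published, not which family the Perl client actually tries first.

```python
import socket


def resolve(host: str, port: int = 443):
    """Return getaddrinfo() results for TCP connections to host:port."""
    return socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)


def address_families(infos) -> set:
    """Summarize getaddrinfo() tuples as a set of 'IPv4' / 'IPv6' labels."""
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    # Each tuple is (family, type, proto, canonname, sockaddr).
    return {names[info[0]] for info in infos if info[0] in names}
```

If `address_families(resolve("myosg.grid.iu.edu"))` includes `"IPv6"`, an AAAA record is being returned to this host.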

Winnie, can you try this from your perfSONAR host:


Then let me know what it returns. Once we get the mesh URL access fixed, we can address any remaining configuration issues.

Thanks,

Shawn



On Mon, Dec 15, 2014 at 10:56 AM, Winnie Lacesso <> wrote:
Good afternoon,

I inherited 2 perfsonar boxen. They were working, but I have been
notified (ggus ticket 110365) that they are broken, and I am seeking
troubleshooting advice / pointers.

http://www.perfsonar.net/deploy/troubleshooting/
"Section under construction."  :(
No help there.

TBH broken = partly my bad: someone told me to run
/opt/perfsonar_ps/mesh_config/bin/generate_configuration
but they didn't say to run it as perfsonar, not root!
So I am wondering whether running it as root has badly damaged something
that running it properly later on can't fix.
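One concrete form that damage could take is files left owned by root that the perfsonar user can no longer overwrite. A hedged sketch for checking this: it lists everything under a directory whose owner is not a given account. Which directories to scan (for example /var/lib/perfsonar or /opt/perfsonar_ps) is an assumption, since which files generate_configuration writes is not stated here.

```python
import os
import pwd


def uid_of(user: str) -> int:
    """Look up the numeric uid of a local account (e.g. 'perfsonar')."""
    return pwd.getpwnam(user).pw_uid


def files_not_owned_by(root: str, uid: int) -> list:
    """Return sorted paths under root (files and dirs) whose owner is not uid."""
    mismatched = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat: report a symlink's own owner rather than its target's
            if os.lstat(path).st_uid != uid:
                mismatched.append(path)
    return sorted(mismatched)
```

Run as root, e.g. `files_not_owned_by("/var/lib/perfsonar", uid_of("perfsonar"))`. Some root- or apache-owned files are normal (the log listing further down shows both), so the output needs judgment rather than a blanket chown.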
In an old email I stumbled across the right way:

sudo -u perfsonar /opt/perfsonar_ps/mesh_config/bin/generate_configuration --verbose

The tail end of the output is:

2014/12/15 08:41:48 (3009) DEBUG> HTTPS.pm:65 perfSONAR_PS::Utils::HTTPS::https_get - Connecting to: myosg.grid.iu.edu: 443
2014/12/15 08:42:51 (3009) DEBUG> HTTPS.pm:118 perfSONAR_PS::Utils::HTTPS::https_get - Problem retrieving https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk: IO::Socket::INET6 configuration failederror:00000000:lib(0):func(0):reason(0)
2014/12/15 08:42:51 (3009) DEBUG> Utils.pm:229 perfSONAR_PS::MeshConfig::Utils::__load_json - Problem retrieving mesh configuration from https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk: Problem retrieving https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk: IO::Socket::INET6 configuration failederror:00000000:lib(0):func(0):reason(0)
2014/12/15 08:42:51 (3009) ERROR> Agent.pm:292 perfSONAR_PS::MeshConfig::Agent::__configure_host - Problem with mesh configuration: Problem retrieving https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk: IO::Socket::INET6 configuration failederror:00000000:lib(0):func(0):reason(0)
2014/12/15 08:42:51 (3009) ERROR> Agent.pm:405 perfSONAR_PS::MeshConfig::Agent::__configure_host - Problem with required meshes, not changing configuration
2014/12/15 08:42:51 (3009) DEBUG> Agent.pm:137 perfSONAR_PS::MeshConfig::Agent::__send_error_messages - No email address to send error message to: Problem with mesh configuration: Problem retrieving https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk: IO::Socket::INET6 configuration failederror:00000000:lib(0):func(0):reason(0)
2014/12/15 08:42:51 (3009) DEBUG> Agent.pm:137 perfSONAR_PS::MeshConfig::Agent::__send_error_messages - No email address to send error message to: Problem with required meshes, not changing configuration
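These log lines share a regular layout (date, time, pid in parentheses, level, source file and line, function, message), so the ERROR entries can be pulled out of a long log mechanically. A sketch in Python; the regular expression is inferred from the lines shown in this thread, not from any documented perfSONAR log format, so other variants may not match.

```python
import re

# Assumption: layout inferred from the pasted lines, i.e.
# "YYYY/MM/DD HH:MM:SS (pid) LEVEL> file:line function - message"
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\((?P<pid>\d+)\) (?P<level>[A-Z]+)> "
    r"(?P<where>\S+) (?P<func>\S+) - (?P<msg>.*)$"
)


def parse_line(line: str):
    """Return a dict of named fields, or None if the line does not match."""
    m = LOG_LINE.match(line)
    return m.groupdict() if m else None


def errors(lines):
    """Yield (time, function, message) for every ERROR-level line."""
    for line in lines:
        rec = parse_line(line)
        if rec and rec["level"] == "ERROR":
            yield rec["time"], rec["func"], rec["msg"]
```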


The mesh config:

root@lcgnetmon> grep -v \# /opt/perfsonar_ps/mesh_config/etc/agent_configuration.conf | uniq

   <mesh>
               configuration_url https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk
               validate_certificate 0
               required 1
   </mesh>

restart_services             1

use_toolkit                  1

send_error_emails             1

address   lcgnetmon.phy.bris.ac.uk
admin_email   
skip_redundant_tests   1


Is there something wrong with that?
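One thing visible in that config: `send_error_emails` is 1 but `admin_email` has no value. That would explain the "No email address to send error message to" debug lines in the log above, although it does not cause the mesh retrieval failure itself. (The archive also strips email addresses, so the blank may be an artifact, but the log suggests it really is empty.) A rough Python sketch that flags this combination. It treats the file as flat `key value` lines and skips anything inside `<...>` blocks, which is a simplifying assumption, not a full parser for this format.

```python
def read_top_level(text: str) -> dict:
    """Collect 'key value' settings that sit outside any <...> block.

    Assumption: one setting per line, '#' starts a comment, and a key
    with no value (like a bare 'admin_email') is stored as ''.
    """
    settings = {}
    depth = 0
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("</"):
            depth -= 1  # closing tag such as </mesh>
        elif line.startswith("<"):
            depth += 1  # opening tag such as <mesh>
        elif depth == 0:
            parts = line.split(None, 1)
            settings[parts[0]] = parts[1] if len(parts) > 1 else ""
    return settings


def config_warnings(settings: dict) -> list:
    """Flag error emails being enabled with no recipient configured."""
    warnings = []
    if settings.get("send_error_emails") == "1" and not settings.get("admin_email"):
        warnings.append("send_error_emails is 1 but admin_email is empty")
    return warnings
```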

I know little about perfSONAR and will read the docs, but they seem to be
i) install, ii) configure, iii) it just works,
whereas these boxen are not working.

Lots of errors in logfiles:

root@lcgnetmon> cd /var/log/perfsonar; /bin/ls -lFt | head
total 6772604
-rw-r--r--  1 perfsonar perfsonar    721791 Dec 15 15:47 perfsonarbuoy_ma.log
-rw-r--r--  1 perfsonar perfsonar   1509606 Dec 15 15:47 regular_testing.log
-rw-r--r--  1 perfsonar perfsonar  16145917 Dec 15 15:47 traceroute_ondemand_mp.log
-rw-r--r--  1 perfsonar perfsonar   1046006 Dec 15 15:47 pinger.log
drwxr-xr-x. 2 apache    perfsonar      4096 Dec 15 15:47 web_admin/
-rw-r--r--  1 root      root           1573 Dec 15 15:00 service_watcher_error.log
-rw-r--r--  1 perfsonar perfsonar     34223 Dec 15 11:31 owamp_bwctl.log
-rw-r--r--  1 perfsonar perfsonar       212 Dec 15 08:50 psb_to_esmond.log
-rw-r--r--  1 perfsonar perfsonar  14920093 Dec 15 08:49 traceroute_scheduler.log

root@lcgnetmon> tail -5 perfsonarbuoy_ma.log
2014/12/15 15:46:35 (1931) WARN> daemon.pl:425 main::__ANON__ - Warned: Exiting eval via next at /opt/perfsonar_ps/perfsonarbuoy_ma/bin/../lib/perfSONAR_PS/Utils/MARegistrationManager.pm line 103.
2014/12/15 15:47:41 (1931) WARN> MARegistrationManager.pm:102 perfSONAR_PS::Utils::MARegistrationManager::register - Error trying to lookup administrator lcg-site-admin in LS: 500 Can't connect to sls.geant.net:8090 (connect: timeout)
2014/12/15 15:47:41 (1931) WARN> daemon.pl:425 main::__ANON__ - Warned: Exiting subroutine via next at /opt/perfsonar_ps/perfsonarbuoy_ma/bin/../lib/perfSONAR_PS/Utils/MARegistrationManager.pm line 103.
2014/12/15 15:47:41 (1931) WARN> daemon.pl:425 main::__ANON__ - Warned: Exiting subroutine via next at /opt/perfsonar_ps/perfsonarbuoy_ma/bin/../lib/perfSONAR_PS/Utils/MARegistrationManager.pm line 103.
2014/12/15 15:47:41 (1931) WARN> daemon.pl:425 main::__ANON__ - Warned: Exiting eval via next at /opt/perfsonar_ps/perfsonarbuoy_ma/bin/../lib/perfSONAR_PS/Utils/MARegistrationManager.pm line 103.
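The "500 Can't connect to sls.geant.net:8090 (connect: timeout)" warning suggests the lookup-service port is not reachable from this host, whether because of a firewall or because the service is down. A small Python sketch to test whether a TCP connection to a host and port succeeds within a timeout. It checks reachability only, not whether the service behind the port is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection tries each address getaddrinfo returns, in order
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name lookup failed
        return False
```

From the perfSONAR host, `can_connect("sls.geant.net", 8090)` returning False would be consistent with the timeouts in these logs.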


root@lcgnetmon> tail -5 regular_testing.log
2014/12/15 15:47:08 (15218) INFO> EsmondBase.pm:56 perfSONAR_PS::RegularTesting::MeasurementArchives::EsmondBase::__ANON__ - Metadata URI: /esmond/perfsonar/archive/7237ac2ac5dc41c7b9711248d1806310/
2014/12/15 15:47:16 (1969) WARN> daemon:103 main::__ANON__ - Warned: IPC::DirQueue: killed stale lockfile: /var/lib/perfsonar/regular_testing/esmond_latency_localhost/active/active/50.20141128152342283489.EMjU1Nw at /usr/share/perl5/IPC/DirQueue.pm line 519.
2014/12/15 15:47:29 (1969) WARN> daemon:103 main::__ANON__ - Warned: IPC::DirQueue: killed stale lockfile: /var/lib/perfsonar/regular_testing/esmond_latency_localhost/active/active/50.20141128152342416486.EMjU1Ng at /usr/share/perl5/IPC/DirQueue.pm line 519.
2014/12/15 15:47:40 (1969) WARN> daemon:103 main::__ANON__ - Warned: IPC::DirQueue: killed stale lockfile: /var/lib/perfsonar/regular_testing/esmond_latency_localhost/active/active/50.20141128152342450834.EMjU1N at /usr/share/perl5/IPC/DirQueue.pm line 519.
2014/12/15 15:47:52 (1969) WARN> daemon:103 main::__ANON__ - Warned: IPC::DirQueue: killed stale lockfile: /var/lib/perfsonar/regular_testing/esmond_latency_localhost/active/active/50.20141128152343286581.EMjU1Nw at /usr/share/perl5/IPC/DirQueue.pm line 519.

root@lcgnetmon> tail -5 traceroute_ondemand_mp.log
2014/12/15 15:46:18 (1685) DEBUG> daemon.pl:569 main::psService - Accept returned nothing, likely a timeout occurred or a child exited
2014/12/15 15:46:38 (1685) DEBUG> daemon.pl:569 main::psService - Accept returned nothing, likely a timeout occurred or a child exited
2014/12/15 15:46:58 (1685) DEBUG> daemon.pl:569 main::psService - Accept returned nothing, likely a timeout occurred or a child exited
2014/12/15 15:47:18 (1685) DEBUG> daemon.pl:569 main::psService - Accept returned nothing, likely a timeout occurred or a child exited

root@lcgnetmon> tail -5 pinger.log
2014/12/15 15:47:04 (1959) ERROR> Remote.pm:479 perfSONAR_PS::Client::LS::Remote::registerStatic - LS cannot be reached, supply alternate or consult gLS.
2014/12/15 15:47:26 (1960) INFO> PingER.pm:300 perfSONAR_PS::Services::MA::PingER::registerLS - Registering PingER MA with LS
2014/12/15 15:47:26 (1960) INFO> PingER.pm:300 perfSONAR_PS::Services::MA::PingER::registerLS - Registering PingER MA with LS
2014/12/15 15:47:27 (1960) ERROR> Remote.pm:352 perfSONAR_PS::Client::LS::Remote::getLS - LS List is emtpty, cannot contact active LS for registration.
2014/12/15 15:47:27 (1960) ERROR> Remote.pm:479 perfSONAR_PS::Client::LS::Remote::registerStatic - LS cannot be reached, supply alternate or consult gLS.

root@lcgnetmon> tail -5 service_watcher_error.log
Can't exec "runlevel": No such file or directory at /opt/perfsonar_ps/toolkit/scripts/../lib/perfSONAR_PS/NPToolkit/Services/Base.pm line 147.
Can't exec "runlevel": No such file or directory at /opt/perfsonar_ps/toolkit/scripts/../lib/perfSONAR_PS/NPToolkit/Services/Base.pm line 147.
Can't exec "runlevel": No such file or directory at /opt/perfsonar_ps/toolkit/scripts/../lib/perfSONAR_PS/NPToolkit/Services/Base.pm line 147.
Can't exec "runlevel": No such file or directory at /opt/perfsonar_ps/toolkit/scripts/../lib/perfSONAR_PS/NPToolkit/Services/Base.pm line 147.
Can't exec "runlevel": No such file or directory at /opt/perfsonar_ps/toolkit/scripts/../lib/perfSONAR_PS/NPToolkit/Services/Base.pm line 147.
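Perl's "Can't exec "runlevel": No such file or directory" means no `runlevel` executable was found on the PATH of the process running the service watcher. One possible cause, an assumption not confirmed in this thread, is a minimal PATH (for example under cron) that omits /sbin, where runlevel lives on many Red Hat-style systems. A Python sketch using `shutil.which` to compare PATH settings; the PATH strings in the usage line are examples only.

```python
import shutil


def locate(cmd: str, path: str):
    """Return the full path of cmd if found on the given PATH string, else None."""
    return shutil.which(cmd, path=path)


def compare_paths(cmd: str, paths: dict) -> dict:
    """Map each labelled PATH string to where cmd was found (or None)."""
    return {label: locate(cmd, p) for label, p in paths.items()}
```

For example, `compare_paths("runlevel", {"minimal": "/usr/bin:/bin", "with sbin": "/usr/bin:/bin:/usr/sbin:/sbin"})`.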

root@lcgnetmon> tail -f owamp_bwctl.log
Dec 15 11:30:34 lcgnetmon owampd[18264]: FILE=owampd.c, LINE=724, Control session terminated abnormally...
Dec 15 11:30:37 lcgnetmon owampd[1661]: FILE=policy.c, LINE=1811, ResReq ALLOWED: regular:release:disk = 16 (result = 130628, limit = 1000000000)
Dec 15 11:30:37 lcgnetmon owampd[1661]: FILE=policy.c, LINE=1811, ResReq ALLOWED: regular:release:bandwidth = 3360 (result = 10080, limit = 10000000)
Dec 15 11:30:37 lcgnetmon owampd[18265]: FILE=owampd.c, LINE=724, Control session terminated abnormally...
Dec 15 11:30:40 lcgnetmon owampd[1661]: FILE=policy.c, LINE=1811, ResReq ALLOWED: regular:release:bandwidth = 3360 (result = 6720, limit = 10000000)
Dec 15 11:30:40 lcgnetmon owampd[18055]: FILE=owampd.c, LINE=724, Control session terminated abnormally...
Dec 15 11:30:40 lcgnetmon owampd[1661]: FILE=policy.c, LINE=1811, ResReq ALLOWED: regular:release:bandwidth = 3360 (result = 3360, limit = 10000000)
Dec 15 11:30:40 lcgnetmon owampd[18044]: FILE=owampd.c, LINE=724, Control session terminated abnormally...
Dec 15 11:31:40 lcgnetmon owampd[1661]: FILE=policy.c, LINE=1811, ResReq ALLOWED: regular:release:bandwidth = 3360 (result = 0, limit = 10000000)
Dec 15 11:31:40 lcgnetmon owampd[18057]: FILE=owampd.c, LINE=724, Control session terminated abnormally...

None look healthy!

Also, on another box/browser, trying to get to
http://lcgnetmon02.phy.bris.ac.uk/toolkit/ brings up

Secure Connection Failed
lcgnetmon02.phy.bris.ac.uk uses an invalid security certificate.
The certificate is not trusted because it is self signed.
The certificate is only valid for bfc.phy.bris.ac.uk
The certificate expired on 02/10/14 08:52.
(Error code: sec_error_expired_issuer_certificate)

Found the files all right:
root@lcgnetmon> ll /etc/pki/tls/private/localhost.key /etc/pki/tls/certs/localhost.crt
-rw-------. 1 root root 1204 Oct  2  2013 /etc/pki/tls/certs/localhost.crt
-rw-------. 1 root root  891 Oct  2  2013 /etc/pki/tls/private/localhost.key

root@lcgnetmon> openssl x509 -text -in /etc/pki/tls/certs/localhost.crt|grep -i not
            Not Before: Oct  2 03:49:02 2013 GMT
            Not After : Oct  2 03:49:02 2014 GMT

Confirmed, expired. Where is a howto for regenerating them? Or are expired certs
not important?
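The expiry check can also be scripted. This Python sketch uses the standard library's `ssl.cert_time_to_seconds`, which parses the "Mon DD HH:MM:SS YYYY GMT" form that `openssl x509 -text` prints for Not After; extracting that string from the certificate is left to the openssl command above. It does not address the browser's other two complaints (self-signed, and issued for bfc.phy.bris.ac.uk rather than lcgnetmon02), which a new certificate would need to fix anyway.

```python
import ssl
import time


def seconds_left(not_after: str, now=None) -> float:
    """Seconds until a certificate's Not After time (negative once expired)."""
    if now is None:
        now = time.time()
    return ssl.cert_time_to_seconds(not_after) - now


def is_expired(not_after: str, now=None) -> bool:
    """True if the Not After time is at or before `now` (default: current time)."""
    return seconds_left(not_after, now) <= 0
```

For example, `is_expired("Oct  2 03:49:02 2014 GMT")` is True for any current date.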

VERY grateful for pointers & help!

Winnie Lacesso / Bristol University Particle Physics Computing Systems
HH Wills Physics Laboratory, Tyndall Avenue, Bristol, BS8 1TL, UK


