perfsonar-user - Re: [perfsonar-user] Help to debug non-working perfsonar boxen
Subject: perfSONAR User Q&A and Other Discussion
- From: Shawn McKee
- To: Soichi Hayashi, Thomas Lee
- Cc: perfsonar-user, Alessandra Forti, Marian Babik
- Subject: Re: [perfsonar-user] Help to debug non-working perfsonar boxen
- Date: Mon, 15 Dec 2014 11:22:57 -0500
Hi Winnie,
These boxes are part of the WLCG install. We have documentation available here: https://twiki.opensciencegrid.org/bin/view/Documentation/DeployperfSONAR
I am CCing the people responsible for the UK cloud, Alessandra Forti and Duncan Rand.
Your site is registered and the "auto-mesh" URL works: try opening https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk in a browser (a JSON parser makes the output more readable).
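A minimal sketch of fetching the auto-mesh URL from the command line and pretty-printing it, for anyone without a JSON-aware browser. It assumes curl and a Python interpreter are installed on the host; the helper name pretty_json is illustrative and not part of perfSONAR.

```shell
#!/bin/sh
# Pretty-print JSON read from stdin using Python's built-in json.tool module.
# Falls back to "python" on hosts that have no "python3" binary.
pretty_json() {
    if command -v python3 >/dev/null 2>&1; then
        python3 -m json.tool
    else
        python -m json.tool
    fi
}

# Usage (needs network access to myosg.grid.iu.edu):
#   curl -sS https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk | pretty_json
```

If the pipeline prints indented JSON, the mesh URL itself is fine and the fault lies with how the client on the perfSONAR host reaches it.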
The problem you are seeing is an issue with IPv6 access to the mesh-configuration information. The Perl client is not doing the right thing over IPv6, and we have an issue open on this at https://code.google.com/p/perfsonar-ps/issues/detail?id=1013
I thought we had this temporarily fixed by decommissioning the AAAA record on myosg.grid.iu.edu. CCing Soichi and Thomas in case the IPv6 removal for vip-myosg.grid.iu.edu was reverted.
Winnie, can you try this from your perfSONAR host:
dig AAAA myosg.grid.iu.edu
and let me know what it returns? Once we get the mesh URL access fixed, we can address any remaining configuration issues.
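A small sketch of the same check in script form, in case it needs repeating on both boxes. It runs the dig query with +short and reports whether any IPv6 address came back. The function names aaaa_records and classify_aaaa are made up for illustration.

```shell
#!/bin/sh
# Print the AAAA (IPv6) answers the local resolver returns for a host name.
aaaa_records() {
    dig +short AAAA "$1"
}

# Read "dig +short AAAA" output on stdin and report whether it contains
# an IPv6 address. CNAME lines (host names ending in a dot) do not count.
classify_aaaa() {
    if grep -Eq '^[0-9A-Fa-f:]*:[0-9A-Fa-f:.]*$'; then
        echo "AAAA record present"
    else
        echo "no AAAA record"
    fi
}

# Usage (needs working DNS):
#   aaaa_records myosg.grid.iu.edu | classify_aaaa
```

"AAAA record present" would suggest the IPv6 address is back in DNS, which is the case the mesh client is failing on.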
Thanks,
Shawn
On Mon, Dec 15, 2014 at 10:56 AM, Winnie Lacesso wrote:
Good afternoon,
I inherited 2 perfsonar boxen. They were working but I have been
notified (ggus ticket 110365) that they have broken & am seeking
troubleshooting advice / pointers.
http://www.perfsonar.net/deploy/troubleshooting/
"Section under construction." :(
No help there.
TBH broken = partly my bad, someone told me to run
/opt/perfsonar_ps/mesh_config/bin/generate_configuration
but they didn't say to run it as perfsonar not root!
So am wondering if running that as root has badly damaged something that
running it properly later on can't fix.
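One way to look for leftovers from the run as root, sketched under the assumption that the script writes somewhere below the directories that appear elsewhere in this message (/opt/perfsonar_ps/mesh_config, /var/lib/perfsonar, /var/log/perfsonar). Which files generate_configuration actually touches is not confirmed here, and the helper name not_owned_by is illustrative.

```shell
#!/bin/sh
# List everything under the given directories that is NOT owned by the
# given user (a user name or a numeric uid).
# Usage: not_owned_by USER DIR [DIR...]
not_owned_by() {
    owner=$1
    shift
    find "$@" ! -user "$owner" -print 2>/dev/null
}

# Usage on the perfSONAR host:
#   not_owned_by perfsonar /opt/perfsonar_ps/mesh_config \
#       /var/lib/perfsonar /var/log/perfsonar
```

Some entries are expected to belong to other users (the log listing further down shows service_watcher_error.log owned by root and web_admin/ owned by apache), so the output is a list to review, not something to chown wholesale.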
In old email I stumbled across the right way:
sudo -u perfsonar /opt/perfsonar_ps/mesh_config/bin/generate_configuration --verbose
The tail end of that output is:
2014/12/15 08:41:48 (3009) DEBUG> HTTPS.pm:65 perfSONAR_PS::Utils::HTTPS::https_get - Connecting to: myosg.grid.iu.edu: 443
2014/12/15 08:42:51 (3009) DEBUG> HTTPS.pm:118 perfSONAR_PS::Utils::HTTPS::https_get - Problem retrieving https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk: IO::Socket::INET6 configuration failederror:00000000:lib(0):func(0):reason(0)
2014/12/15 08:42:51 (3009) DEBUG> Utils.pm:229 perfSONAR_PS::MeshConfig::Utils::__load_json - Problem retrieving mesh configuration from https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk: Problem retrieving https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk: IO::Socket::INET6 configuration failederror:00000000:lib(0):func(0):reason(0)
2014/12/15 08:42:51 (3009) ERROR> Agent.pm:292 perfSONAR_PS::MeshConfig::Agent::__configure_host - Problem with mesh configuration: Problem retrieving https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk: IO::Socket::INET6 configuration failederror:00000000:lib(0):func(0):reason(0)
2014/12/15 08:42:51 (3009) ERROR> Agent.pm:405 perfSONAR_PS::MeshConfig::Agent::__configure_host - Problem with required meshes, not changing configuration
2014/12/15 08:42:51 (3009) DEBUG> Agent.pm:137 perfSONAR_PS::MeshConfig::Agent::__send_error_messages - No email address to send error message to: Problem with mesh configuration: Problem retrieving https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk: IO::Socket::INET6 configuration failederror:00000000:lib(0):func(0):reason(0)
2014/12/15 08:42:51 (3009) DEBUG> Agent.pm:137 perfSONAR_PS::MeshConfig::Agent::__send_error_messages - No email address to send error message to: Problem with required meshes, not changing configuration
The mesh config:
root@lcgnetmon> grep -v \# /opt/perfsonar_ps/mesh_config/etc/agent_configuration.conf | uniq
<mesh>
configuration_url https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk
validate_certificate 0
required 1
</mesh>
restart_services 1
use_toolkit 1
send_error_emails 1
address lcgnetmon.phy.bris.ac.uk
admin_email
skip_redundant_tests 1
Is there something wrong with that?
I know little about perfsonar & will read the docs but they seem to be
i) install ii) config iii) it just works.
Whereas these are not working boxen.
Lots of errors in logfiles:
root@lcgnetmon> cd /var/log/perfsonar; /bin/ls -lFt | head
total 6772604
-rw-r--r-- 1 perfsonar perfsonar 721791 Dec 15 15:47 perfsonarbuoy_ma.log
-rw-r--r-- 1 perfsonar perfsonar 1509606 Dec 15 15:47 regular_testing.log
-rw-r--r-- 1 perfsonar perfsonar 16145917 Dec 15 15:47 traceroute_ondemand_mp.log
-rw-r--r-- 1 perfsonar perfsonar 1046006 Dec 15 15:47 pinger.log
drwxr-xr-x. 2 apache perfsonar 4096 Dec 15 15:47 web_admin/
-rw-r--r-- 1 root root 1573 Dec 15 15:00 service_watcher_error.log
-rw-r--r-- 1 perfsonar perfsonar 34223 Dec 15 11:31 owamp_bwctl.log
-rw-r--r-- 1 perfsonar perfsonar 212 Dec 15 08:50 psb_to_esmond.log
-rw-r--r-- 1 perfsonar perfsonar 14920093 Dec 15 08:49 traceroute_scheduler.log
root@lcgnetmon> tail -5 perfsonarbuoy_ma.log
2014/12/15 15:46:35 (1931) WARN> daemon.pl:425 main::__ANON__ - Warned: Exiting eval via next at /opt/perfsonar_ps/perfsonarbuoy_ma/bin/../lib/perfSONAR_PS/Utils/MARegistrationManager.pm line 103.
2014/12/15 15:47:41 (1931) WARN> MARegistrationManager.pm:102 perfSONAR_PS::Utils::MARegistrationManager::register - Error trying to lookup administrator lcg-site-admin in LS: 500 Can't connect to sls.geant.net:8090 (connect: timeout)
2014/12/15 15:47:41 (1931) WARN> daemon.pl:425 main::__ANON__ - Warned: Exiting subroutine via next at /opt/perfsonar_ps/perfsonarbuoy_ma/bin/../lib/perfSONAR_PS/Utils/MARegistrationManager.pm line 103.
2014/12/15 15:47:41 (1931) WARN> daemon.pl:425 main::__ANON__ - Warned: Exiting subroutine via next at /opt/perfsonar_ps/perfsonarbuoy_ma/bin/../lib/perfSONAR_PS/Utils/MARegistrationManager.pm line 103.
2014/12/15 15:47:41 (1931) WARN> daemon.pl:425 main::__ANON__ - Warned: Exiting eval via next at /opt/perfsonar_ps/perfsonarbuoy_ma/bin/../lib/perfSONAR_PS/Utils/MARegistrationManager.pm line 103.
root@lcgnetmon> tail -5 regular_testing.log
2014/12/15 15:47:08 (15218) INFO> EsmondBase.pm:56 perfSONAR_PS::RegularTesting::MeasurementArchives::EsmondBase::__ANON__ - Metadata URI: /esmond/perfsonar/archive/7237ac2ac5dc41c7b9711248d1806310/
2014/12/15 15:47:16 (1969) WARN> daemon:103 main::__ANON__ - Warned: IPC::DirQueue: killed stale lockfile: /var/lib/perfsonar/regular_testing/esmond_latency_localhost/active/active/50.20141128152342283489.EMjU1Nw at /usr/share/perl5/IPC/DirQueue.pm line 519.
2014/12/15 15:47:29 (1969) WARN> daemon:103 main::__ANON__ - Warned: IPC::DirQueue: killed stale lockfile: /var/lib/perfsonar/regular_testing/esmond_latency_localhost/active/active/50.20141128152342416486.EMjU1Ng at /usr/share/perl5/IPC/DirQueue.pm line 519.
2014/12/15 15:47:40 (1969) WARN> daemon:103 main::__ANON__ - Warned: IPC::DirQueue: killed stale lockfile: /var/lib/perfsonar/regular_testing/esmond_latency_localhost/active/active/50.20141128152342450834.EMjU1N at /usr/share/perl5/IPC/DirQueue.pm line 519.
2014/12/15 15:47:52 (1969) WARN> daemon:103 main::__ANON__ - Warned: IPC::DirQueue: killed stale lockfile: /var/lib/perfsonar/regular_testing/esmond_latency_localhost/active/active/50.20141128152343286581.EMjU1Nw at /usr/share/perl5/IPC/DirQueue.pm line 519.
root@lcgnetmon> tail -5 traceroute_ondemand_mp.log
2014/12/15 15:46:18 (1685) DEBUG> daemon.pl:569 main::psService - Accept returned nothing, likely a timeout occurred or a child exited
2014/12/15 15:46:38 (1685) DEBUG> daemon.pl:569 main::psService - Accept returned nothing, likely a timeout occurred or a child exited
2014/12/15 15:46:58 (1685) DEBUG> daemon.pl:569 main::psService - Accept returned nothing, likely a timeout occurred or a child exited
2014/12/15 15:47:18 (1685) DEBUG> daemon.pl:569 main::psService - Accept returned nothing, likely a timeout occurred or a child exited
root@lcgnetmon> tail -5 pinger.log
2014/12/15 15:47:04 (1959) ERROR> Remote.pm:479 perfSONAR_PS::Client::LS::Remote::registerStatic - LS cannot be reached, supply alternate or consult gLS.
2014/12/15 15:47:26 (1960) INFO> PingER.pm:300 perfSONAR_PS::Services::MA::PingER::registerLS - Registering PingER MA with LS
2014/12/15 15:47:26 (1960) INFO> PingER.pm:300 perfSONAR_PS::Services::MA::PingER::registerLS - Registering PingER MA with LS
2014/12/15 15:47:27 (1960) ERROR> Remote.pm:352 perfSONAR_PS::Client::LS::Remote::getLS - LS List is emtpty, cannot contact active LS for registration.
2014/12/15 15:47:27 (1960) ERROR> Remote.pm:479 perfSONAR_PS::Client::LS::Remote::registerStatic - LS cannot be reached, supply alternate or consult gLS.
root@lcgnetmon> tail -5 service_watcher_error.log
Can't exec "runlevel": No such file or directory at /opt/perfsonar_ps/toolkit/scripts/../lib/perfSONAR_PS/NPToolkit/Services/Base.pm line 147.
Can't exec "runlevel": No such file or directory at /opt/perfsonar_ps/toolkit/scripts/../lib/perfSONAR_PS/NPToolkit/Services/Base.pm line 147.
Can't exec "runlevel": No such file or directory at /opt/perfsonar_ps/toolkit/scripts/../lib/perfSONAR_PS/NPToolkit/Services/Base.pm line 147.
Can't exec "runlevel": No such file or directory at /opt/perfsonar_ps/toolkit/scripts/../lib/perfSONAR_PS/NPToolkit/Services/Base.pm line 147.
Can't exec "runlevel": No such file or directory at /opt/perfsonar_ps/toolkit/scripts/../lib/perfSONAR_PS/NPToolkit/Services/Base.pm line 147.
root@lcgnetmon> tail -f owamp_bwctl.log
Dec 15 11:30:34 lcgnetmon owampd[18264]: FILE=owampd.c, LINE=724, Control session terminated abnormally...
Dec 15 11:30:37 lcgnetmon owampd[1661]: FILE=policy.c, LINE=1811, ResReq ALLOWED: regular:release:disk = 16 (result = 130628, limit = 1000000000)
Dec 15 11:30:37 lcgnetmon owampd[1661]: FILE=policy.c, LINE=1811, ResReq ALLOWED: regular:release:bandwidth = 3360 (result = 10080, limit = 10000000)
Dec 15 11:30:37 lcgnetmon owampd[18265]: FILE=owampd.c, LINE=724, Control session terminated abnormally...
Dec 15 11:30:40 lcgnetmon owampd[1661]: FILE=policy.c, LINE=1811, ResReq ALLOWED: regular:release:bandwidth = 3360 (result = 6720, limit = 10000000)
Dec 15 11:30:40 lcgnetmon owampd[18055]: FILE=owampd.c, LINE=724, Control session terminated abnormally...
Dec 15 11:30:40 lcgnetmon owampd[1661]: FILE=policy.c, LINE=1811, ResReq ALLOWED: regular:release:bandwidth = 3360 (result = 3360, limit = 10000000)
Dec 15 11:30:40 lcgnetmon owampd[18044]: FILE=owampd.c, LINE=724, Control session terminated abnormally...
Dec 15 11:31:40 lcgnetmon owampd[1661]: FILE=policy.c, LINE=1811, ResReq ALLOWED: regular:release:bandwidth = 3360 (result = 0, limit = 10000000)
Dec 15 11:31:40 lcgnetmon owampd[18057]: FILE=owampd.c, LINE=724, Control session terminated abnormally...
None look healthy!
Also, on another box/browser, trying to get to
http://lcgnetmon02.phy.bris.ac.uk/toolkit/ brings up
Secure Connection Failed
lcgnetmon02.phy.bris.ac.uk uses an invalid security certificate.
The certificate is not trusted because it is self signed.
The certificate is only valid for bfc.phy.bris.ac.uk
The certificate expired on 02/10/14 08:52.
(Error code: sec_error_expired_issuer_certificate)
Found the files all right :
root@lcgnetmon> ll /etc/pki/tls/private/localhost.key /etc/pki/tls/certs/localhost.crt
-rw-------. 1 root root 1204 Oct 2 2013 /etc/pki/tls/certs/localhost.crt
-rw-------. 1 root root 891 Oct 2 2013 /etc/pki/tls/private/localhost.key
root@lcgnetmon> openssl x509 -text -in /etc/pki/tls/certs/localhost.crt|grep -i not
Not Before: Oct 2 03:49:02 2013 GMT
Not After : Oct 2 03:49:02 2014 GMT
Confirmed, expired. Where is the howto to regenerate them? Or are expired certs
not important?
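A minimal sketch of checking and replacing the expired certificate with plain openssl. It assumes the web server reads the two files listed above and only needs a fresh self-signed pair; whether the toolkit ships its own procedure for this is not covered in this message. The function names, the 2048-bit key size and the 365-day lifetime are illustrative choices.

```shell
#!/bin/sh
# Print "valid" if the certificate file can be read and has not expired.
cert_status() {
    if openssl x509 -checkend 0 -noout -in "$1" >/dev/null 2>&1; then
        echo "valid"
    else
        echo "expired or unreadable"
    fi
}

# Create a new self-signed key and certificate for the given host name.
# Usage: make_self_signed_cert HOSTNAME KEYFILE CRTFILE
make_self_signed_cert() {
    openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
        -subj "/CN=$1" -keyout "$2" -out "$3"
}

# Example (back up the old files first; writing under /etc/pki needs root):
#   cert_status /etc/pki/tls/certs/localhost.crt
#   make_self_signed_cert "$(hostname -f)" \
#       /etc/pki/tls/private/localhost.key /etc/pki/tls/certs/localhost.crt
```

Using the host's own name as the CN would also address the "only valid for bfc.phy.bris.ac.uk" complaint, although a browser will still warn that the certificate is self-signed. The web server will probably need a restart to pick up the new files.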
VERY grateful for pointers & help!
Winnie Lacesso / Bristol University Particle Physics Computing Systems
HH Wills Physics Laboratory, Tyndall Avenue, Bristol, BS8 1TL, UK