perfsonar-user - [perfsonar-user] Help to debug non-working perfsonar boxen

Subject: perfSONAR User Q&A and Other Discussion

  • From: Winnie Lacesso <>
  • To:
  • Subject: [perfsonar-user] Help to debug non-working perfsonar boxen
  • Date: Mon, 15 Dec 2014 15:56:58 +0000 (GMT)

Good afternoon,

I inherited 2 perfsonar boxen. They were working but I have been
notified (ggus ticket 110365) that they have broken & am seeking
troubleshooting advice / pointers.

http://www.perfsonar.net/deploy/troubleshooting/
"Section under construction." :(
No help there.

TBH broken = partly my bad: someone told me to run
/opt/perfsonar_ps/mesh_config/bin/generate_configuration
but they didn't say to run it as perfsonar, not root!
So I am wondering whether running it as root has badly damaged something
that running it properly later on can't fix.
In old email I stumbled across the right way:

sudo -u perfsonar /opt/perfsonar_ps/mesh_config/bin/generate_configuration --verbose
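
One way to check whether the earlier run as root left anything behind is to list files that are not owned by the service account. The following is only a sketch: the helper name list_foreign_owned is made up for illustration, and the directories in the usage comment are simply the paths that appear in this message, not a confirmed list of everything generate_configuration writes.

```shell
#!/bin/sh
# list_foreign_owned OWNER DIR...
# Print every file or directory under DIR... that is NOT owned by OWNER.
# OWNER may be a user name or a numeric uid (find accepts either).
list_foreign_owned() {
    owner="$1"
    shift
    find "$@" ! -user "$owner" -print
}

# Example, using paths quoted in this message (adjust to the local install):
#   list_foreign_owned perfsonar /opt/perfsonar_ps/mesh_config \
#       /var/lib/perfsonar /var/log/perfsonar
```

Not everything it prints is necessarily wrong, since some files may legitimately belong to root or apache, so the output is a list of candidates to inspect rather than a list of things to chown.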

The tail end of its output is:

2014/12/15 08:41:48 (3009) DEBUG> HTTPS.pm:65
perfSONAR_PS::Utils::HTTPS::https_get - Connecting to: myosg.grid.iu.edu: 443
2014/12/15 08:42:51 (3009) DEBUG> HTTPS.pm:118
perfSONAR_PS::Utils::HTTPS::https_get - Problem retrieving
https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk:
IO::Socket::INET6 configuration failederror:00000000:lib(0):func(0):reason(0)
2014/12/15 08:42:51 (3009) DEBUG> Utils.pm:229
perfSONAR_PS::MeshConfig::Utils::__load_json - Problem retrieving mesh
configuration from
https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk:
Problem retrieving
https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk:
IO::Socket::INET6 configuration failederror:00000000:lib(0):func(0):reason(0)
2014/12/15 08:42:51 (3009) ERROR> Agent.pm:292
perfSONAR_PS::MeshConfig::Agent::__configure_host - Problem with mesh
configuration: Problem retrieving
https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk:
IO::Socket::INET6 configuration failederror:00000000:lib(0):func(0):reason(0)
2014/12/15 08:42:51 (3009) ERROR> Agent.pm:405
perfSONAR_PS::MeshConfig::Agent::__configure_host - Problem with required
meshes, not changing configuration
2014/12/15 08:42:51 (3009) DEBUG> Agent.pm:137
perfSONAR_PS::MeshConfig::Agent::__send_error_messages - No email address to
send error message to: Problem with mesh configuration: Problem retrieving
https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk:
IO::Socket::INET6 configuration failederror:00000000:lib(0):func(0):reason(0)
2014/12/15 08:42:51 (3009) DEBUG> Agent.pm:137
perfSONAR_PS::MeshConfig::Agent::__send_error_messages - No email address to
send error message to: Problem with required meshes, not changing
configuration
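
The log above shows about a minute between "Connecting to" and the IO::Socket::INET6 failure, which may point at a connectivity problem (possibly IPv6-related) rather than at the mesh configuration itself. That is a guess, not a diagnosis. A minimal sketch for testing the same URL over each address family with curl follows; probe_url is a hypothetical helper name, and --insecure is used only because the agent config has validate_certificate 0.

```shell
#!/bin/sh
# probe_url URL
# Try to fetch URL once over IPv4 and once over IPv6 and report each result.
# "ok" means curl completed the transfer (any HTTP status); "failed" means
# it could not resolve, connect, or finish within 15 seconds.
probe_url() {
    url="$1"
    for family in 4 6; do
        if curl "-$family" --silent --insecure --max-time 15 \
                --output /dev/null "$url"; then
            echo "IPv$family: ok"
        else
            echo "IPv$family: failed"
        fi
    done
}

# Example, run as the same user the mesh agent uses:
#   probe_url https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk
```

If IPv4 works and IPv6 does not (or the reverse), that narrows the problem to routing or firewalling for one family.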


The mesh config :

root@lcgnetmon>
grep -v \# /opt/perfsonar_ps/mesh_config/etc/agent_configuration.conf | uniq

<mesh>
configuration_url
https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk
validate_certificate 0
required 1
</mesh>

restart_services 1

use_toolkit 1

send_error_emails 1

address lcgnetmon.phy.bris.ac.uk
admin_email

skip_redundant_tests 1


Is there something wrong with that?

I know little about perfsonar & will read the docs but they seem to be
i) install ii) config iii) it just works.
Whereas these are not working boxen.

Lots of errors in logfiles:

root@lcgnetmon>
cd /var/log/perfsonar; /bin/ls -lFt | head
total 6772604
-rw-r--r-- 1 perfsonar perfsonar 721791 Dec 15 15:47 perfsonarbuoy_ma.log
-rw-r--r-- 1 perfsonar perfsonar 1509606 Dec 15 15:47 regular_testing.log
-rw-r--r-- 1 perfsonar perfsonar 16145917 Dec 15 15:47 traceroute_ondemand_mp.log
-rw-r--r-- 1 perfsonar perfsonar 1046006 Dec 15 15:47 pinger.log
drwxr-xr-x. 2 apache perfsonar 4096 Dec 15 15:47 web_admin/
-rw-r--r-- 1 root root 1573 Dec 15 15:00 service_watcher_error.log
-rw-r--r-- 1 perfsonar perfsonar 34223 Dec 15 11:31 owamp_bwctl.log
-rw-r--r-- 1 perfsonar perfsonar 212 Dec 15 08:50 psb_to_esmond.log
-rw-r--r-- 1 perfsonar perfsonar 14920093 Dec 15 08:49 traceroute_scheduler.log

root@lcgnetmon>
tail -5 perfsonarbuoy_ma.log
2014/12/15 15:46:35 (1931) WARN> daemon.pl:425 main::__ANON__ - Warned:
Exiting eval via next at
/opt/perfsonar_ps/perfsonarbuoy_ma/bin/../lib/perfSONAR_PS/Utils/MARegistrationManager.pm
line 103.
2014/12/15 15:47:41 (1931) WARN> MARegistrationManager.pm:102
perfSONAR_PS::Utils::MARegistrationManager::register - Error trying to lookup
administrator lcg-site-admin in LS: 500 Can't connect to sls.geant.net:8090
(connect: timeout)
2014/12/15 15:47:41 (1931) WARN> daemon.pl:425 main::__ANON__ - Warned:
Exiting subroutine via next at
/opt/perfsonar_ps/perfsonarbuoy_ma/bin/../lib/perfSONAR_PS/Utils/MARegistrationManager.pm
line 103.
2014/12/15 15:47:41 (1931) WARN> daemon.pl:425 main::__ANON__ - Warned:
Exiting subroutine via next at
/opt/perfsonar_ps/perfsonarbuoy_ma/bin/../lib/perfSONAR_PS/Utils/MARegistrationManager.pm
line 103.
2014/12/15 15:47:41 (1931) WARN> daemon.pl:425 main::__ANON__ - Warned:
Exiting eval via next at
/opt/perfsonar_ps/perfsonarbuoy_ma/bin/../lib/perfSONAR_PS/Utils/MARegistrationManager.pm
line 103.
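
The "Can't connect to sls.geant.net:8090 (connect: timeout)" warning can be checked independently of perfSONAR with a plain TCP connection attempt. The sketch below assumes bash and the coreutils timeout command are available on the box; port_open is a hypothetical helper name.

```shell
#!/bin/bash
# port_open HOST PORT [SECONDS]
# Exit 0 if a TCP connection to HOST:PORT can be opened within SECONDS
# (default 5), non-zero otherwise. Uses bash's built-in /dev/tcp redirection.
port_open() {
    timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example, with the host and port from the warning above:
#   port_open sls.geant.net 8090 && echo reachable || echo unreachable
```

A timeout here would suggest a firewall or a retired lookup service rather than a local misconfiguration, though it cannot tell the two apart.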


root@lcgnetmon>
tail -5 regular_testing.log
2014/12/15 15:47:08 (15218) INFO> EsmondBase.pm:56
perfSONAR_PS::RegularTesting::MeasurementArchives::EsmondBase::__ANON__ -
Metadata URI: /esmond/perfsonar/archive/7237ac2ac5dc41c7b9711248d1806310/
2014/12/15 15:47:16 (1969) WARN> daemon:103 main::__ANON__ - Warned:
IPC::DirQueue: killed stale lockfile:
/var/lib/perfsonar/regular_testing/esmond_latency_localhost/active/active/50.20141128152342283489.EMjU1Nw
at /usr/share/perl5/IPC/DirQueue.pm line 519.
2014/12/15 15:47:29 (1969) WARN> daemon:103 main::__ANON__ - Warned:
IPC::DirQueue: killed stale lockfile:
/var/lib/perfsonar/regular_testing/esmond_latency_localhost/active/active/50.20141128152342416486.EMjU1Ng
at /usr/share/perl5/IPC/DirQueue.pm line 519.
2014/12/15 15:47:40 (1969) WARN> daemon:103 main::__ANON__ - Warned:
IPC::DirQueue: killed stale lockfile:
/var/lib/perfsonar/regular_testing/esmond_latency_localhost/active/active/50.20141128152342450834.EMjU1N
at /usr/share/perl5/IPC/DirQueue.pm line 519.
2014/12/15 15:47:52 (1969) WARN> daemon:103 main::__ANON__ - Warned:
IPC::DirQueue: killed stale lockfile:
/var/lib/perfsonar/regular_testing/esmond_latency_localhost/active/active/50.20141128152343286581.EMjU1Nw
at /usr/share/perl5/IPC/DirQueue.pm line 519.

root@lcgnetmon>
tail -5 traceroute_ondemand_mp.log
2014/12/15 15:46:18 (1685) DEBUG> daemon.pl:569 main::psService - Accept
returned nothing, likely a timeout occurred or a child exited
2014/12/15 15:46:38 (1685) DEBUG> daemon.pl:569 main::psService - Accept
returned nothing, likely a timeout occurred or a child exited
2014/12/15 15:46:58 (1685) DEBUG> daemon.pl:569 main::psService - Accept
returned nothing, likely a timeout occurred or a child exited
2014/12/15 15:47:18 (1685) DEBUG> daemon.pl:569 main::psService - Accept
returned nothing, likely a timeout occurred or a child exited

root@lcgnetmon>
tail -5 pinger.log
2014/12/15 15:47:04 (1959) ERROR> Remote.pm:479
perfSONAR_PS::Client::LS::Remote::registerStatic - LS cannot be reached,
supply alternate or consult gLS.
2014/12/15 15:47:26 (1960) INFO> PingER.pm:300
perfSONAR_PS::Services::MA::PingER::registerLS - Registering PingER MA with LS
2014/12/15 15:47:26 (1960) INFO> PingER.pm:300
perfSONAR_PS::Services::MA::PingER::registerLS - Registering PingER MA with LS
2014/12/15 15:47:27 (1960) ERROR> Remote.pm:352
perfSONAR_PS::Client::LS::Remote::getLS - LS List is emtpty, cannot contact
active LS for registration.
2014/12/15 15:47:27 (1960) ERROR> Remote.pm:479
perfSONAR_PS::Client::LS::Remote::registerStatic - LS cannot be reached,
supply alternate or consult gLS.

root@lcgnetmon>
tail -5 service_watcher_error.log
Can't exec "runlevel": No such file or directory at
/opt/perfsonar_ps/toolkit/scripts/../lib/perfSONAR_PS/NPToolkit/Services/Base.pm
line 147.
Can't exec "runlevel": No such file or directory at
/opt/perfsonar_ps/toolkit/scripts/../lib/perfSONAR_PS/NPToolkit/Services/Base.pm
line 147.
Can't exec "runlevel": No such file or directory at
/opt/perfsonar_ps/toolkit/scripts/../lib/perfSONAR_PS/NPToolkit/Services/Base.pm
line 147.
Can't exec "runlevel": No such file or directory at
/opt/perfsonar_ps/toolkit/scripts/../lib/perfSONAR_PS/NPToolkit/Services/Base.pm
line 147.
Can't exec "runlevel": No such file or directory at
/opt/perfsonar_ps/toolkit/scripts/../lib/perfSONAR_PS/NPToolkit/Services/Base.pm
line 147.
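
One possible reason for "Can't exec "runlevel"" is that the service watcher runs with a PATH that does not include the directory holding runlevel (commonly /sbin). This is an assumption; nothing in the log confirms which PATH the script sees. A small sketch for testing whether a command resolves under a given PATH, with resolves_in_path as a made-up helper name:

```shell
#!/bin/sh
# resolves_in_path COMMAND PATHLIST
# Exit 0 if COMMAND can be found using only the directories in PATHLIST.
# The lookup runs in a subshell so the caller's PATH is left untouched.
resolves_in_path() {
    ( PATH="$2"; command -v "$1" >/dev/null 2>&1 )
}

# Example: compare a cron-like PATH with a fuller one
#   resolves_in_path runlevel /usr/bin:/bin            || echo "not in cron-like PATH"
#   resolves_in_path runlevel /sbin:/usr/sbin:/usr/bin:/bin && echo "found with sbin dirs"
```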

root@lcgnetmon>
tail -f owamp_bwctl.log
Dec 15 11:30:34 lcgnetmon owampd[18264]: FILE=owampd.c, LINE=724, Control
session terminated abnormally...
Dec 15 11:30:37 lcgnetmon owampd[1661]: FILE=policy.c, LINE=1811, ResReq
ALLOWED: regular:release:disk = 16 (result = 130628, limit = 1000000000)
Dec 15 11:30:37 lcgnetmon owampd[1661]: FILE=policy.c, LINE=1811, ResReq
ALLOWED: regular:release:bandwidth = 3360 (result = 10080, limit = 10000000)
Dec 15 11:30:37 lcgnetmon owampd[18265]: FILE=owampd.c, LINE=724, Control
session terminated abnormally...
Dec 15 11:30:40 lcgnetmon owampd[1661]: FILE=policy.c, LINE=1811, ResReq
ALLOWED: regular:release:bandwidth = 3360 (result = 6720, limit = 10000000)
Dec 15 11:30:40 lcgnetmon owampd[18055]: FILE=owampd.c, LINE=724, Control
session terminated abnormally...
Dec 15 11:30:40 lcgnetmon owampd[1661]: FILE=policy.c, LINE=1811, ResReq
ALLOWED: regular:release:bandwidth = 3360 (result = 3360, limit = 10000000)
Dec 15 11:30:40 lcgnetmon owampd[18044]: FILE=owampd.c, LINE=724, Control
session terminated abnormally...
Dec 15 11:31:40 lcgnetmon owampd[1661]: FILE=policy.c, LINE=1811, ResReq
ALLOWED: regular:release:bandwidth = 3360 (result = 0, limit = 10000000)
Dec 15 11:31:40 lcgnetmon owampd[18057]: FILE=owampd.c, LINE=724, Control
session terminated abnormally...

None look healthy!
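
To see which of these logs is actually producing errors rather than DEBUG or INFO noise, the entries can be tallied by severity. This sketch assumes the line layout seen in the excerpts above (date, time, pid in parentheses, then the level followed by ">"); owamp_bwctl.log uses syslog format and is therefore not counted. count_levels is a hypothetical helper name.

```shell
#!/bin/sh
# count_levels FILE...
# Count log entries per severity level. Only lines that begin with a
# YYYY/MM/DD date and carry "LEVEL>" as the fourth field are counted,
# so continuation lines of multi-line messages are ignored.
count_levels() {
    awk '$1 ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ && $4 ~ /^[A-Z]+>$/ {
             sub(/>$/, "", $4)
             n[$4]++
         }
         END { for (level in n) print level, n[level] }' "$@" | sort
}

# Example:
#   cd /var/log/perfsonar && count_levels pinger.log perfsonarbuoy_ma.log
```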

Also, on another box/browser, trying to get to
http://lcgnetmon02.phy.bris.ac.uk/toolkit/ brings up

Secure Connection Failed
lcgnetmon02.phy.bris.ac.uk uses an invalid security certificate.
The certificate is not trusted because it is self signed.
The certificate is only valid for bfc.phy.bris.ac.uk
The certificate expired on 02/10/14 08:52.
(Error code: sec_error_expired_issuer_certificate)

Found the files all right :
root@lcgnetmon>
ll /etc/pki/tls/private/localhost.key /etc/pki/tls/certs/localhost.crt
-rw-------. 1 root root 1204 Oct 2 2013 /etc/pki/tls/certs/localhost.crt
-rw-------. 1 root root 891 Oct 2 2013 /etc/pki/tls/private/localhost.key

root@lcgnetmon>
openssl x509 -text -in /etc/pki/tls/certs/localhost.crt|grep -i not
Not Before: Oct 2 03:49:02 2013 GMT
Not After : Oct 2 03:49:02 2014 GMT

Confirmed, expired. Is there a howto for regenerating them? Or are expired
certs not important?
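
For reference, a generic way to test for expiry and to create a fresh self-signed pair with openssl is sketched below. It is not the toolkit's own procedure, which this message does not describe; the helper names are made up, the 365-day lifetime and 2048-bit RSA key are arbitrary choices, and the web server would presumably need a restart before it serves the new certificate.

```shell
#!/bin/sh
# cert_is_current CERTFILE
# Exit 0 if the certificate in CERTFILE has not yet expired.
cert_is_current() {
    openssl x509 -checkend 0 -noout -in "$1" >/dev/null 2>&1
}

# make_self_signed HOSTNAME KEYFILE CERTFILE
# Write a new unencrypted RSA key and a self-signed certificate whose
# CN is HOSTNAME, valid for 365 days from now.
make_self_signed() {
    openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
        -subj "/CN=$1" -keyout "$2" -out "$3" 2>/dev/null
}

# Example, with the paths from this message (back up the old files first):
#   cert_is_current /etc/pki/tls/certs/localhost.crt || echo "expired"
#   make_self_signed lcgnetmon.phy.bris.ac.uk \
#       /etc/pki/tls/private/localhost.key /etc/pki/tls/certs/localhost.crt
```

Setting the CN to the box's own hostname would also address the "only valid for bfc.phy.bris.ac.uk" complaint, although browsers will still warn that the certificate is self-signed.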

VERY grateful for pointers & help!

Winnie Lacesso / Bristol University Particle Physics Computing Systems
HH Wills Physics Laboratory, Tyndall Avenue, Bristol, BS8 1TL, UK


