perfsonar-user - Re: [perfsonar-user] Help to debug non-working perfsonar boxen

Subject: perfSONAR User Q&A and Other Discussion

  • From: Jason Zurawski
  • Subject: Re: [perfsonar-user] Help to debug non-working perfsonar boxen
  • Date: Mon, 15 Dec 2014 11:12:52 -0500

Hi Winnie;

Let me start with an easy question first - do you know if these hosts are
running perfSONAR 3.4.x? I can’t seem to reach either from where I am
sitting, and I see no records of them in the lookup service.

If the answer is “no” or “I don’t know”, it sounds like a re-install may be
the most expedient way to get things into a happy state.

Next question - if the site is firewall’d, the nodes will need holes punched
for their functions. We have a list of ports here (see “Using perfSONAR with
Firewalls” section):

http://www.perfsonar.net/deploy/security-considerations/
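
Once the holes are punched, a quick reachability test from an outside host shows whether the firewall still blocks anything. Here is a minimal Python sketch, not an official perfSONAR tool. The helper names `tcp_port_open` and `check_ports` are made up for illustration, and the port list is only an assumption (861 and 4823 are commonly cited for OWAMP and BWCTL control), so take the real list from the page above. It only checks TCP; UDP test ports cannot be verified this way.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port in `ports` to True (reachable) or False (not reachable)."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

For example, `check_ports("lcgnetmon.phy.bris.ac.uk", [443, 861, 4823])` returns a dict keyed by port; a `False` entry means the connection was refused, filtered, or the name did not resolve.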

Hope this helps to get things started, thanks;

-jason

On Dec 15, 2014, at 10:56 AM, Winnie Lacesso wrote:

> Good afternoon,
>
> I inherited 2 perfsonar boxen. They were working but I have been
> notified (ggus ticket 110365) that they have broken & am seeking
> troubleshooting advice / pointers.
>
> http://www.perfsonar.net/deploy/troubleshooting/
> "Section under construction." :(
> No help there.
>
> TBH broken = partly my bad, someone told me to run
> /opt/perfsonar_ps/mesh_config/bin/generate_configuration
> but they didn't say to run it as perfsonar not root!
> So am wondering if running that as root has badly damaged something that
> running it properly later on can't fix.
> In old email I stumbled across the right way:
>
> sudo -u perfsonar /opt/perfsonar_ps/mesh_config/bin/generate_configuration --verbose
>
> The tail end of that output is:
>
> 2014/12/15 08:41:48 (3009) DEBUG> HTTPS.pm:65
> perfSONAR_PS::Utils::HTTPS::https_get - Connecting to: myosg.grid.iu.edu:
> 443
> 2014/12/15 08:42:51 (3009) DEBUG> HTTPS.pm:118
> perfSONAR_PS::Utils::HTTPS::https_get - Problem retrieving
> https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk:
> IO::Socket::INET6 configuration
> failederror:00000000:lib(0):func(0):reason(0)
> 2014/12/15 08:42:51 (3009) DEBUG> Utils.pm:229
> perfSONAR_PS::MeshConfig::Utils::__load_json - Problem retrieving mesh
> configuration from
> https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk:
> Problem retrieving
> https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk:
> IO::Socket::INET6 configuration
> failederror:00000000:lib(0):func(0):reason(0)
> 2014/12/15 08:42:51 (3009) ERROR> Agent.pm:292
> perfSONAR_PS::MeshConfig::Agent::__configure_host - Problem with mesh
> configuration: Problem retrieving
> https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk:
> IO::Socket::INET6 configuration
> failederror:00000000:lib(0):func(0):reason(0)
> 2014/12/15 08:42:51 (3009) ERROR> Agent.pm:405
> perfSONAR_PS::MeshConfig::Agent::__configure_host - Problem with required
> meshes, not changing configuration
> 2014/12/15 08:42:51 (3009) DEBUG> Agent.pm:137
> perfSONAR_PS::MeshConfig::Agent::__send_error_messages - No email address
> to send error message to: Problem with mesh configuration: Problem
> retrieving
> https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk:
> IO::Socket::INET6 configuration
> failederror:00000000:lib(0):func(0):reason(0)
> 2014/12/15 08:42:51 (3009) DEBUG> Agent.pm:137
> perfSONAR_PS::MeshConfig::Agent::__send_error_messages - No email address
> to send error message to: Problem with required meshes, not changing
> configuration
>
>
> The mesh config :
>
> root@lcgnetmon>
> grep -v \# /opt/perfsonar_ps/mesh_config/etc/agent_configuration.conf | uniq
>
> <mesh>
> configuration_url https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon.phy.bris.ac.uk
> validate_certificate 0
> required 1
> </mesh>
>
> restart_services 1
>
> use_toolkit 1
>
> send_error_emails 1
>
> address lcgnetmon.phy.bris.ac.uk
> admin_email
>
> skip_redundant_tests 1
>
>
> Is there something wrong with that?
>
> I know little about perfsonar & will read the docs but they seem to be
> i) install ii) config iii) it just works.
> Whereas these are not working boxen.
>
> Lots of errors in logfiles:
>
> root@lcgnetmon>
> cd /var/log/perfsonar; /bin/ls -lFt | head
> total 6772604
> -rw-r--r-- 1 perfsonar perfsonar 721791 Dec 15 15:47
> perfsonarbuoy_ma.log
> -rw-r--r-- 1 perfsonar perfsonar 1509606 Dec 15 15:47 regular_testing.log
> -rw-r--r-- 1 perfsonar perfsonar 16145917 Dec 15 15:47
> traceroute_ondemand_mp.log
> -rw-r--r-- 1 perfsonar perfsonar 1046006 Dec 15 15:47 pinger.log
> drwxr-xr-x. 2 apache perfsonar 4096 Dec 15 15:47 web_admin/
> -rw-r--r-- 1 root root 1573 Dec 15 15:00
> service_watcher_error.log
> -rw-r--r-- 1 perfsonar perfsonar 34223 Dec 15 11:31 owamp_bwctl.log
> -rw-r--r-- 1 perfsonar perfsonar 212 Dec 15 08:50 psb_to_esmond.log
> -rw-r--r-- 1 perfsonar perfsonar 14920093 Dec 15 08:49
> traceroute_scheduler.log
>
> root@lcgnetmon>
> tail -5 perfsonarbuoy_ma.log
> 2014/12/15 15:46:35 (1931) WARN> daemon.pl:425 main::__ANON__ - Warned:
> Exiting eval via next at
> /opt/perfsonar_ps/perfsonarbuoy_ma/bin/../lib/perfSONAR_PS/Utils/MARegistrationManager.pm
> line 103.
> 2014/12/15 15:47:41 (1931) WARN> MARegistrationManager.pm:102
> perfSONAR_PS::Utils::MARegistrationManager::register - Error trying to
> lookup administrator lcg-site-admin in LS: 500 Can't connect to
> sls.geant.net:8090 (connect: timeout)
> 2014/12/15 15:47:41 (1931) WARN> daemon.pl:425 main::__ANON__ - Warned:
> Exiting subroutine via next at
> /opt/perfsonar_ps/perfsonarbuoy_ma/bin/../lib/perfSONAR_PS/Utils/MARegistrationManager.pm
> line 103.
> 2014/12/15 15:47:41 (1931) WARN> daemon.pl:425 main::__ANON__ - Warned:
> Exiting subroutine via next at
> /opt/perfsonar_ps/perfsonarbuoy_ma/bin/../lib/perfSONAR_PS/Utils/MARegistrationManager.pm
> line 103.
> 2014/12/15 15:47:41 (1931) WARN> daemon.pl:425 main::__ANON__ - Warned:
> Exiting eval via next at
> /opt/perfsonar_ps/perfsonarbuoy_ma/bin/../lib/perfSONAR_PS/Utils/MARegistrationManager.pm
> line 103.
>
>
> root@lcgnetmon>
> tail -5 regular_testing.log
> 2014/12/15 15:47:08 (15218) INFO> EsmondBase.pm:56
> perfSONAR_PS::RegularTesting::MeasurementArchives::EsmondBase::__ANON__ -
> Metadata URI: /esmond/perfsonar/archive/7237ac2ac5dc41c7b9711248d1806310/
> 2014/12/15 15:47:16 (1969) WARN> daemon:103 main::__ANON__ - Warned:
> IPC::DirQueue: killed stale lockfile:
> /var/lib/perfsonar/regular_testing/esmond_latency_localhost/active/active/50.20141128152342283489.EMjU1Nw
> at /usr/share/perl5/IPC/DirQueue.pm line 519.
> 2014/12/15 15:47:29 (1969) WARN> daemon:103 main::__ANON__ - Warned:
> IPC::DirQueue: killed stale lockfile:
> /var/lib/perfsonar/regular_testing/esmond_latency_localhost/active/active/50.20141128152342416486.EMjU1Ng
> at /usr/share/perl5/IPC/DirQueue.pm line 519.
> 2014/12/15 15:47:40 (1969) WARN> daemon:103 main::__ANON__ - Warned:
> IPC::DirQueue: killed stale lockfile:
> /var/lib/perfsonar/regular_testing/esmond_latency_localhost/active/active/50.20141128152342450834.EMjU1N
> at /usr/share/perl5/IPC/DirQueue.pm line 519.
> 2014/12/15 15:47:52 (1969) WARN> daemon:103 main::__ANON__ - Warned:
> IPC::DirQueue: killed stale lockfile:
> /var/lib/perfsonar/regular_testing/esmond_latency_localhost/active/active/50.20141128152343286581.EMjU1Nw
> at /usr/share/perl5/IPC/DirQueue.pm line 519.
>
> root@lcgnetmon>
> tail -5 traceroute_ondemand_mp.log
> 2014/12/15 15:46:18 (1685) DEBUG> daemon.pl:569 main::psService - Accept
> returned nothing, likely a timeout occurred or a child exited
> 2014/12/15 15:46:38 (1685) DEBUG> daemon.pl:569 main::psService - Accept
> returned nothing, likely a timeout occurred or a child exited
> 2014/12/15 15:46:58 (1685) DEBUG> daemon.pl:569 main::psService - Accept
> returned nothing, likely a timeout occurred or a child exited
> 2014/12/15 15:47:18 (1685) DEBUG> daemon.pl:569 main::psService - Accept
> returned nothing, likely a timeout occurred or a child exited
>
> root@lcgnetmon>
> tail -5 pinger.log
> 2014/12/15 15:47:04 (1959) ERROR> Remote.pm:479
> perfSONAR_PS::Client::LS::Remote::registerStatic - LS cannot be reached,
> supply alternate or consult gLS.
> 2014/12/15 15:47:26 (1960) INFO> PingER.pm:300
> perfSONAR_PS::Services::MA::PingER::registerLS - Registering PingER MA with
> LS
> 2014/12/15 15:47:26 (1960) INFO> PingER.pm:300
> perfSONAR_PS::Services::MA::PingER::registerLS - Registering PingER MA with
> LS
> 2014/12/15 15:47:27 (1960) ERROR> Remote.pm:352
> perfSONAR_PS::Client::LS::Remote::getLS - LS List is emtpty, cannot contact
> active LS for registration.
> 2014/12/15 15:47:27 (1960) ERROR> Remote.pm:479
> perfSONAR_PS::Client::LS::Remote::registerStatic - LS cannot be reached,
> supply alternate or consult gLS.
>
> root@lcgnetmon>
> tail -5 service_watcher_error.log
> Can't exec "runlevel": No such file or directory at
> /opt/perfsonar_ps/toolkit/scripts/../lib/perfSONAR_PS/NPToolkit/Services/Base.pm
> line 147.
> Can't exec "runlevel": No such file or directory at
> /opt/perfsonar_ps/toolkit/scripts/../lib/perfSONAR_PS/NPToolkit/Services/Base.pm
> line 147.
> Can't exec "runlevel": No such file or directory at
> /opt/perfsonar_ps/toolkit/scripts/../lib/perfSONAR_PS/NPToolkit/Services/Base.pm
> line 147.
> Can't exec "runlevel": No such file or directory at
> /opt/perfsonar_ps/toolkit/scripts/../lib/perfSONAR_PS/NPToolkit/Services/Base.pm
> line 147.
> Can't exec "runlevel": No such file or directory at
> /opt/perfsonar_ps/toolkit/scripts/../lib/perfSONAR_PS/NPToolkit/Services/Base.pm
> line 147.
>
> root@lcgnetmon>
> tail -f owamp_bwctl.log
> Dec 15 11:30:34 lcgnetmon owampd[18264]: FILE=owampd.c, LINE=724, Control
> session terminated abnormally...
> Dec 15 11:30:37 lcgnetmon owampd[1661]: FILE=policy.c, LINE=1811, ResReq
> ALLOWED: regular:release:disk = 16 (result = 130628, limit = 1000000000)
> Dec 15 11:30:37 lcgnetmon owampd[1661]: FILE=policy.c, LINE=1811, ResReq
> ALLOWED: regular:release:bandwidth = 3360 (result = 10080, limit = 10000000)
> Dec 15 11:30:37 lcgnetmon owampd[18265]: FILE=owampd.c, LINE=724, Control
> session terminated abnormally...
> Dec 15 11:30:40 lcgnetmon owampd[1661]: FILE=policy.c, LINE=1811, ResReq
> ALLOWED: regular:release:bandwidth = 3360 (result = 6720, limit = 10000000)
> Dec 15 11:30:40 lcgnetmon owampd[18055]: FILE=owampd.c, LINE=724, Control
> session terminated abnormally...
> Dec 15 11:30:40 lcgnetmon owampd[1661]: FILE=policy.c, LINE=1811, ResReq
> ALLOWED: regular:release:bandwidth = 3360 (result = 3360, limit = 10000000)
> Dec 15 11:30:40 lcgnetmon owampd[18044]: FILE=owampd.c, LINE=724, Control
> session terminated abnormally...
> Dec 15 11:31:40 lcgnetmon owampd[1661]: FILE=policy.c, LINE=1811, ResReq
> ALLOWED: regular:release:bandwidth = 3360 (result = 0, limit = 10000000)
> Dec 15 11:31:40 lcgnetmon owampd[18057]: FILE=owampd.c, LINE=724, Control
> session terminated abnormally...
>
> None look healthy!
>
> Also, on another box/browser, trying to get to
> http://lcgnetmon02.phy.bris.ac.uk/toolkit/ brings up
>
> Secure Connection Failed
> lcgnetmon02.phy.bris.ac.uk uses an invalid security certificate.
> The certificate is not trusted because it is self signed.
> The certificate is only valid for bfc.phy.bris.ac.uk
> The certificate expired on 02/10/14 08:52.
> (Error code: sec_error_expired_issuer_certificate)
>
> Found the files all right :
> root@lcgnetmon>
> ll /etc/pki/tls/private/localhost.key /etc/pki/tls/certs/localhost.crt
> -rw-------. 1 root root 1204 Oct 2 2013 /etc/pki/tls/certs/localhost.crt
> -rw-------. 1 root root 891 Oct 2 2013 /etc/pki/tls/private/localhost.key
>
> root@lcgnetmon>
> openssl x509 -text -in /etc/pki/tls/certs/localhost.crt|grep -i not
> Not Before: Oct 2 03:49:02 2013 GMT
> Not After : Oct 2 03:49:02 2014 GMT
>
> Confirmed, expired. Where is the how-to for regenerating them? Or are expired
> certs not important?
>
> VERY grateful for pointers & help!
>
> Winnie Lacesso / Bristol University Particle Physics Computing Systems
> HH Wills Physics Laboratory, Tyndall Avenue, Bristol, BS8 1TL, UK
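
The expiry check in the quoted message (reading the `Not After` line printed by `openssl x509 -text`) can also be scripted. Here is a minimal Python sketch, assuming the date format shown in that output. The helper names `parse_not_after` and `is_expired` are hypothetical and not part of any perfSONAR or OpenSSL tooling.

```python
from datetime import datetime, timezone


def parse_not_after(line):
    """Parse a line such as 'Not After : Oct 2 03:49:02 2014 GMT' into a UTC datetime."""
    # Everything after the first colon is the timestamp.
    value = line.split(":", 1)[1].strip()
    # openssl prints these dates in GMT; drop the suffix and attach UTC explicitly.
    if value.endswith(" GMT"):
        value = value[:-4]
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y")
    return parsed.replace(tzinfo=timezone.utc)


def is_expired(not_after, now=None):
    """Return True if `not_after` is at or before `now` (default: current UTC time)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= not_after
```

Passing the `Not After` line from the output above to `parse_not_after` and the result to `is_expired` reports the certificate as expired for any date after 2 October 2014, which matches the browser error.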


