
perfsonar-user - RE: [perfsonar-user] regular_testing service errors

Subject: perfSONAR User Q&A and Other Discussion

  • From: "Garnizov, Ivan (RRZE)" <>
  • To: "Garnizov, Ivan (RRZE)" <>
  • Cc: "" <>
  • Subject: RE: [perfsonar-user] regular_testing service errors
  • Date: Tue, 14 Jul 2015 14:13:15 +0000
  • Accept-language: en-GB, de-DE, en-US

Hi Shawn, Andy, Dan,

 

The 2 GB RAM limitation is gone on the test instance running 3.4.2, yet the issue with regular_testing remains.

 

[root@test-rhps02 ~]# ps auxw | grep owampd

owamp     2408  0.0  0.0   7272   688 ?        Ss   09:24   0:05 /usr/bin/owampd -c /etc/owampd -R /var/run

owamp    61045  0.0  0.0   7484   776 ?        S    13:55   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run

owamp    61059  0.0  0.0   7484   768 ?        S    13:55   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run

owamp    61164  0.0  0.0   7484   764 ?        S    13:56   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run

owamp    61165  0.0  0.0   7484   484 ?        S    13:56   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run

owamp    61187  0.0  0.0   7484   760 ?        S    13:56   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run

owamp    61188  0.0  0.0   7484   408 ?        S    13:56   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run

owamp    61310  0.0  0.0   7484   764 ?        S    13:57   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run

owamp    61314  0.0  0.0   7484   468 ?        S    13:57   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run

owamp    61328  0.0  0.0   7484   760 ?        S    13:57   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run

owamp    61329  0.0  0.0   7484   400 ?        S    13:57   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run

owamp    61335  0.0  0.0   7500   796 ?        S    13:57   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run

owamp    61386  0.0  0.0   7620   884 ?        S    13:58   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run

owamp    61390  0.0  0.0   7620   580 ?        S    13:58   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run

owamp    61421  0.0  0.0   7620   884 ?        S    13:58   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run

owamp    61423  0.0  0.0   7620   580 ?        S    13:58   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run

root     61425  0.0  0.0 103256   944 pts/0    S+   13:58   0:00 grep owampd

 

I also noticed that the remark Andy made to Casey about scheduled OWAMP tests applies to my case as well. I can only ask Dan to increase the sample_count value for the perfSONAR Software Testing mesh at Indiana University.

 

Best regards,

Ivan

 

 

 

 

From: [mailto:] On Behalf Of Andrew Lake
Sent: Monday, 13 July 2015 19:26
To: Casey Russell
Cc:
Subject: Re: [perfsonar-user] Lost all Owamp testing on Thursday at 1:00am

 

Hi,

 

It looks like you have OWAMP configured to send 100 packets per second and to register results every 300 packets (3 seconds). I believe OWAMP won’t actually allow such a short reporting interval and will bump it up to something like 15 seconds. Unfortunately, regular_testing doesn’t know this happened, so when it gets no results within 3x the specified reporting interval (9 seconds), it assumes the process timed out and restarts it.

 

I would recommend increasing the packet count from 300 to something like 6000 (every 60 seconds). That’s generally the time interval we use for reporting owamp summaries. Let me know if you have any questions. 
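The arithmetic behind this recommendation can be sketched in Python. This is only an illustration under stated assumptions: the 15-second floor is the approximate figure mentioned above, not a confirmed OWAMP constant, and the function names are hypothetical rather than part of regular_testing.

```python
# Assumption: OWAMP raises short reporting intervals to roughly 15 seconds
# ("something like 15 seconds" above); this is not a confirmed constant.
ASSUMED_MIN_REPORT_SECONDS = 15.0
# regular_testing gives up after 3x the configured reporting interval.
TIMEOUT_MULTIPLIER = 3


def configured_interval(sample_count: int, packet_interval: float) -> float:
    """Seconds per result as configured: -c packets sent every -i seconds."""
    return sample_count * packet_interval


def would_time_out(sample_count: int, packet_interval: float) -> bool:
    """True if the 3x deadline expires before the (assumed) bumped interval."""
    configured = configured_interval(sample_count, packet_interval)
    effective = max(configured, ASSUMED_MIN_REPORT_SECONDS)
    deadline = TIMEOUT_MULTIPLIER * configured
    return deadline < effective


# -c 300 -i 0.01: 3 s interval, 9 s deadline, results arrive after about 15 s.
print(would_time_out(300, 0.01))
# -c 6000 -i 0.01: 60 s interval, 180 s deadline.
print(would_time_out(6000, 0.01))
```

With these assumptions the first configuration is restarted before any result arrives, while the second leaves ample margin.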

 

Thanks,

Andy

 

 

 

 

From: [mailto:] On Behalf Of Garnizov, Ivan (RRZE)
Sent: Monday, 13 July 2015 16:51
To: Shawn McKee
Cc:
Subject: RE: [perfsonar-user] regular_testing service errors

 

Hi Shawn,

 

Thanks for pointing out the problem with ntp. In fact that immediately made me realize that there are firewall changes coming from the upgrade as well.

Out of curiosity, why is ntp.conf updated during the upgrade process, with all of my NTP servers replaced?

At this very moment Puppet restored the ntp config and the system is synced.

 

About the 2 GB of RAM, you are right, but since this is a testing instance, it participates in a mesh with fewer than 10 other servers.

I would guess that even with such low resources at least some tests would succeed. Anyway, I will ask for an upgrade.

 

Best regards,

Ivan

 

From: Shawn McKee []
Sent: Monday, 13 July 2015 16:15
To: Garnizov, Ivan (RRZE)
Cc:
Subject: Re: [perfsonar-user] regular_testing service errors

 

Hi Ivan,


I noticed 2 things about your host:

 

1) It only has 2GB of RAM but for v3.4 the minimum recommended is 4GB.  This can cause problems, especially if you have more than a few tests ongoing.

2) Your host is not NTP synced right now.  

 

 

Shawn

 

On Mon, Jul 13, 2015 at 10:08 AM, Garnizov, Ivan (RRZE) <> wrote:

Hi guys, Andy,

 

The problem persists.

I have made more diagnostic tests.

First of all, I wanted to see whether something prevents powstream from operating, so I sniffed the traffic, and I believe the communication between the test hosts is good.

Then I decided to revert the changes, so we restored from a snapshot. This time I disabled our Puppet, just to make sure the system is untouched after the upgrade.

I also let the old version run for a while. It worked fine and measurements were recorded.

This time I also captured the process of the upgrade.

 

Now, with Puppet disabled and /var/lib/perfsonar/regular_testing/* cleaned up, the problem persists.

I am able to run tests on the command line, but the error below continuously reappears in the log, even though the service is obviously able to create and manage the folders named in it:

 

2015/07/13 14:36:20 (62500) ERROR> CmdRunner.pm:148 perfSONAR_PS::RegularTesting::Utils::CmdRunner::run - Command exited, will restart in 278 seconds : /usr/bin/powstream -4 -p -d /var/lib/perfsonar/regular_testing/owamp_BBjps -c 300 -i 0.01 -S psmp-tst-02.dub.ie.geant.net -t ps-test.ctc.grnoc.iu.edu

2015/07/13 14:36:20 (62500) ERROR> CmdRunner.pm:148 perfSONAR_PS::RegularTesting::Utils::CmdRunner::run - Command exited, will restart in 278 seconds : /usr/bin/powstream -4 -p -d /var/lib/perfsonar/regular_testing/owamp_Te8FR -c 300 -i 0.01 -S psmp-tst-02.dub.ie.geant.net -t perfsonar-dev5.grnoc.iu.edu

2015/07/13 14:36:20 (62500) ERROR> CmdRunner.pm:148 perfSONAR_PS::RegularTesting::Utils::CmdRunner::run - Command exited, will restart in 278 seconds : /usr/bin/powstream -4 -p -d /var/lib/perfsonar/regular_testing/owamp_2JYwP -c 300 -i 0.01 -S psmp-tst-02.dub.ie.geant.net -t ps-deb.es.net

2015/07/13 14:36:20 (62500) ERROR> CmdRunner.pm:148 perfSONAR_PS::RegularTesting::Utils::CmdRunner::run - Command exited, will restart in 278 seconds : /usr/bin/powstream -4 -p -d /var/lib/perfsonar/regular_testing/owamp_R4S6A -c 300 -i 0.01 -S psmp-tst-02.dub.ie.geant.net -t antg-dev.es.net

2015/07/13 14:36:20 (62500) ERROR> CmdRunner.pm:148 perfSONAR_PS::RegularTesting::Utils::CmdRunner::run - Command exited, will restart in 278 seconds : /usr/bin/powstream -4 -p -d /var/lib/perfsonar/regular_testing/owamp_KmjKF -c 300 -i 0.01 -S psmp-tst-02.dub.ie.geant.net -t perfsonardev0.internet2.edu

2015/07/13 14:36:20 (62500) ERROR> CmdRunner.pm:148 perfSONAR_PS::RegularTesting::Utils::CmdRunner::run - Command exited, will restart in 278 seconds : /usr/bin/powstream -4 -p -d /var/lib/perfsonar/regular_testing/owamp_v8K1j -c 300 -i 0.01 -S psmp-tst-02.dub.ie.geant.net -t ma-dev2.bldc.grnoc.iu.edu
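To compare the failing tests at a glance, log lines like the ones above could be broken down with a short Python sketch. The regular expression is inferred from these samples only, not from the regular_testing source, the function name is hypothetical, and the final argument is taken to be the remote host because that matches the commands shown here.

```python
import re
import shlex
from typing import Optional

# Pattern inferred from the sample lines above (an assumption).
LINE_RE = re.compile(r"Command exited, will restart in (\d+) seconds : (.+)$")
# powstream options seen above that are followed by a value.
VALUE_OPTIONS = {"-c": "count", "-i": "interval", "-S": "source", "-d": "directory"}


def parse_powstream_failure(line: str) -> Optional[dict]:
    """Extract the restart delay and powstream parameters from one log line."""
    match = LINE_RE.search(line.strip())
    if match is None:
        return None
    argv = shlex.split(match.group(2))
    # Assumption: the last argument is the remote test host.
    info = {"restart_seconds": int(match.group(1)), "target": argv[-1]}
    # Pair every argument with its successor to pick up option values.
    for flag, value in zip(argv, argv[1:]):
        if flag in VALUE_OPTIONS:
            info[VALUE_OPTIONS[flag]] = value
    if "count" in info and "interval" in info:
        # Seconds of traffic covered by one result: packets times spacing.
        info["reporting_seconds"] = int(info["count"]) * float(info["interval"])
    return info
```

Applied to any of the lines above, this yields a count of 300 and an interval of 0.01, that is, a 3-second reporting interval for every target.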

 

 

There are some warnings from the upgrade process:

 

warning: /etc/cassandra/default.conf/cassandra-env.sh created as /etc/cassandra/default.conf/cassandra-env.sh.rpmnew

warning: /opt/esmond/esmond.conf created as /opt/esmond/esmond.conf.rpmnew

warning: /opt/esmond/esmond/settings.py created as /opt/esmond/esmond/settings.py.rpmnew

New python executable in ./bin/python

Installing Setuptools.............................................................................................done.

Installing Pip....................................................................................................................................done.

Creating tables ...

Creating table ps_networkelement_subject

Creating table useripaddress

Installing custom SQL ...

Installing indexes ...

Installed 0 object(s) from 0 fixture(s)

 

 

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/add_dbxml_path upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/add_sbin_path upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/add_toolkit_dirs_path upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/bwctl_port_verify upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/configure_bwctld_log_location upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/configure_esmond upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

New python executable in ./bin/python

Installing Setuptools.............................................................................................done.

Installing Pip....................................................................................................................................done.

Creating tables ...

Installing custom SQL ...

Installing indexes ...

Installed 0 object(s) from 0 fixture(s)

User perfsonar exists

Setting timeseries permissions.

User perfsonar already has api key, skipping creation

Key: ------------------------------------- for perfsonar

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/configure_fail2ban upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/configure_firewall upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Adding perfSONAR firewall rules

iptables: Saving firewall rules to /etc/sysconfig/iptables: [  OK  ]

ip6tables: Saving firewall rules to /etc/sysconfig/ip6tables: [  OK  ]

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/configure_ntpd upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/configure_regular_testing upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

New python executable in ./bin/python

Installing Setuptools.............................................................................................done.

Installing Pip....................................................................................................................................done.

Creating tables ...

Installing custom SQL ...

Installing indexes ...

Installed 0 object(s) from 0 fixture(s)

User perfsonar exists

Setting timeseries permissions.

User perfsonar already has api key, skipping creation

Key: ----------------------------------------------- for perfsonar

No tests to upgrade

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/configure_sysctl upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/configure_syslog_local5_location upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/disable_http_trace upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/disable_mysql_network_access upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/disable_php_advertising upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/disable_unwanted_services upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/disable_weak_ssl_ciphers upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/disable_zeroconf upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/enable_apache_redirect upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/enable_auto_updates upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/enable_mysqld upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/enable_nscd upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/enable_ntpd upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/enable_web100_kernel_repository upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/enable_wheel_sudo upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/increase_owamp_limits upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/increase_owamp_port_range upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/upgrade_apache upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

 

 

Best regards,

Ivan

 

From: Shawn McKee [mailto:]
Sent: Friday, 10 July 2015 16:31
To: Garnizov, Ivan (RRZE)
Cc:
Subject: Re: [perfsonar-user] perfSonar - Internal Server Error

 

Hi Ivan,


There was a problem we have seen in 3.4.1 where old test results weren't cleaned up.

 

What does this command show?

 

du -hs /var/lib/perfsonar/regular_testing

 

(How much is there?)

 

If there is more than about 15 MB you may want to clean it up and reboot:

 

rm -rf /var/lib/perfsonar/regular_testing/*

reboot
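The same check could be scripted, for example from a monitoring job. The following Python sketch is only an illustration: the function names are hypothetical, the 15 MB threshold is the rough figure given above, and it sums file sizes rather than allocated disk blocks, so its total can differ slightly from what du reports.

```python
import os

# "About 15 MB", the rough threshold from the advice above.
THRESHOLD_BYTES = 15 * 1024 * 1024


def directory_size(path: str) -> int:
    """Sum the sizes of all regular files below path (0 if path is missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue  # do not follow symbolic links
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file vanished while scanning; ignore it
    return total


def needs_cleanup(path: str, threshold: int = THRESHOLD_BYTES) -> bool:
    """True if the directory holds more data than the threshold."""
    return directory_size(path) > threshold
```

Calling needs_cleanup("/var/lib/perfsonar/regular_testing") then answers the question above without reading the du output by hand.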

 

Shawn

 

On Fri, Jul 10, 2015 at 10:25 AM, Garnizov, Ivan (RRZE) <> wrote:

Dear perfSONAR developers,

 

I have a strange case where a system that had been upgraded from 3.4.1 to 3.4.2 started experiencing errors on scheduled tests (regular_testing).

 

I have tried restarting the service, then stopping and starting the service while killing all the powstream and bwctl processes, and finally I restarted the server, but the problem persists.

 

ERROR> CmdRunner.pm:148 perfSONAR_PS::RegularTesting::Utils::CmdRunner::run - Command exited, will restart in 278 seconds : /usr/bin/powstream -4 -p -d /var/lib/perfsonar/regular_testing/owamp_sKeW0 -c 300 -i 0.01 -S psmp-tst-02.dub.ie.geant.net -t ma-dev2.bldc.grnoc.iu.edu

 

I am attaching logs, the current state of the folder permissions, and regular_testing.conf.

 

The logs cover both the previous state, when everything was OK, and the state after the upgrade.

 

Manual tests after the upgrade are successful:

[dfn.garnizov@test-rhps02 ~]$ owping -c 300 -i 0.01 -S psmp-tst-02.dub.ie.geant.net -t ps-test.ctc.grnoc.iu.edu

Approximately 6.8 seconds until results available

 

--- owping statistics from [psmp-tst-02.dub.ie.geant.net]:8847 to [ps-test.ctc.grnoc.iu.edu]:9334 ---

SID:    8cb62c5cd94a3e09053202dc3daf03e1

first:  2015-07-10T12:50:18.423

last:   2015-07-10T12:50:21.388

300 sent, 0 lost (0.000%), 0 duplicates

one-way delay min/median/max = 51.6/51.9/52.2 ms, (err=11.9 ms)

one-way jitter = 0.2 ms (P95-P50)

Hops = 9 (consistently)

no reordering

 

 

Best regards,

Ivan

 

 

 

 

From: [mailto:] On Behalf Of Szymon Trocha
Sent: Friday, 10 July 2015 08:21
To: Manglos, Andrew P (173E)
Cc:
Subject: Re: [perfsonar-user] perfSonar - Internal Server Error

 

On 2015-07-10 at 01:29, Manglos, Andrew P (173E) wrote:

Hello,

 

I’m getting an Internal Server Error 500 when trying to browse to a perfSONAR box. I looked at the error log and see:

 

[Thu Jul 09 19:25:03 2015] [error] [client 137.78.171.89] (13)Permission denied: exec of '/opt/perfsonar_ps/toolkit/web/root/index.cgi' failed

[Thu Jul 09 19:25:03 2015] [error] [client 137.78.171.89] Premature end of script headers: index.cgi

 

I checked the permissions on the file:

 

lrwxrwxrwx. 1 perfsonar perfsonar 22 Jul  9 19:10 /opt/perfsonar_ps/toolkit/web/root/index.cgi -> gui/services/index.cgi

 

and on the file that it points to:

 

 

-rwxr-xr-x. 1 perfsonar perfsonar 9892 Jun 22 15:06 /opt/perfsonar_ps/toolkit/web/root/gui/services/index.cgi

 

 

Can anyone tell me what is wrong? Or lead me in the correct direction?


Hi Andrew,

Is the behaviour you see similar to http://www.perfsonar.net/about/faq/#Q82 ?
If yes, check the SELinux settings in /etc/sysconfig/selinux. If SELinux is set to enforcing, consider setting it to permissive and rebooting.
If that does not help, let us know and include more details from the Apache log.
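The configured mode can also be read programmatically. The Python sketch below is a minimal illustration with a hypothetical function name; it only parses the SELINUX= line of the config file text and does not query the running kernel state.

```python
from typing import Optional


def selinux_mode(config_text: str) -> Optional[str]:
    """Return the SELINUX= value (e.g. 'enforcing') from config file text."""
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip comments and anything that is not a key=value line.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Match SELINUX exactly so that SELINUXTYPE is not picked up.
        if key.strip() == "SELINUX":
            return value.strip().lower()
    return None
```

For instance, passing the contents of /etc/sysconfig/selinux and getting back "enforcing" would point to the situation described above.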

Regards,

-- 
Szymon Trocha
 
Poznań Supercomputing & Netw. Center ::: NETWORK OPERATION CENTER
Tel. +48 618582022 ::: http://noc.man.poznan.pl

 

 



