perfsonar-user - RE: [perfsonar-user] regular_testing service errors
Subject: perfSONAR User Q&A and Other Discussion
- From: "Garnizov, Ivan (RRZE)" <>
- To: "Garnizov, Ivan (RRZE)" <>
- Cc: "" <>
- Subject: RE: [perfsonar-user] regular_testing service errors
- Date: Tue, 14 Jul 2015 14:13:15 +0000
- Accept-language: en-GB, de-DE, en-US
Hi Shawn, Andy, Dan,

The test instance running 3.4.2 no longer has the 2 GB RAM limitation, yet the issue with regular testing remains.

[root@test-rhps02 ~]# ps auxw | grep owampd
owamp     2408  0.0  0.0   7272   688 ?      Ss   09:24   0:05 /usr/bin/owampd -c /etc/owampd -R /var/run
owamp    61045  0.0  0.0   7484   776 ?      S    13:55   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run
owamp    61059  0.0  0.0   7484   768 ?      S    13:55   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run
owamp    61164  0.0  0.0   7484   764 ?      S    13:56   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run
owamp    61165  0.0  0.0   7484   484 ?      S    13:56   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run
owamp    61187  0.0  0.0   7484   760 ?      S    13:56   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run
owamp    61188  0.0  0.0   7484   408 ?      S    13:56   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run
owamp    61310  0.0  0.0   7484   764 ?      S    13:57   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run
owamp    61314  0.0  0.0   7484   468 ?      S    13:57   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run
owamp    61328  0.0  0.0   7484   760 ?      S    13:57   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run
owamp    61329  0.0  0.0   7484   400 ?      S    13:57   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run
owamp    61335  0.0  0.0   7500   796 ?      S    13:57   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run
owamp    61386  0.0  0.0   7620   884 ?      S    13:58   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run
owamp    61390  0.0  0.0   7620   580 ?      S    13:58   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run
owamp    61421  0.0  0.0   7620   884 ?      S    13:58   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run
owamp    61423  0.0  0.0   7620   580 ?      S    13:58   0:00 /usr/bin/owampd -c /etc/owampd -R /var/run
root     61425  0.0  0.0 103256   944 pts/0  S+   13:58   0:00 grep owampd

I also noticed that the remark Andy made to Casey about scheduled OWAMP tests applies to my case as well. All I can do is ask Dan to increase the sample_count value for the perfSONAR Software Testing mesh at Indiana University.

Best regards,
Ivan

From: [mailto:]
On Behalf Of Andrew Lake
Hi,

It looks like you have OWAMP configured to send 100 packets per second and to report results every 300 packets (3 seconds). I believe OWAMP won't actually allow such a short reporting interval and will bump it up to something like 15 seconds. Unfortunately, regular_testing doesn't know this happened, so when it gets no results within 3x the specified reporting interval (9 seconds), it assumes the test has timed out and restarts the process. I would recommend increasing the packet count from 300 to something like 6000 (one report every 60 seconds). That is generally the interval we use for reporting OWAMP summaries.

Let me know if you have any questions.

Thanks,
Andy

From: [mailto:]
On Behalf Of Garnizov, Ivan (RRZE)
Hi Shawn,

Thanks for pointing out the problem with NTP. It immediately made me realize that the upgrade also brings firewall changes. Out of curiosity, why is ntp.conf updated during the upgrade process, replacing all of my NTP servers? Puppet has now restored the NTP configuration and the system is synced.

Regarding the 2 GB of RAM, you are right, but since this is a testing instance, it participates in a mesh with fewer than 10 other servers. I would expect at least some tests to succeed even with such modest resources. In any case, I will ask for an upgrade.

Best regards,
Ivan

From: Shawn McKee []
Hi Ivan,
1) It only has 2 GB of RAM, but for v3.4 the minimum recommended is 4 GB. This can cause problems, especially if you have more than a few tests running.
2) Your host is not NTP synced right now. See http://psmp-tst-02.dub.ie.geant.net/toolkit/ or http://psmp-tst-02.dub.ie.geant.net/toolkit/?format=json

Shawn

On Mon, Jul 13, 2015 at 10:08 AM, Garnizov, Ivan (RRZE) <> wrote:
Hi guys, Andy,

The problem persists, and I have run more diagnostic tests. First, I checked whether something prevents powstream from operating, so I sniffed the traffic; I believe the communication between the test hosts is fine. Then I reverted the changes by restoring from a snapshot. This time I disabled our Puppet, to make sure the system is left untouched after the upgrade. I also let the old version run for a while: it worked fine and measurements were recorded. This time I also captured the upgrade process. Now, with Puppet disabled and /var/lib/perfsonar/regular_testing/* cleaned up, the problem persists. I am able to run tests from the command line, but the error below keeps reappearing in the log, even though the service is clearly able to create and manage the directories named in it:

2015/07/13 14:36:20 (62500) ERROR> CmdRunner.pm:148 perfSONAR_PS::RegularTesting::Utils::CmdRunner::run - Command exited, will restart in 278 seconds : /usr/bin/powstream -4 -p -d /var/lib/perfsonar/regular_testing/owamp_BBjps -c 300 -i 0.01 -S psmp-tst-02.dub.ie.geant.net -t ps-test.ctc.grnoc.iu.edu
2015/07/13 14:36:20 (62500) ERROR> CmdRunner.pm:148 perfSONAR_PS::RegularTesting::Utils::CmdRunner::run - Command exited, will restart in 278 seconds : /usr/bin/powstream -4 -p -d /var/lib/perfsonar/regular_testing/owamp_Te8FR -c 300 -i 0.01 -S psmp-tst-02.dub.ie.geant.net -t perfsonar-dev5.grnoc.iu.edu
2015/07/13 14:36:20 (62500) ERROR> CmdRunner.pm:148 perfSONAR_PS::RegularTesting::Utils::CmdRunner::run - Command exited, will restart in 278 seconds : /usr/bin/powstream -4 -p -d /var/lib/perfsonar/regular_testing/owamp_2JYwP -c 300 -i 0.01 -S psmp-tst-02.dub.ie.geant.net -t ps-deb.es.net
2015/07/13 14:36:20 (62500) ERROR> CmdRunner.pm:148 perfSONAR_PS::RegularTesting::Utils::CmdRunner::run - Command exited, will restart in 278 seconds : /usr/bin/powstream -4 -p -d /var/lib/perfsonar/regular_testing/owamp_R4S6A -c 300 -i 0.01 -S psmp-tst-02.dub.ie.geant.net -t antg-dev.es.net
2015/07/13 14:36:20 (62500) ERROR> CmdRunner.pm:148 perfSONAR_PS::RegularTesting::Utils::CmdRunner::run - Command exited, will restart in 278 seconds : /usr/bin/powstream -4 -p -d /var/lib/perfsonar/regular_testing/owamp_KmjKF -c 300 -i 0.01 -S psmp-tst-02.dub.ie.geant.net -t perfsonardev0.internet2.edu
2015/07/13 14:36:20 (62500) ERROR> CmdRunner.pm:148 perfSONAR_PS::RegularTesting::Utils::CmdRunner::run - Command exited, will restart in 278 seconds : /usr/bin/powstream -4 -p -d /var/lib/perfsonar/regular_testing/owamp_v8K1j -c 300 -i 0.01 -S psmp-tst-02.dub.ie.geant.net -t ma-dev2.bldc.grnoc.iu.edu

There are some warnings from the upgrade process:

warning: /etc/cassandra/default.conf/cassandra-env.sh created as /etc/cassandra/default.conf/cassandra-env.sh.rpmnew
warning: /opt/esmond/esmond.conf created as /opt/esmond/esmond.conf.rpmnew
warning: /opt/esmond/esmond/settings.py created as /opt/esmond/esmond/settings.py.rpmnew
New python executable in ./bin/python
Installing Setuptools...done.
Installing Pip...done.
Creating tables ...
Creating table ps_networkelement_subject
Creating table useripaddress
Installing custom SQL ...
Installing indexes ...
Installed 0 object(s) from 0 fixture(s)
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/add_dbxml_path upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/add_sbin_path upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/add_toolkit_dirs_path upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/bwctl_port_verify upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/configure_bwctld_log_location upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/configure_esmond upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
New python executable in ./bin/python
Installing Setuptools...done.
Installing Pip...done.
Creating tables ...
Installing custom SQL ...
Installing indexes ...
Installed 0 object(s) from 0 fixture(s)
User perfsonar exists
Setting timeseries permissions.
User perfsonar already has api key, skipping creation
Key: ------------------------------------- for perfsonar
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/configure_fail2ban upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/configure_firewall upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Adding perfSONAR firewall rules
iptables: Saving firewall rules to /etc/sysconfig/iptables: [ OK ]
ip6tables: Saving firewall rules to /etc/sysconfig/ip6tables: [ OK ]
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/configure_ntpd upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/configure_regular_testing upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
New python executable in ./bin/python
Installing Setuptools...done.
Installing Pip...done.
Creating tables ...
Installing custom SQL ...
Installing indexes ...
Installed 0 object(s) from 0 fixture(s)
User perfsonar exists
Setting timeseries permissions.
User perfsonar already has api key, skipping creation
Key: ----------------------------------------------- for perfsonar
No tests to upgrade
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/configure_sysctl upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/configure_syslog_local5_location upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/disable_http_trace upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/disable_mysql_network_access upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/disable_php_advertising upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/disable_unwanted_services upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/disable_weak_ssl_ciphers upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/disable_zeroconf upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/enable_apache_redirect upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/enable_auto_updates upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/enable_mysqld upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/enable_nscd upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/enable_ntpd upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/enable_web100_kernel_repository upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/enable_wheel_sudo upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/increase_owamp_limits upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/increase_owamp_port_range upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS
Running: /opt/perfsonar_ps/toolkit/scripts/system_environment/upgrade_apache upgrade 3.4.1 1.pSPS 3.4.2 13.pSPS

Best regards,
Ivan

From: Shawn McKee [mailto:]
Hi Ivan,
What does this command show?

du -hs /var/lib/perfsonar/regular_testing

(How much is there?) If there is more than about 15 MB, you may want to clean it up and reboot:

rm -rf /var/lib/perfsonar/regular_testing/*
reboot

Shawn

On Fri, Jul 10, 2015 at 10:25 AM, Garnizov, Ivan (RRZE) <> wrote:
Dear perfSONAR developers,

I have a strange case where a system upgraded from 3.4.1 to 3.4.2 started reporting errors on scheduled tests (regular_testing). I have tried restarting the service, stopping and starting the service while killing all the powstream and bwctl processes, and finally restarting the server, but the problem persists:

ERROR> CmdRunner.pm:148 perfSONAR_PS::RegularTesting::Utils::CmdRunner::run - Command exited, will restart in 278 seconds : /usr/bin/powstream -4 -p -d /var/lib/perfsonar/regular_testing/owamp_sKeW0 -c 300 -i 0.01 -S psmp-tst-02.dub.ie.geant.net -t ma-dev2.bldc.grnoc.iu.edu

I am attaching logs, the current state of the folder permissions, and regular_testing.conf. There are logs from the previous state, when it was OK, and from the state after the upgrade. Manual tests after the upgrade are successful:

[dfn.garnizov@test-rhps02 ~]$ owping -c 300 -i 0.01 -S psmp-tst-02.dub.ie.geant.net -t ps-test.ctc.grnoc.iu.edu
Approximately 6.8 seconds until results available

--- owping statistics from [psmp-tst-02.dub.ie.geant.net]:8847 to [ps-test.ctc.grnoc.iu.edu]:9334 ---
SID:	8cb62c5cd94a3e09053202dc3daf03e1
first:	2015-07-10T12:50:18.423
last:	2015-07-10T12:50:21.388
300 sent, 0 lost (0.000%), 0 duplicates
one-way delay min/median/max = 51.6/51.9/52.2 ms, (err=11.9 ms)
one-way jitter = 0.2 ms (P95-P50)
Hops = 9 (consistently)
no reordering

Best regards,
Ivan

From: [mailto:]
On Behalf Of Szymon Trocha
On 2015-07-10 at 01:29, Manglos, Andrew P (173E) wrote:
--
Szymon Trocha
Poznań Supercomputing & Netw. Center ::: NETWORK OPERATION CENTER
Tel. +48 618582022 ::: http://noc.man.poznan.pl
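
Andy's diagnosis in this thread reduces to simple arithmetic, sketched below in Python. This is a hypothetical model, not the regular_testing code: it assumes the requested reporting interval is sample_count times the packet interval, that regular_testing declares a test hung after 3x that interval, and that OWAMP raises short intervals to roughly 15 seconds (Andy's estimate; the real minimum may differ). The names reporting_interval, will_be_restarted and ASSUMED_OWAMP_MIN_INTERVAL_S are illustrative only.

```python
"""Hypothetical model of the regular_testing restart loop described by Andy.

Assumptions (not taken from the perfSONAR source code):
  * requested reporting interval = sample_count * packet_interval,
  * regular_testing treats a test as hung after 3x that interval,
  * OWAMP silently raises short intervals to about 15 s (Andy's estimate).
"""

ASSUMED_OWAMP_MIN_INTERVAL_S = 15.0  # approximate value quoted in the thread
TIMEOUT_FACTOR = 3                   # "3x the specified reporting interval"


def reporting_interval(sample_count: int, packet_interval_s: float) -> float:
    """Seconds between summaries as requested by the test configuration."""
    return sample_count * packet_interval_s


def will_be_restarted(sample_count: int, packet_interval_s: float,
                      owamp_min_s: float = ASSUMED_OWAMP_MIN_INTERVAL_S) -> bool:
    """Return True if results arrive only after the watchdog timeout expires."""
    requested = reporting_interval(sample_count, packet_interval_s)
    # Interval OWAMP actually uses after bumping up a too-short request.
    effective = max(requested, owamp_min_s)
    # regular_testing only knows the requested interval, not the effective one.
    timeout = TIMEOUT_FACTOR * requested
    return effective > timeout


if __name__ == "__main__":
    # -i 0.01 means one packet every 10 ms on average (100 packets per second).
    for count in (300, 6000):
        interval = reporting_interval(count, 0.01)
        print(f"-c {count}: report every {interval:.0f} s, "
              f"restart loop: {will_be_restarted(count, 0.01)}")
```

With the values from the logs (-c 300 -i 0.01), the requested interval is 3 s and the timeout fires after 9 s, before the roughly 15 s OWAMP report arrives, so the process is restarted. With -c 6000, the interval is 60 s and the timeout 180 s, so results arrive in time.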
- [perfsonar-user] regular_testing service errors, Garnizov, Ivan (RRZE), 07/13/2015
- Re: [perfsonar-user] regular_testing service errors, Shawn McKee, 07/13/2015
- RE: [perfsonar-user] regular_testing service errors, Garnizov, Ivan (RRZE), 07/13/2015
- RE: [perfsonar-user] regular_testing service errors, Garnizov, Ivan (RRZE), 07/14/2015
- RE: [perfsonar-user] regular_testing service errors, Garnizov, Ivan (RRZE), 07/13/2015
- Re: [perfsonar-user] regular_testing service errors, Shawn McKee, 07/13/2015