perfsonar-user - Re: [perfsonar-user] Lost all Owamp testing on Thursday at 1:00am
Subject: perfSONAR User Q&A and Other Discussion
- From: "Andrew Lake" <>
- To: "Casey Russell" <>
- Cc:
- Subject: Re: [perfsonar-user] Lost all Owamp testing on Thursday at 1:00am
- Date: Mon, 13 Jul 2015 10:25:47 -0700 (PDT)
Hi,
It looks like you have owamp configured to send 100 packets per second and register results every 300 packets (3 seconds). I believe OWAMP won't let you actually use such a short reporting interval and will bump it up to something like 15 seconds. Unfortunately, regular_testing doesn't know OWAMP did this, so when it doesn't get results within 3x the specified reporting interval (9 seconds), it assumes the test timed out and restarts the process.
I would recommend increasing the packet count from 300 to something like 6000 (every 60 seconds). That’s generally the time interval we use for reporting owamp summaries. Let me know if you have any questions.
Thanks,
Andy
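A minimal Python sketch of the arithmetic in Andy's reply, assuming his description of the behavior is accurate: OWAMP silently raises a too-short reporting interval to a floor of roughly 15 seconds, while regular_testing still derives its 3x timeout from the configured interval. The function names and the 15-second constant are hypothetical illustrations only, not part of perfSONAR, OWAMP, or regular_testing.

```python
# Hypothetical model of the restart loop described above.
# ASSUMPTION: the ~15 s floor is Andy's approximation, not a documented value.

ASSUMED_MIN_REPORT_INTERVAL_S = 15.0  # approximate OWAMP floor (assumption)
WATCHDOG_MULTIPLIER = 3               # regular_testing waits 3x the interval


def requested_interval(packets_per_report: int, packets_per_second: float) -> float:
    """Reporting interval implied by the configured packet count and rate."""
    return packets_per_report / packets_per_second


def watchdog_fires(packets_per_report: int, packets_per_second: float,
                   min_interval: float = ASSUMED_MIN_REPORT_INTERVAL_S) -> bool:
    """True if the first results arrive later than the watchdog will wait."""
    requested = requested_interval(packets_per_report, packets_per_second)
    # OWAMP raises a too-short interval to its floor...
    effective = max(requested, min_interval)
    # ...but the watchdog still computes its timeout from the requested value.
    timeout = WATCHDOG_MULTIPLIER * requested
    return effective > timeout


def packets_for_interval(target_seconds: float, packets_per_second: float) -> int:
    """Packet count that yields one summary every target_seconds."""
    return round(target_seconds * packets_per_second)


if __name__ == "__main__":
    rate = 100.0  # packets per second, as in the configuration discussed above
    for count in (300, packets_for_interval(60, rate)):
        print(count, requested_interval(count, rate), watchdog_fires(count, rate))
```

With 300 packets the sketch reports a 3-second requested interval and a firing watchdog (results at about 15 s versus a 9 s timeout); with 6000 packets the interval is 60 seconds and the 180-second timeout is never reached.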
On Mon, Jul 13, 2015 at 12:09 PM, Casey Russell <> wrote:
Andy,
You know what? On further reflection, that was a silly plan. Linking to those enormous log files is likely to detonate most any browser via a memory overload. I've attached some abbreviated versions of the log files you requested. I just cut them down so that they don't go back as far, to reduce the size. They now fit in Gmail's mouth nicely.
Casey Russell
Network Engineer
Kansas Research and Education Network
2029 Becker Drive, Suite 282
Lawrence, KS 66047
(785)856-9820 ext 9809
On Mon, Jul 13, 2015 at 10:27 AM, Casey Russell <> wrote:
Andy,
Thank you for the quick reply.
The testing hosts all appear to be running RegularTesting version 3.4.2 release 5 (see output below):
Installed Packages
Name : perl-perfSONAR_PS-RegularTesting
Arch : noarch
Version : 3.4.2
Release : 5.pSPS
Size : 285 k
Repo : installed
From repo : Internet2
And they do have owamp processes running even though no data is being collected (see clipped output below). Some boxes have dozens, some have only one or two. But there's always at least one owamp process present that was started with the command line: /usr/bin/owampd -c /etc/owampd -R /var/run
[crussell@ps-bryant-bw ~]$ sudo ps auxw | grep owampd
owamp 2149 0.0 0.0 7272 688 ? Ss 09:24 0:00 /usr/bin/owampd -c /etc/owampd -R /var/run
owamp 3384 0.0 0.0 7484 772 ? S 09:39 0:00 /usr/bin/owampd -c /etc/owampd -R /var/run
owamp 3386 0.0 0.0 7484 768 ? S 09:39 0:00 /usr/bin/owampd -c /etc/owampd -R /var/run
owamp 4590 0.0 0.0 7484 776 ? S 09:49 0:00 /usr/bin/owampd -c /etc/owampd -R /var/run
owamp 4613 0.0 0.0 7484 768 ? S 09:49 0:00 /usr/bin/owampd -c /etc/owampd -R /var/run
****dozens more similar processes clipped for brevity*******
As for the logs, gmail didn't want to let me attach them since they're rather large, so I've temporarily moved copies to the publicly available root area of the web server on one of the affected hosts. So you can see the relevant logs for one of the affected hosts at:
http://ps-bryant-lt.perfsonar.kanren.net/toolkit/regular_testing.log
http://ps-bryant-lt.perfsonar.kanren.net/toolkit/owamp_bwctl.log
Keep in mind, these logs are from one of the affected testing hosts. If you need to see anything from the central archive host, let me know and I'll get that to you as well.
Thank you again for any help you can lend.
Casey Russell
Network Engineer
Kansas Research and Education Network
2029 Becker Drive, Suite 282
Lawrence, KS 66047
On Mon, Jul 13, 2015 at 9:41 AM, Andrew Lake <> wrote:
Hi Casey,
The 1AM time corresponds to the nightly restart time of the owampd and regular_testing daemons. Auto-updates happen at random times, so I don't think it's that. Can you verify the version of regular testing with "yum info perl-perfSONAR_PS-RegularTesting"? It should be at version 3.4.2-5. That particular version contains a fix for similar problems around restart times, so making sure it's installed is the first step. It's been out for a few weeks.
If that is the latest, when your hosts are in a bad state, do they have owampd processes ("ps auxw | grep owampd") and powstream processes ("ps auxw | grep powstream") running? Could you also send /var/log/perfsonar/owamp_bwctl.log and /var/log/perfsonar/regular_testing.log?
Thanks,
Andy
On Mon, Jul 13, 2015 at 10:04 AM, Casey Russell <> wrote:
Group,
I have a mesh of 4 perfSONAR nodes (and 1 collector). Last Thursday morning (the 9th) at 1:00am, OWAMP collection for half of the hosts in the mesh stopped. I'm still collecting bandwidth and traceroute data, just no OWAMP. At around 1:00am on Friday, the other half stopped.
I can manually run latency testing between hosts with bwctl, using either ping or owamp as the tool, with no problems. However, in the MaDDash interface, if you try to look at the details for the recent tests you get:
"Unable to find any tests in the given time range where....."
I've re-pulled the mesh config on all the testing hosts. I've restarted the regular testing daemon on all the testing hosts. I've restarted the local latency services. I eventually restarted the hosts. All to no effect.
Since it occurred on two consecutive nights at 1:00am, I suspected it was an automatic update that caused the problem. So I checked the iptables rules that were borked in the original 3.4x release. Although they've all been rewritten since my original installs, they seem legitimate.
I did have to make changes to /etc/httpd/conf.d/apache-toolkit_web_gui.conf to re-enable our RADIUS authentication for the web interface on the boxes, since a recent update had overwritten it. But besides that, I'm unable to find anything in my own investigation that would have caused this problem.
I'm going to need help from someone who knows PS a lot better than I do. I'll be happy to share any log files you need to help things along. The dashboard is at: http://ps-dashboard.perfsonar.kanren.net/maddash-webui/
Thank you,
Casey Russell
Network Engineer
Kansas Research and Education Network
2029 Becker Drive, Suite 282
Lawrence, KS 66047
<owamp_bwctl.log> <regular_testing.log>
- [perfsonar-user] Lost all Owamp testing on Thursday at 1:00am, Casey Russell, 07/13/2015
- Re: [perfsonar-user] Lost all Owamp testing on Thursday at 1:00am, Andrew Lake, 07/13/2015
- Re: [perfsonar-user] Lost all Owamp testing on Thursday at 1:00am, Casey Russell, 07/13/2015
- Re: [perfsonar-user] Lost all Owamp testing on Thursday at 1:00am, Casey Russell, 07/13/2015
- Re: [perfsonar-user] Lost all Owamp testing on Thursday at 1:00am, Andrew Lake, 07/13/2015
- Re: [perfsonar-user] Lost all Owamp testing on Thursday at 1:00am, Casey Russell, 07/14/2015
- Re: [perfsonar-user] Lost all Owamp testing on Thursday at 1:00am, Andrew Lake, 07/14/2015
- Re: [perfsonar-user] Lost all Owamp testing on Thursday at 1:00am, Casey Russell, 07/14/2015
- Re: [perfsonar-user] Lost all Owamp testing on Thursday at 1:00am, Andrew Lake, 07/13/2015
- Re: [perfsonar-user] Lost all Owamp testing on Thursday at 1:00am, Casey Russell, 07/13/2015
- Re: [perfsonar-user] Lost all Owamp testing on Thursday at 1:00am, Casey Russell, 07/13/2015
- Re: [perfsonar-user] Lost all Owamp testing on Thursday at 1:00am, Andrew Lake, 07/13/2015