perfsonar-user - Re: [perfsonar-user] Lost all Owamp testing on Thursday at 1:00am

Subject: perfSONAR User Q&A and Other Discussion

  • From: "Andrew Lake" <>
  • To: "Casey Russell" <>
  • Cc:
  • Subject: Re: [perfsonar-user] Lost all Owamp testing on Thursday at 1:00am
  • Date: Mon, 13 Jul 2015 07:41:14 -0700 (PDT)

Hi Casey,

The 1AM time corresponds to the nightly restart time of the owampd and regular_testing daemons. Auto-updates happen at random times, so I don't think that's the cause. Can you verify the version of regular testing with "yum info perl-perfSONAR_PS-RegularTesting"? It should be version 3.4.2-5. That version contains a fix for similar problems around restart times, so making sure it's installed is the first step. It's been out for a few weeks.
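A minimal Python sketch of that version check, for running on each testing host. It assumes an RPM-based toolkit install where `rpm -q --queryformat` is available; the helper names (`installed_version`, `parse_version`, `is_at_least`) are hypothetical. The comparison is a simplified numeric one that skips non-numeric release suffixes; it is not RPM's full version-ordering algorithm.

```python
import re
import subprocess

# Package and minimum VERSION-RELEASE named in the reply above.
PACKAGE = "perl-perfSONAR_PS-RegularTesting"
MIN_VERSION = "3.4.2-5"


def parse_version(vr: str) -> tuple:
    """Turn 'VERSION-RELEASE' (e.g. '3.4.2-5') into a tuple of integers.

    Simplification: each dot/dash-separated field contributes its leading
    digits; fields with no leading digits are skipped.
    """
    nums = []
    for field in re.split(r"[.\-]", vr):
        m = re.match(r"\d+", field)
        if m:
            nums.append(int(m.group()))
    return tuple(nums)


def is_at_least(installed: str, minimum: str = MIN_VERSION) -> bool:
    """True if the installed VERSION-RELEASE is >= the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package: str = PACKAGE) -> str:
    """Ask rpm for the VERSION-RELEASE of an installed package."""
    out = subprocess.run(
        ["rpm", "-q", "--queryformat", "%{VERSION}-%{RELEASE}", package],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()
```

On a host, `is_at_least(installed_version())` should return True once 3.4.2-5 or later is installed. If the package is missing, `rpm -q` exits non-zero, which surfaces here as `subprocess.CalledProcessError`.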

If that is the latest version, when your hosts are in a bad state do they have owampd processes ("ps auxw | grep owampd") and powstream processes ("ps auxw | grep powstream") running? Could you also send /var/log/perfsonar/owamp_bwctl.log and /var/log/perfsonar/regular_testing.log?
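The two process checks can also be scripted. This sketch (hypothetical helper names, Python 3.7+) parses `ps auxw` output and matches the basename of each process's executable instead of grepping the whole line, so the `grep` command itself is never counted as a match.

```python
import subprocess

# Process names asked about in the reply above.
EXPECTED = ("owampd", "powstream")


def current_ps_output() -> str:
    """Capture `ps auxw` output on the local host."""
    return subprocess.run(
        ["ps", "auxw"], capture_output=True, text=True, check=True
    ).stdout


def running_daemons(ps_output: str, names=EXPECTED) -> dict:
    """Return {name: bool} telling whether each name appears as a process."""
    found = {name: False for name in names}
    for line in ps_output.splitlines()[1:]:  # skip the header row
        # ps auxw prints 11 columns; COMMAND is the last and may contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        executable = fields[10].split()[0].rsplit("/", 1)[-1]
        if executable in found:
            found[executable] = True
    return found
```

For example, `running_daemons(current_ps_output())` on a host in the bad state would show at a glance which of the two is missing.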

Thanks,
Andy





On Mon, Jul 13, 2015 at 10:04 AM, Casey Russell <> wrote:

Group,

     I have a mesh of 4 perfSONAR nodes (and 1 collector). Last Thursday morning (the 9th) at 1:00am, Owamp collection for half of the hosts in the mesh stopped. I'm still collecting bandwidth and traceroute data, just no Owamp. At around 1:00am on Friday, the other half stopped.

     I can manually run latency tests between hosts with bwctl, using either ping or owamp as the tool, with no problems. However, in the MaDDash interface, if you try to look at the details for the recent tests you get:

"Unable to find any tests in the given time range where....."

I've re-pulled the mesh config on all the testing hosts.  I've restarted the regular testing daemon on all the testing hosts.  I've restarted the local latency services.  I eventually restarted the hosts.  All to no effect.

Since it occurred on two consecutive nights at 1:00am, I suspected an automatic update had caused the problem. So I checked the iptables rules that were borked in the original 3.4.x release. Although they've all been rewritten since my original installs, they seem legitimate.
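One way to confirm the rewritten firewall rules from the outside is to test TCP reachability of the OWAMP control port from each peer. This sketch assumes owampd listens on the well-known OWAMP-Control port 861 (RFC 4656); it does not check the UDP ports used for the test packets themselves, and `tcp_port_open` is a hypothetical helper.

```python
import socket

# Well-known OWAMP-Control TCP port (RFC 4656). The UDP test-session
# ports are negotiated per session and are not checked here.
OWAMP_CONTROL_PORT = 861


def tcp_port_open(host: str, port: int = OWAMP_CONTROL_PORT,
                  timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False
```

Run from each mesh member against the others, e.g. `for host in ("ps1.example.net", "ps2.example.net"): print(host, tcp_port_open(host))`, where the host names are placeholders.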

I did have to make changes to /etc/httpd/conf.d/apache-toolkit_web_gui.conf to re-enable our Radius authentication for the web interface on the boxes, since a recent update had overwritten it. But besides that, I haven't found anything on my own that would have caused this problem.

I'm going to need help from someone who knows PS a lot better than I do. I'll be happy to share any log files you need to help things along. The dashboard is at: http://ps-dashboard.perfsonar.kanren.net/maddash-webui/

Thank you,



Casey Russell
Network Engineer
Kansas Research and Education Network

2029 Becker Drive, Suite 282

Lawrence, KS  66047

(785)856-9820  ext 9809



