perfsonar-user - Re: [perf-node-users] [perfsonar-user] Regular Testing services showing "Not Running"

Subject: perfSONAR User Q&A and Other Discussion

  • From: Soichi Hayashi <>
  • To: "" <>, Performance Node Users <>
  • Subject: Re: [perf-node-users] [perfsonar-user] Regular Testing services showing "Not Running"
  • Date: Mon, 30 Sep 2013 13:58:11 -0400

OK. I haven't had a chance to go through FAQ #54 yet, but here is the current state of things.

1. The "Traceroute Regular Testing" is showing "Running". I guess something got cleared during the weekend.
2. MyOSG's full matrix view is still showing all "NA" for reverse direction... even though one-way latency tests are all active with bidirectional set to yes.

3. Still don't see any active test under Throughput service / active tests.
4. Consequently, MyOSG's full matrix view is showing "NA" for all except reverse direction for perfsonar02.discovery.wisc.edu

By the way, I have disabled the local iptables firewall on both perfsonar-lt.grid.iu.edu and perfsonar-bw.grid.iu.edu.

I am not sure what to troubleshoot next (other than FAQ #54, which, to be honest, I don't see as relevant to the issue I am having).

Soichi



On Fri, Sep 27, 2013 at 5:38 PM, Jason Zurawski <> wrote:
Hi Soichi;

Answers inline:

On Sep 27, 2013, at 4:59 PM, Soichi Hayashi <> wrote:

> I believe bwmaster processes are already running
>
> 2013-09-27 20:37:29 UTC [root@perfsonar-bw:/opt/perfsonar_ps/traceroute_ma/etc]# ps -ef | grep buoy
> 497       2257     1  0 19:32 ?        00:00:00 /opt/perfsonar_ps/perfsonarbuoy_ma/bin/bwcollector.pl:master
> 497       2264     1 99 19:32 ?        01:04:15 /opt/perfsonar_ps/perfsonarbuoy_ma/bin/bwmaster.pl:master
>
> But, I've restarted anyway.
>
> 2013-09-27 20:39:39 UTC [root@perfsonar-bw:/opt/perfsonar_ps/traceroute_ma/etc]# /etc/init.d/perfsonarbuoy_bw_master start
> perfSONAR-BUOY BWCTL Measurement Service Started
>
> > That may give a clue if they don't start/stop cleanly.
> So no clue here?
>
> By the way, the init script said "perfSONAR-BUOY BWCTL Measurement Service Stopped", but I couldn't start it back again, due to "bwmaster.pl:24644 still running...".. I had to do kill -9 on the master, then start again. Something wrong with the init script?

This is what I meant about zombies being stuck around - you may want to follow all of the steps in that FAQ entry to be completely sure nothing is stuck.
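
The stuck-process situation described above (the init script reports the service stopped while bwmaster.pl is still running) can be handled with a stop-then-verify sequence. The following is a minimal shell sketch, not part of the perfSONAR init scripts and not a substitute for the FAQ steps; the function name `stop_pid` and the default grace period are assumptions for illustration. It sends SIGTERM, polls, and escalates to `kill -9` only if the process is still alive.

```shell
#!/bin/sh
# stop_pid is a hypothetical helper, not part of the perfSONAR init scripts.
# Usage: stop_pid PID [GRACE_SECONDS]
stop_pid() {
    pid=$1
    grace=${2:-10}   # assumed default grace period, in seconds

    # Ask the process to exit; if the signal cannot be sent, the process is
    # already gone (or is not ours to signal), so there is nothing to do.
    kill "$pid" 2>/dev/null || return 0

    # Poll once per second until the process exits or the grace period ends.
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done

    # Still alive after the grace period: force it, as was done by hand above.
    kill -9 "$pid" 2>/dev/null
    return 0
}
```

With the custom data directory from this thread, it might be called as `stop_pid "$(cat /usr/local/perfsonar/perfsonarbuoy_ma/bwctl/bwmaster.pid)"` before running the init script's start action.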

> Anyway, I still see nothing under "Active" throughput tests after the restart. I've also rebooted the machine, with no change.

You won't see instantaneous results - depending on the cadence of your BWCTL tests it may take hours.

> > Have you configured regular traceroute tests?  If you haven't, then it will show as not running.
> > I see traceroute tests listed in the OSG mesh config (http://myosg.grid.iu.edu/pfmesh/json) for perfsonar-lt, and both the perfsonar-bw and -lt instances use this config in their mesh agent config. Do you mean that I need to do something else to get this configured?

Are both servers participating in the traceroute testing, or just one?  Whichever hosts are supposed to be doing the tests, please send the latest Traceroute logs from /var/log/perfsonar so someone can check them for errors.

> > There is a longer write up here on lots of other fun steps:
> http://psps.perfsonar.net/toolkit/FAQs.html#Q54
> > Q:My (OWAMP|BWCTL) measurements have stopped, and I notice mysql errors in the logs. What should I do?
>
> So you want me to go through this mysql troubleshooting? Or do you mean I should read through all of the FAQs? By the way, my MySQL DB *did* crash in the past, and I followed similar steps to recover (actually, I think I've rebuilt it from scratch at least once since this happened).

Do all of the steps in #54 only; you do not need to go through every FAQ item:

http://psps.perfsonar.net/toolkit/FAQs.html#Q54

Thanks;

-jason

> Soichi
>
> On Fri, Sep 27, 2013 at 4:00 PM, Jason Zurawski <> wrote:
> Hi Soichi;
>
> To answer your questions:
>
> >> 1) When I go to throughput service page > https://perfsonar-bw.grid.iu.edu/serviceTest/index.cgi?eventType=bwctl
> >> I see no entry under "Active Tests" for perfsonar-bw. Do you know how to enable all inactive tests?
>
> Are the bwcollector and bwmaster processes running? If not, start them (or perhaps just restart them completely):
>
> sudo /etc/init.d/perfsonarbuoy_bw_collector restart
> sudo /etc/init.d/perfsonarbuoy_bw_master restart
>
> That may give a clue if they don't start/stop cleanly.  Sometimes iperf zombies stick around, and you need to go kill them (a likely problem).  There is a longer write up here on lots of other fun steps:
>
> http://psps.perfsonar.net/toolkit/FAQs.html#Q54
>
> >> 2) For both -bw and -lt instances, I still see "Traceroute Regular Testing" showing "Not Running". Do I need to worry?
>
> Have you configured regular traceroute tests?  If you haven't, then it will show as not running. If you have, you may need to follow steps similar to those in the link I sent above (just for the traceroute tools).
>
> Thanks;
>
> -jason
>
> On Sep 27, 2013, at 3:45 PM, Soichi Hayashi <> wrote:
>
> > I see. I did the following.
> >
> > 2013-09-27 19:31:07 UTC [root@perfsonar-bw:/var/lib]# ls -la perfsonar
> > lrwxrwxrwx 1 root root 20 Sep 27 19:31 perfsonar -> /usr/local/perfsonar
> > (on both perfsonar-bw and perfsonar-lt)
> >
> > I reverted the config change and rebooted them, and now I am seeing "Running" next to perfSONAR-BUOY Regular Testing.
> >
> > I still have 2 issues, however,
> >
> > 1) When I go to throughput service page > https://perfsonar-bw.grid.iu.edu/serviceTest/index.cgi?eventType=bwctl
> > I see no entry under "Active Tests" for perfsonar-bw. Do you know how to enable all inactive tests?
> >
> > 2) For both -bw and -lt instances, I still see "Traceroute Regular Testing" showing "Not Running". Do I need to worry?
> >
> > Soichi
> >
> >
> > On Fri, Sep 27, 2013 at 3:27 PM, Jason Zurawski <> wrote:
> > Hi Soichi;
> >
> > Correct, if you changed the defaults the web page is unlikely to work.  There are two options:
> >
> >  - Edit the web page to point to the new locations. On a regular pS Performance Toolkit it is located here (this may be different for your custom setup): /opt/perfsonar_ps/toolkit/web/root/gui/services/index.cgi
> >
> >  - Symlink the /var locations to the /usr/local locations that you are using.
> >
> > Thanks;
> >
> > -jason
> >
> > On Sep 27, 2013, at 3:18 PM, Soichi Hayashi <> wrote:
> >
> > > Jason,
> > >
> > > I have the following in /opt/perfsonar_ps/perfsonarbuoy_ma/etc
> > > (perfsonar-bw)
> > > > BWDataDir       /usr/local/perfsonar/perfsonarbuoy_ma/bwctl
> > >
> > > (perfsonar-lt)
> > > > OWPDataDir      /usr/local/perfsonar/perfsonarbuoy_ma/owamp
> > >
> > > These reconfigurations were needed to prevent perfsonar from running out of disk on the default /var partition.
> > >
> > > The config creates the PID file in the following location (for perfsonar-bw):
> > > > /usr/local/perfsonar/perfsonarbuoy_ma/bwctl/bwmaster.pid
> > > which contains a correct PID
> > >
> > > 2013-09-27 19:11:47 UTC [root@perfsonar-bw:/usr/local/perfsonar/perfsonarbuoy_ma/bwctl]# cat bwmaster.pid
> > > 1850
> > > 2013-09-27 19:11:50 UTC [root@perfsonar-bw:/usr/local/perfsonar/perfsonarbuoy_ma/bwctl]# ps -ef | grep 1850
> > > 497       1850     1 99 17:44 ?        01:27:31 /opt/perfsonar_ps/perfsonarbuoy_ma/bin/bwmaster.pl:master
> > >
> > > I am guessing that the web interface is using the default location to look for the PID file, and incorrectly determining that the processes are not running?
> > >
> > > Thanks!
> > > Soichi
> > >
> > > On Fri, Sep 27, 2013 at 12:09 PM, Jason Zurawski <> wrote:
> > > Hi Soichi;
> > >
> > > The local services page is designed to look for PID files and process names for the various services.  In the case of Latency testing:
> > >
> > > > my $pSB_owamp_master_pid        = "/var/lib/perfsonar/perfsonarbuoy_ma/owamp/powmaster.pid";
> > > > my $pSB_owamp_master_pname      = "powmaster";
> > > > my $pSB_owamp_collector_pid     = "/var/lib/perfsonar/perfsonarbuoy_ma/owamp/upload/powcollector.pid";
> > > > my $pSB_owamp_collector_pname   = "powcollector";
> > >
> > > And in the case of Bandwidth testing:
> > >
> > > > my $pSB_bwctl_master_pid    = "/var/lib/perfsonar/perfsonarbuoy_ma/bwctl/bwmaster.pid";
> > > > my $pSB_bwctl_master_pname      = "bwmaster";
> > > > my $pSB_bwctl_collector_pid     = "/var/lib/perfsonar/perfsonarbuoy_ma/bwctl/upload/bwcollector.pid";
> > > > my $pSB_bwctl_collector_pname   = "bwcollector";
> > >
> > > The check will go into 'not running' if there are no PID files, or the PID in the file doesn't match the process name that is currently running.  That would be the first place to look (and naturally the old fashioned 'reboot' has been known to fix this).
> > >
> > > With regards to your Bandwidth test machine - I don't see any tests in an active state via the results page, so that may be additional troubleshooting you will want to do.
> > >
> > > Thanks;
> > >
> > > -jason
> > >
> > > On Sep 27, 2013, at 11:38 AM, Soichi Hayashi <> wrote:
> > >
> > > > Hello.
> > > >
> > > > > I have the following perfSONAR instances (installed via RPM):
> > > >
> > > > > http://perfsonar-lt.grid.iu.edu/toolkit/
> > > > > http://perfsonar-bw.grid.iu.edu/toolkit/
> > > >
> > > > > For -lt, I see the following:
> > > > > perfSONAR-BUOY Regular Testing (One-Way Latency)[1]   Not Running
> > > > > And on the -bw instance, I see the following:
> > > > > perfSONAR-BUOY Regular Testing (Throughput)[1]        Not Running
> > > > > I am not sure if these messages are red herrings, but these services are enabled (via the UI) and the collectors are running. For example, on -lt:
> > > >
> > > > 2013-09-27 15:36:49 UTC [root@perfsonar-lt:/var/log/perfsonar]# ps -ef | grep colle
> > > > 497        485   511  3 15:01 ?        00:01:24 /opt/perfsonar_ps/perfsonarbuoy_ma/bin/powcollector.pl:handle_req[129.79.53.52]
> > > > 497        511     1  0 01:06 ?        00:00:00 /opt/perfsonar_ps/perfsonarbuoy_ma/bin/powcollector.pl:master
> > > >
> > > > How can I troubleshoot this issue?
> > > > Soichi
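
The status check Jason describes earlier in the thread (no PID file, or a PID whose process does not match the expected name, yields "Not Running") can be sketched in shell as follows. This illustrates the logic only and is not the toolkit's actual index.cgi code; `check_service` is a hypothetical name, and reading /proc assumes Linux.

```shell
#!/bin/sh
# check_service is a hypothetical helper illustrating the logic; it is not
# the toolkit's own code. Reading /proc/PID/cmdline assumes Linux.
# Usage: check_service PID_FILE EXPECTED_NAME
check_service() {
    pid_file=$1
    expected_name=$2

    # No readable PID file at the expected path: "Not Running".
    if [ ! -r "$pid_file" ]; then
        echo "Not Running"
        return 1
    fi

    # Strip whitespace and reject anything that is not a plain number.
    pid=$(tr -d '[:space:]' < "$pid_file")
    case $pid in
        ''|*[!0-9]*) echo "Not Running"; return 1 ;;
    esac

    # Command line of that PID (NUL-separated in /proc); empty if no process.
    if [ -r "/proc/$pid/cmdline" ]; then
        cmd=$(tr '\0' ' ' < "/proc/$pid/cmdline")
    else
        cmd=''
    fi

    # The PID must belong to a process whose command line contains the name.
    case $cmd in
        *"$expected_name"*) echo "Running"; return 0 ;;
        *) echo "Not Running"; return 1 ;;
    esac
}
```

Called as `check_service /var/lib/perfsonar/perfsonarbuoy_ma/bwctl/bwmaster.pid bwmaster`, using the default path and process name quoted in the thread, this would report "Not Running" whenever the data directory has been moved and no symlink exists at the default location, which matches the symptom reported here.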




