
perfsonar-user - Re: [perf-node-users] [perfsonar-user] Regular Testing services showing "Not Running"

Subject: perfSONAR User Q&A and Other Discussion

  • From: Jason Zurawski <>
  • To: Soichi Hayashi <>
  • Cc: , Performance Node Users <>
  • Subject: Re: [perf-node-users] [perfsonar-user] Regular Testing services showing "Not Running"
  • Date: Fri, 27 Sep 2013 17:38:20 -0400

Hi Soichi;

Answers inline:

On Sep 27, 2013, at 4:59 PM, Soichi Hayashi
<>
wrote:

> I believe bwmaster processes are already running
>
> 2013-09-27 20:37:29 UTC
> [root@perfsonar-bw:/opt/perfsonar_ps/traceroute_ma/etc]#
> ps -ef | grep buoy
> 497 2257 1 0 19:32 ? 00:00:00
> /opt/perfsonar_ps/perfsonarbuoy_ma/bin/bwcollector.pl:master
> 497 2264 1 99 19:32 ? 01:04:15
> /opt/perfsonar_ps/perfsonarbuoy_ma/bin/bwmaster.pl:master
>
> But, I've restarted anyway.
>
> 2013-09-27 20:39:39 UTC
> [root@perfsonar-bw:/opt/perfsonar_ps/traceroute_ma/etc]#
> /etc/init.d/perfsonarbuoy_bw_master start
> perfSONAR-BUOY BWCTL Measurement Service Started
>
> > That may give a clue if they don't start/stop cleanly.
> So no clue here?
>
> By the way, the init script said "perfSONAR-BUOY BWCTL Measurement Service
> Stopped", but I couldn't start it back again, due to "bwmaster.pl:24644
> still running...".. I had to do kill -9 on the master, then start again.
> Something wrong with the init script?

This is what I meant about zombies sticking around - you may want to
follow all of the steps in that FAQ entry to be completely sure nothing is
stuck.
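
A minimal sketch of one way to look for leftovers once the services have been
stopped (this is not taken from the FAQ). The function name find_leftovers and
the name pattern are assumptions based on the daemon names that appear in this
thread, plus iperf and bwctl; adjust them for your install.

```shell
# Filter a `ps -eo pid,args` style listing (read from stdin) for
# measurement processes that may have been left behind.
# NOTE: find_leftovers and its name pattern are illustrative assumptions.
find_leftovers() {
    grep -E 'bwmaster|bwcollector|powmaster|powcollector|iperf|bwctl' \
        | grep -v 'grep' || true
}

# Typical use, after stopping the services:
#   ps -eo pid,args | find_leftovers
```

Anything still listed after a clean stop is a candidate for a manual kill
before starting the services again.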

> Anyway, I still see nothing under "Active" throughput tests after the
> restart. I've also rebooted the machine, with no change.

You won't see instantaneous results - depending on the cadence of your BWCTL
tests it may take hours.

> > Have you configured regular traceroute tests? If you haven't, then it
> > will show as not running.
> I see traceroute tests listed in OSG mesh config
> (http://myosg.grid.iu.edu/pfmesh/json) for perfsonar-lt, and both
> perfsonar-bw and -lt instances use this config in mesh agent config. Do
> you mean I need to do something else to get this configured?

Are both servers participating in the traceroute testing, or just one?
Whichever hosts are supposed to be doing the tests, please send the latest
traceroute logs from /var/log/perfsonar so that someone can take a look and
see if there are any errors.
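
A rough sketch for pulling the most recent suspicious lines out of those logs
before sending them. The helper name recent_errors, the keyword list, and the
*.log glob in the usage line are assumptions; the thread does not name the
individual log files or describe their format.

```shell
# Print the last N lines mentioning an error, warning, or failure
# in each readable log file given on the command line.
# NOTE: recent_errors is a hypothetical helper; the keyword list is a
# guess, not a documented perfSONAR log format.
recent_errors() {
    count=$1
    shift
    for log in "$@"; do
        [ -r "$log" ] || continue      # skip missing or unreadable files
        echo "== $log"
        grep -iE 'error|warn|fail' "$log" | tail -n "$count"
    done
}

# Typical use:
#   recent_errors 20 /var/log/perfsonar/*.log
```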

> > There is a longer write up here on lots of other fun steps:
> http://psps.perfsonar.net/toolkit/FAQs.html#Q54
> > Q:My (OWAMP|BWCTL) measurements have stopped, and I notice mysql errors
> > in the logs. What should I do?
>
> So you want me to go through this mysql troubleshooting? Or.. do you mean
> to read through all FAQs? By the way, my MySQL DB *did* crash in the past,
> and I have followed similar steps to recover (actually I think I've
> rebuilt from scratch at least once since this happened).

Do all of the steps in #54 only; you do not need to go through every FAQ item:

http://psps.perfsonar.net/toolkit/FAQs.html#Q54

Thanks;

-jason

> Soichi
>
> On Fri, Sep 27, 2013 at 4:00 PM, Jason Zurawski
> <>
> wrote:
> Hi Soichi;
>
> To answer your questions:
>
> >> 1) When I go to throughput service page >
> >> https://perfsonar-bw.grid.iu.edu/serviceTest/index.cgi?eventType=bwctl
> >> I see no entry under "Active Tests" for perfsonar-bw. Do you know how to
> >> enable all inactive tests?
>
> Are the bwcollector and bwmaster processes running? If not, start them (or
> perhaps just restart them completely):
>
> sudo /etc/init.d/perfsonarbuoy_bw_collector restart
> sudo /etc/init.d/perfsonarbuoy_bw_master restart
>
> That may give a clue if they don't start/stop cleanly. Sometimes iperf
> zombies stick around, and you need to go kill them (a likely problem).
> There is a longer write up here on lots of other fun steps:
>
> http://psps.perfsonar.net/toolkit/FAQs.html#Q54
>
> >> 2) For both -bw and -lt instances, I still see "Traceroute Regular
> >> Testing" showing "Not Running". Do I need to worry?
>
> Have you configured regular traceroute tests? If you haven't, then it will
> show as not running. If you have, you may need to follow steps similar to
> the link I sent above (just for the traceroute tools).
>
> Thanks;
>
> -jason
>
> On Sep 27, 2013, at 3:45 PM, Soichi Hayashi
> <>
> wrote:
>
> > I see.. I did the following.
> >
> > 2013-09-27 19:31:07 UTC
> > [root@perfsonar-bw:/var/lib]#
> > ls -la perfsonar
> > lrwxrwxrwx 1 root root 20 Sep 27 19:31 perfsonar -> /usr/local/perfsonar
> > (on both perfsonar-bw and perfsonar-lt)
> >
> > Reverted config change, rebooted them, and now I am seeing "Running" next to
> > perfsonar BUOY regular testing.
> >
> > I still have 2 issues, however,
> >
> > 1) When I go to throughput service page >
> > https://perfsonar-bw.grid.iu.edu/serviceTest/index.cgi?eventType=bwctl
> > I see no entry under "Active Tests" for perfsonar-bw. Do you know how to
> > enable all inactive tests?
> >
> > 2) For both -bw and -lt instances, I still see "Traceroute Regular
> > Testing" showing "Not Running". Do I need to worry?
> >
> > Soichi
> >
> >
> > On Fri, Sep 27, 2013 at 3:27 PM, Jason Zurawski
> > <>
> > wrote:
> > Hi Soichi;
> >
> > Correct, if you changed the defaults the web page is unlikely to work.
> > There are two options:
> >
> > - Edit the web page to point to the new locations, on a regular pS
> > Performance Toolkit it is located here (this may be different for your
> > custom setup): /opt/perfsonar_ps/toolkit/web/root/gui/services/index.cgi
> >
> > - Symlink the /var locations to the /usr/local locations that you are
> > using.
> >
> > Thanks;
> >
> > -jason
> >
> > On Sep 27, 2013, at 3:18 PM, Soichi Hayashi
> > <>
> > wrote:
> >
> > > Jason,
> > >
> > > I have following in the /opt/perfsonar_ps/perfsonarbuoy_ma/etc
> > > (perfsonar-bw)
> > > > BWDataDir /usr/local/perfsonar/perfsonarbuoy_ma/bwctl
> > >
> > > (perfsonar-lt)
> > > > OWPDataDir /usr/local/perfsonar/perfsonarbuoy_ma/owamp
> > >
> > > These reconfigurations were needed to prevent perfsonar from running
> > > out of disk on the default /var partition.
> > >
> > > The config creates pid file in following location (for perfsonar-bw)
> > > > /usr/local/perfsonar/perfsonarbuoy_ma/bwctl/bwmaster.pid
> > > which contains a correct PID
> > >
> > > 2013-09-27 19:11:47 UTC
> > > [root@perfsonar-bw:/usr/local/perfsonar/perfsonarbuoy_ma/bwctl]#
> > > cat bwmaster.pid
> > > 1850
> > > 2013-09-27 19:11:50 UTC
> > > [root@perfsonar-bw:/usr/local/perfsonar/perfsonarbuoy_ma/bwctl]#
> > > ps -ef | grep 1850
> > > 497 1850 1 99 17:44 ? 01:27:31
> > > /opt/perfsonar_ps/perfsonarbuoy_ma/bin/bwmaster.pl:master
> > >
> > > I am guessing that.. web interface is using the default pid location to
> > > look for the PID file, and incorrectly determining that the processes are
> > > not running?
> > >
> > > Thanks!
> > > Soichi
> > >
> > > On Fri, Sep 27, 2013 at 12:09 PM, Jason Zurawski
> > > <>
> > > wrote:
> > > Hi Soichi;
> > >
> > > The local services page is designed to look for PID files and process
> > > names for the various services. In the case of Latency testing:
> > >
> > > > my $pSB_owamp_master_pid =
> > > > "/var/lib/perfsonar/perfsonarbuoy_ma/owamp/powmaster.pid";
> > > > my $pSB_owamp_master_pname = "powmaster";
> > > > my $pSB_owamp_collector_pid =
> > > > "/var/lib/perfsonar/perfsonarbuoy_ma/owamp/upload/powcollector.pid";
> > > > my $pSB_owamp_collector_pname = "powcollector";
> > >
> > > And in the case of Bandwidth testing:
> > >
> > > > my $pSB_bwctl_master_pid =
> > > > "/var/lib/perfsonar/perfsonarbuoy_ma/bwctl/bwmaster.pid";
> > > > my $pSB_bwctl_master_pname = "bwmaster";
> > > > my $pSB_bwctl_collector_pid =
> > > > "/var/lib/perfsonar/perfsonarbuoy_ma/bwctl/upload/bwcollector.pid";
> > > > my $pSB_bwctl_collector_pname = "bwcollector";
> > >
> > > The check will go into 'not running' if there are no PID files, or the
> > > PID in the file doesn't match the process name that is currently
> > > running. That would be the first place to look (and naturally the old
> > > fashioned 'reboot' has been known to fix this).
> > >
> > > With regards to your Bandwidth test machine - I don't see any tests in
> > > an active state via the results page, so that may be additional
> > > troubleshooting you will want to do.
> > >
> > > Thanks;
> > >
> > > -jason
> > >
> > > On Sep 27, 2013, at 11:38 AM, Soichi Hayashi
> > > <>
> > > wrote:
> > >
> > > > Hello.
> > > >
> > > > I have following perfsonar instances (installed via RPM)
> > > >
> > > > > http://perfsonar-lt.grid.iu.edu/toolkit/
> > > > > http://perfsonar-bw.grid.iu.edu/toolkit/
> > > >
> > > > For -lt, I see following
> > > > perfSONAR-BUOY Regular Testing (One-Way Latency)[1] Not Running
> > > > And on -bw instance, I see following
> > > > perfSONAR-BUOY Regular Testing (Throughput)[1] Not Running
> > > > I am not sure if these messages are a red herring, but these services
> > > > are enabled (via UI) and collectors are running, for example for -lt
> > > >
> > > > 2013-09-27 15:36:49 UTC
> > > > [root@perfsonar-lt:/var/log/perfsonar]#
> > > > ps -ef | grep colle
> > > > 497 485 511 3 15:01 ? 00:01:24
> > > > /opt/perfsonar_ps/perfsonarbuoy_ma/bin/powcollector.pl:handle_req[129.79.53.52]
> > > > 497 511 1 0 01:06 ? 00:00:00
> > > > /opt/perfsonar_ps/perfsonarbuoy_ma/bin/powcollector.pl:master
> > > >
> > > > How can I troubleshoot this issue?
> > > > Soichi
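
The services-page check described earlier in the thread (a PID file must
exist, and the PID in it must belong to a process with the expected name) can
be approximated from a shell. This is a minimal sketch under assumptions:
service_running is a hypothetical helper, not the toolkit's own code, and it
matches the name anywhere in the command line, which may be looser than what
the web page does.

```shell
# Succeeds only if the PID file is readable and the PID it holds belongs
# to a running process whose command line contains the expected name.
# NOTE: service_running is a hypothetical helper, not the toolkit's code.
service_running() {
    pidfile=$1
    pname=$2
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    ps -p "$pid" -o args= 2>/dev/null | grep -q "$pname"
}

# Typical use, with the default path quoted in the thread:
#   service_running /var/lib/perfsonar/perfsonarbuoy_ma/bwctl/bwmaster.pid bwmaster \
#       && echo "Running" || echo "Not Running"
```

If the data directory has been moved, pointing the helper at the new PID file
location shows whether the daemon itself is healthy, independently of what the
web page reports.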



