perfsonar-user - Re: [perf-node-users] [perfsonar-user] Regular Testing services showing "Not Running"

Subject: perfSONAR User Q&A and Other Discussion

  • From: Jason Zurawski <>
  • To: Soichi Hayashi <>
  • Cc: , Performance Node Users <>
  • Subject: Re: [perf-node-users] [perfsonar-user] Regular Testing services showing "Not Running"
  • Date: Mon, 30 Sep 2013 14:10:13 -0400

Hi Soichi;

As you have custom built your host and it is outside of our direct support
structure, I am not sure what additional feedback I can give you toward a
solution until you are able to follow the steps we already discussed. The
purpose of following the steps in
http://psps.perfsonar.net/toolkit/FAQs.html#Q54 is to ensure that you
complete several tasks that would lead to proper operation. I fully
understand that it doesn't fit your exact problem, but it will help to narrow
down what could be wrong, and this has to be how we move forward. Please try
to follow them before we move forward on any other suggestions for the
bandwidth node.

With regards to your latency node not delivering data to the dashboard
application, I would suggest moving that discussion to the dashboard
development mailing list. It appears that it is functioning correctly from a
configuration point of view, which is good, but API issues are more specific
to that effort.

Thanks;

-jason

On Sep 30, 2013, at 1:58 PM, Soichi Hayashi
<>
wrote:

> OK. I haven't had a chance to go through FAQ #54, but here is the state of
> things as of now..
>
> perfsonar-lt.grid.iu.edu
> 1. The "Traceroute Regular Testing" is showing "Running". I guess something
> got cleared during the weekend.
> 2. MyOSG's full matrix view is still showing all "NA" for reverse
> direction... even though one-way latency tests are all active with
> bidirectional set to yes.
>
> perfsonar-bw.grid.iu.edu
> 3. Still don't see any active test under Throughput service / active tests.
> 4. Consequently, MyOSG's full matrix view is showing "NA" for all except
> reverse direction for perfsonar02.discovery.wisc.edu
>
> By the way, I have disabled local iptables on both perfsonar-lt.grid.iu.edu
> and perfsonar-bw.grid.iu.edu.
>
> I am not sure what to troubleshoot next (other than FAQ #54, whose
> relevance to my issue I don't quite see, to be honest).
>
> Soichi
>
>
>
> On Fri, Sep 27, 2013 at 5:38 PM, Jason Zurawski
> <>
> wrote:
> Hi Soichi;
>
> Answers inline:
>
> On Sep 27, 2013, at 4:59 PM, Soichi Hayashi
> <>
> wrote:
>
> > I believe bwmaster processes are already running
> >
> > 2013-09-27 20:37:29 UTC
> > [root@perfsonar-bw:/opt/perfsonar_ps/traceroute_ma/etc]#
> > ps -ef | grep buoy
> > 497 2257 1 0 19:32 ? 00:00:00
> > /opt/perfsonar_ps/perfsonarbuoy_ma/bin/bwcollector.pl:master
> > 497 2264 1 99 19:32 ? 01:04:15
> > /opt/perfsonar_ps/perfsonarbuoy_ma/bin/bwmaster.pl:master
> >
> > But, I've restarted anyway.
> >
> > 2013-09-27 20:39:39 UTC
> > [root@perfsonar-bw:/opt/perfsonar_ps/traceroute_ma/etc]#
> > /etc/init.d/perfsonarbuoy_bw_master start
> > perfSONAR-BUOY BWCTL Measurement Service Started
> >
> > > That may give a clue if they don't start/stop cleanly.
> > So no clue here?
> >
> > By the way, the init script said "perfSONAR-BUOY BWCTL Measurement
> > Service Stopped", but I couldn't start it back again, due to
> > "bwmaster.pl:24644 still running...".. I had to do kill -9 on the master,
> > then start again. Something wrong with the init script?
>
> This is what I meant about zombies being stuck around - you may want to
> follow all of the steps in that FAQ entry to be completely sure nothing is
> stuck.
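
A minimal sketch of the "stop, then force-kill if it is still there" step discussed above, assuming the PID of the leftover process is already known (for example from the init script's "still running" message). The function name `stop_pid` and the default grace period are hypothetical and are not part of the perfSONAR init scripts.

```shell
#!/bin/sh
# Sketch only: ask a process to exit, and escalate to SIGKILL if it does not.
# Usage: stop_pid PID [GRACE_SECONDS]
stop_pid() {
    pid=$1
    grace=${2:-10}   # hypothetical default; tune for your service

    # kill -0 sends no signal; it only tests whether we can signal the PID.
    kill -0 "$pid" 2>/dev/null || { echo "$pid is not running"; return 0; }

    # Polite request first, so the process can clean up its PID file.
    kill -TERM "$pid" 2>/dev/null

    i=0
    while [ "$i" -lt "$grace" ]; do
        kill -0 "$pid" 2>/dev/null || { echo "$pid exited after SIGTERM"; return 0; }
        sleep 1
        i=$((i + 1))
    done

    # Still alive after the grace period: this is the "kill -9" case.
    echo "$pid ignored SIGTERM, sending SIGKILL"
    kill -KILL "$pid" 2>/dev/null
}
```

Something like `stop_pid 24644` would cover the case quoted above before starting the service again; trying SIGTERM first is a design choice that gives the process a chance to remove its own PID file.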
>
> > Anyway, I still see nothing under "Active" throughput tests after the
> > restart. I've also rebooted the machine, with no change.
>
> You won't see instantaneous results - depending on the cadence of your
> BWCTL tests it may take hours.
>
> > > Have you configured regular traceroute tests? If you haven't, then it
> > > will show as not running.
> > I see traceroute tests listed in OSG mesh config
> > (http://myosg.grid.iu.edu/pfmesh/json) for perfsonar-lt, and both the
> > perfsonar-bw and -lt instances use this config in their mesh agent config.
> > Do you mean I need to do something else to get this configured?
>
> Are both servers participating in the traceroute testing, or just one?
> Whichever hosts are supposed to be doing the tests, please send the latest
> Traceroute logs from /var/log/perfsonar so that someone can take a look and
> check for errors.
>
> > > There is a longer write up here on lots of other fun steps:
> > > http://psps.perfsonar.net/toolkit/FAQs.html#Q54
> > > Q: My (OWAMP|BWCTL) measurements have stopped, and I notice mysql errors
> > > in the logs. What should I do?
> >
> > So you want me to go through this mysql troubleshooting? Or.. do you mean
> > to read through all FAQs? By the way, my MySQL DB *did* crash in the
> > past, and I have followed similar steps to recover (actually I think I've
> > rebuilt from scratch at least once since this happened).
>
> Do all of the steps in #54 only; you do not need every single FAQ item:
>
> http://psps.perfsonar.net/toolkit/FAQs.html#Q54
>
> Thanks;
>
> -jason
>
> > Soichi
> >
> > On Fri, Sep 27, 2013 at 4:00 PM, Jason Zurawski
> > <>
> > wrote:
> > Hi Soichi;
> >
> > To answer your questions:
> >
> > >> 1) When I go to throughput service page >
> > >> https://perfsonar-bw.grid.iu.edu/serviceTest/index.cgi?eventType=bwctl
> > >> I see no entry under "Active Tests" for perfsonar-bw. Do you know how
> > >> to enable all inactive tests?
> >
> > Are the bwcollector and bwmaster processes running? If not, start them (or
> > perhaps just restart them completely):
> >
> > sudo /etc/init.d/perfsonarbuoy_bw_collector restart
> > sudo /etc/init.d/perfsonarbuoy_bw_master restart
> >
> > That may give a clue if they don't start/stop cleanly. Sometimes
> > iperf zombies stick around, and you need to go kill them (a likely
> > problem). There is a longer write up here on lots of other fun steps:
> >
> > http://psps.perfsonar.net/toolkit/FAQs.html#Q54
> >
> > >> 2) For both -bw and -lt instances, I still see "Traceroute Regular
> > >> Testing" showing "Not Running". Do I need to worry?
> >
> > Have you configured regular traceroute tests? If you haven't, then it
> > will show as not running. If you have, you may need to follow steps
> > similar to those in the link I sent above (just for the traceroute tools).
> >
> > Thanks;
> >
> > -jason
> >
> > On Sep 27, 2013, at 3:45 PM, Soichi Hayashi
> > <>
> > wrote:
> >
> > > I see. I did the following.
> > >
> > > 2013-09-27 19:31:07 UTC
> > > [root@perfsonar-bw:/var/lib]#
> > > ls -la perfsonar
> > > lrwxrwxrwx 1 root root 20 Sep 27 19:31 perfsonar -> /usr/local/perfsonar
> > > (on both perfsonar-bw and perfsonar-lt)
> > >
> > > I reverted the config change, rebooted them, and now I am seeing "Running"
> > > next to perfSONAR-BUOY Regular Testing.
> > >
> > > I still have 2 issues, however,
> > >
> > > 1) When I go to throughput service page >
> > > https://perfsonar-bw.grid.iu.edu/serviceTest/index.cgi?eventType=bwctl
> > > I see no entry under "Active Tests" for perfsonar-bw. Do you know how
> > > to enable all inactive tests?
> > >
> > > 2) For both -bw and -lt instances, I still see "Traceroute Regular
> > > Testing" showing "Not Running". Do I need to worry?
> > >
> > > Soichi
> > >
> > >
> > > On Fri, Sep 27, 2013 at 3:27 PM, Jason Zurawski
> > > <>
> > > wrote:
> > > Hi Soichi;
> > >
> > > Correct, if you changed the defaults the web page is unlikely to work.
> > > There are two options:
> > >
> > > - Edit the web page to point to the new locations. On a regular pS
> > > Performance Toolkit it is located here (this may be different for your
> > > custom setup): /opt/perfsonar_ps/toolkit/web/root/gui/services/index.cgi
> > >
> > > - Symlink the /var locations to the /usr/local locations that you are
> > > using.
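
A minimal sketch of the second option (the symlink), assuming the data really lives under the relocated directory. The helper name `link_default_path` and the `.orig` backup suffix are hypothetical; the only paths taken from this thread are /var/lib/perfsonar and /usr/local/perfsonar.

```shell
#!/bin/sh
# Sketch only: make a default path point at a relocated data directory.
# Usage: link_default_path DEFAULT_DIR ACTUAL_DIR
link_default_path() {
    default_dir=$1
    actual_dir=$2

    # Refuse to create a dangling link.
    [ -d "$actual_dir" ] || { echo "missing: $actual_dir" >&2; return 1; }

    # Already a symlink: report where it points and leave it alone.
    if [ -L "$default_dir" ]; then
        echo "already a symlink: $default_dir -> $(readlink "$default_dir")"
        return 0
    fi

    # Keep any real directory at the default path as a backup, never delete it.
    if [ -e "$default_dir" ]; then
        mv "$default_dir" "$default_dir.orig" || return 1
    fi

    ln -s "$actual_dir" "$default_dir"
}
```

For the layout described in this thread, the call would presumably be `link_default_path /var/lib/perfsonar /usr/local/perfsonar`, run as root with the services stopped.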
> > >
> > > Thanks;
> > >
> > > -jason
> > >
> > > On Sep 27, 2013, at 3:18 PM, Soichi Hayashi
> > > <>
> > > wrote:
> > >
> > > > Jason,
> > > >
> > > > I have the following in /opt/perfsonar_ps/perfsonarbuoy_ma/etc
> > > > (perfsonar-bw)
> > > > > BWDataDir /usr/local/perfsonar/perfsonarbuoy_ma/bwctl
> > > >
> > > > (perfsonar-lt)
> > > > > OWPDataDir /usr/local/perfsonar/perfsonarbuoy_ma/owamp
> > > >
> > > > These reconfigurations were needed to prevent perfSONAR from running
> > > > out of disk space on the default /var partition.
> > > >
> > > > The config creates the PID file in the following location (for perfsonar-bw)
> > > > > /usr/local/perfsonar/perfsonarbuoy_ma/bwctl/bwmaster.pid
> > > > which contains the correct PID
> > > >
> > > > 2013-09-27 19:11:47 UTC
> > > > [root@perfsonar-bw:/usr/local/perfsonar/perfsonarbuoy_ma/bwctl]#
> > > > cat bwmaster.pid
> > > > 1850
> > > > 2013-09-27 19:11:50 UTC
> > > > [root@perfsonar-bw:/usr/local/perfsonar/perfsonarbuoy_ma/bwctl]#
> > > > ps -ef | grep 1850
> > > > 497 1850 1 99 17:44 ? 01:27:31
> > > > /opt/perfsonar_ps/perfsonarbuoy_ma/bin/bwmaster.pl:master
> > > >
> > > > I am guessing that the web interface is looking for the PID file in
> > > > the default location, and incorrectly determining that the
> > > > processes are not running?
> > > >
> > > > Thanks!
> > > > Soichi
> > > >
> > > > On Fri, Sep 27, 2013 at 12:09 PM, Jason Zurawski
> > > > <>
> > > > wrote:
> > > > Hi Soichi;
> > > >
> > > > The local services page is designed to look for PID files and process
> > > > names for the various services. In the case of Latency testing:
> > > >
> > > > > my $pSB_owamp_master_pid =
> > > > > "/var/lib/perfsonar/perfsonarbuoy_ma/owamp/powmaster.pid";
> > > > > my $pSB_owamp_master_pname = "powmaster";
> > > > > my $pSB_owamp_collector_pid =
> > > > > "/var/lib/perfsonar/perfsonarbuoy_ma/owamp/upload/powcollector.pid";
> > > > > my $pSB_owamp_collector_pname = "powcollector";
> > > >
> > > > And in the case of Bandwidth testing:
> > > >
> > > > > my $pSB_bwctl_master_pid =
> > > > > "/var/lib/perfsonar/perfsonarbuoy_ma/bwctl/bwmaster.pid";
> > > > > my $pSB_bwctl_master_pname = "bwmaster";
> > > > > my $pSB_bwctl_collector_pid =
> > > > > "/var/lib/perfsonar/perfsonarbuoy_ma/bwctl/upload/bwcollector.pid";
> > > > > my $pSB_bwctl_collector_pname = "bwcollector";
> > > >
> > > > The check will go into 'not running' if there are no PID files, or
> > > > the PID in the file doesn't match the process name that is currently
> > > > running. That would be the first place to look (and naturally the
> > > > old fashioned 'reboot' has been known to fix this).
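
The check described above can be sketched roughly as follows. This is an illustration of the logic only, not the toolkit's actual Perl code; the function name `check_service` is hypothetical, and reading /proc assumes a Linux host (with `ps` as a fallback).

```shell
#!/bin/sh
# Sketch only: PID-file based status check.
# Usage: check_service PIDFILE PNAME   (prints "Running" or "Not Running")
check_service() {
    pidfile=$1
    pname=$2

    # No PID file at the expected path: report "Not Running".
    [ -r "$pidfile" ] || { echo "Not Running"; return 1; }

    # The file should hold a single numeric PID.
    pid=$(tr -d '[:space:]' < "$pidfile")
    case $pid in
        ''|*[!0-9]*) echo "Not Running"; return 1 ;;
    esac

    # Fetch the command line of that PID; empty if no such process exists.
    if [ -r "/proc/$pid/cmdline" ]; then
        cmd=$(tr '\0' ' ' < "/proc/$pid/cmdline")
    else
        cmd=$(ps -p "$pid" -o args= 2>/dev/null)
    fi

    # The PID must belong to a process whose command line matches the name.
    case $cmd in
        *"$pname"*) echo "Running"; return 0 ;;
        *)          echo "Not Running"; return 1 ;;
    esac
}
```

With the defaults listed above, the throughput master would be checked as `check_service /var/lib/perfsonar/perfsonarbuoy_ma/bwctl/bwmaster.pid bwmaster`. If the data directory has been moved, the PID file is no longer at that path and the first test fails even though the process is healthy.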
> > > >
> > > > With regards to your Bandwidth test machine - I don't see any tests
> > > > in an active state via the results page, so that may be additional
> > > > troubleshooting you will want to do.
> > > >
> > > > Thanks;
> > > >
> > > > -jason
> > > >
> > > > On Sep 27, 2013, at 11:38 AM, Soichi Hayashi
> > > > <>
> > > > wrote:
> > > >
> > > > > Hello.
> > > > >
> > > > > I have the following perfSONAR instances (installed via RPM)
> > > > >
> > > > > > http://perfsonar-lt.grid.iu.edu/toolkit/
> > > > > > http://perfsonar-bw.grid.iu.edu/toolkit/
> > > > >
> > > > > For -lt, I see the following:
> > > > > perfSONAR-BUOY Regular Testing (One-Way Latency)[1] Not Running
> > > > > And on the -bw instance, I see the following:
> > > > > perfSONAR-BUOY Regular Testing (Throughput)[1] Not Running
> > > > > I am not sure if these messages are a red herring, but these services
> > > > > are enabled (via the UI) and the collectors are running, for example on -lt:
> > > > >
> > > > > 2013-09-27 15:36:49 UTC
> > > > > [root@perfsonar-lt:/var/log/perfsonar]#
> > > > > ps -ef | grep colle
> > > > > 497 485 511 3 15:01 ? 00:01:24
> > > > > /opt/perfsonar_ps/perfsonarbuoy_ma/bin/powcollector.pl:handle_req[129.79.53.52]
> > > > > 497 511 1 0 01:06 ? 00:00:00
> > > > > /opt/perfsonar_ps/perfsonarbuoy_ma/bin/powcollector.pl:master
> > > > >
> > > > > How can I troubleshoot this issue?
> > > > > Soichi


