perfsonar-user - Re: [perfsonar-user] bwping/owamp tests randomly stop and never restart

Subject: perfSONAR User Q&A and Other Discussion

  • From: Ty Bell <>
  • To: Andy Lake <>
  • Cc: "GarnizovIvan (RRZE)" <>, perfsonar-user <>
  • Subject: Re: [perfsonar-user] bwping/owamp tests randomly stop and never restart
  • Date: Wed, 24 Jun 2015 10:27:00 -0400

Hi Andy,

No, I'm not running the toolkit. I've only installed bwctl, owamp,
RegularTesting 3.4.2 and their requirements. I'm using bwctld to do all of
the scheduling for the bwctl and owamp tests, so it's using bwping instead of
powstream. There are no nightly restarts, and across the majority of my 10
hosts the last full restart of regular testing and bwctld was June 1st. Is a
nightly restart recommended? I periodically check my hosts for hung owamp
processes and usually find a couple, but those never seem to correlate with
the hosts displaying the problem.
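
A periodic check for hung processes like the one described above could be
scripted roughly as follows. This is only a sketch: the function name
find_long_running, the one-day threshold and the list of process names are
illustrative assumptions, and the example `ps` invocation assumes a
procps-ng `ps` that supports the `etimes` column (older releases may not).

```shell
#!/usr/bin/env bash
# Print processes whose command name matches PATTERN and whose elapsed
# running time exceeds MAX_AGE seconds.
# Reads "PID ELAPSED_SECONDS COMMAND" lines on stdin, so the filter can be
# tried on canned input before pointing it at a live host.
# (find_long_running is a hypothetical helper, not part of perfSONAR.)
find_long_running() {
    local max_age="$1" pattern="$2"
    awk -v max="$max_age" -v pat="$pattern" \
        '$3 ~ pat && $2 + 0 > max + 0 { print $1, $2, $3 }'
}

# Example on a live host (assumes `ps` supports the etimes column):
#   ps -eo pid=,etimes=,comm= | find_long_running 86400 '^(owping|powstream|bwping)$'
```

Anything it prints is only a candidate for closer inspection; a long-lived
process is not necessarily hung.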

--Ty

> On Jun 24, 2015, at 6:48 AM, Andrew Lake
> <>
> wrote:
>
> Hi Ty,
>
> Are you running the toolkit, or doing nightly restarts? And is this on a
> 3.4.2 host?
>
> I’ve been debugging an issue with WLCG for the past few weeks where
> powstream tests sporadically start failing after the remote side’s
> owampd restarts. It looks like if owampd is restarted at just the wrong
> moment, the parent is killed but the children to which the powstreams are
> connected are left running. This causes powstream to sit there and do
> nothing, connected to these orphaned processes on the other side, until
> it’s restarted (or the remote owampd processes are forcibly killed). This
> is similar to what you noted: if you kill the powstream processes (which
> also happens to end the orphaned process on the remote end),
> regular_testing will spawn a new powstream, and powstream will get a new
> working connection.
>
> I think the fix/workaround is going to be to send a SIGKILL to anything
> that looks like an owampd process, after giving it a chance to shut down
> nicely, during the nightly restart.
>
> Ivan’s issue may or may not be the same, since from what I understand it
> was isolated to a single host, and this can happen to any host and appears
> a lot more random.
>
> Thanks,
> Andy
>
>
>
>
>
>
> On Wed, Jun 24, 2015 at 4:17 AM, Garnizov, Ivan (RRZE)
> <>
> wrote:
>
> Hi Ty,
>
> In fact, I have reported the same issue for my instances in the issue
> tracker:
> https://github.com/perfsonar/regular-testing/issues/5
> Suddenly, for no apparent reason and without any notable event in the
> logs, the regular_testing service stops collecting data. I have also noted
> that a single service restart does not help. You have to follow a graceful
> restart sequence, meaning:
> sudo service regular_testing stop
> sudo service postgresql stop
> sudo service cassandra restart
> sudo service postgresql start
> sudo service regular_testing start
>
> This immediately fixes all measurements. I have tested that on 2 hosts.
> We still might be in different scenarios, although my issue is also around
> the latency tests.
>
> Best regards,
> Ivan
>
>
>
>
> -----Original Message-----
> From:
>
>
> [mailto:]
> On Behalf Of Ty Bell
> Sent: Tuesday, 23 June 2015 16:41
> To: perfsonar-user
> Subject: Re: [perfsonar-user] bwping/owamp tests randomly stop and never
> restart
>
> All my hosts are running the same (latest) versions of the tools and
> they're all synced with the same NTP sources. Instead of restarting the
> whole regular testing service, I've taken to killing the individual bwping
> process; regular testing then fires up a new process and everything clears
> up.
>
> --Ty
>
> > On Apr 23, 2015, at 3:29 PM, Amit Khare
> > <>
> > wrote:
> >
> > Hi Ty,
> >
> > Are all your hosts running the same version of the toolkit? We have had
> > similar issues with one of the older toolkit releases. I would also
> > check whether the hosts are properly synced with NTP server(s). Thanks,
> >
> > Amit
> > ----------------------------------------------------------------------
> > Amit Khare | Network Engineer | CANARIE Inc | 45 O'Connor St., Suite
> > 500, Ottawa, ON K1P 1A4 | Office: 613-943-5377 | Cell: 613-404-8696 |
> > CANARIE NOC: 613-944-5612 | www.canarie.ca
> >
> >
> >
> >
> >
> >
> > On 2015-04-23, 15:19, "Ty Bell"
> > <>
> > wrote:
> >
> >> Hi All,
> >>
> >> Wondering if this is something anyone else has observed. I have 10
> >> hosts in a mesh all running owamp tests, and randomly (maybe once a
> >> week) I’ll check on the mesh and see two hosts have stopped testing
> >> in one direction. It’s never the same hosts, and never the same
> >> direction, seems totally random. I can execute tests from the command
> >> line and they run just fine. I’ve looked around for hung owamp
> >> processes or daemon restarts and haven’t found anything.
> >>
> >> The only resolution I’ve found is to restart regular testing on both
> >> hosts.
> >>
> >> Thanks,
> >> --Ty
> >>
>
>
>
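
For reference, the workaround Andy describes in the quoted message (give
owampd a chance to shut down cleanly, then SIGKILL whatever is left) could
look roughly like the sketch below. It is not the actual perfSONAR fix: the
function name term_then_kill and the 10-second grace period are assumptions,
and it relies on the standard procps `pgrep`/`pkill` tools matching the
process name exactly with `-x`.

```shell
#!/usr/bin/env bash
# Ask every process with the exact name NAME to exit (SIGTERM), wait up to
# GRACE seconds for them to go away, then SIGKILL anything still running,
# which would include orphaned children that kept the same process name.
# (term_then_kill is a hypothetical helper, not part of perfSONAR.)
term_then_kill() {
    local name="$1" grace="${2:-10}" waited=0
    pkill -TERM -x "$name" || return 0   # nothing matched: nothing to do
    while pgrep -x "$name" >/dev/null && [ "$waited" -lt "$grace" ]; do
        sleep 1
        waited=$((waited + 1))
    done
    pkill -KILL -x "$name" || true       # no-op if everything already exited
}

# Example, run as root as part of a nightly restart:
#   term_then_kill owampd 10
```

In a real restart script this would sit between stopping and starting the
service, so that the new daemon does not start alongside leftover children.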
