perfsonar-user - Re: [perfsonar-user]

Subject: perfSONAR User Q&A and Other Discussion

  • From: Shawn McKee <>
  • To:
  • Cc: "" <>
  • Subject: Re: [perfsonar-user]
  • Date: Wed, 20 May 2015 18:27:52 -0400

Hi Winnie,

Forgot to respond to this:

"In /var/lib/perfsonar/regular_testing are LOTS of bwctl_* directories.
Sorting by time shows lots timestamped 24 April** and then 28 April. None from 27
April.
** and earlier, back to 29 March. Should delete them... no, wait, shouldn't
the software delete them itself?!"

Yes, the software should have deleted them, but we have seen cases (especially in v3.4.1) where these directories pile up.

Typically, 'du -hs /var/lib/perfsonar/regular_testing' should report a few MB to tens of MB in use. If you see lots of old files, you should clean them out.

To clean up directories and files older than 5 days, try:

find /var/lib/perfsonar/regular_testing/ -mtime +5 -prune -exec rm -rf {} \;

(You can do a dry run first by replacing '-exec rm -rf {} \;' with '-ls' to see what would be removed.)
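Shawn's size check and dry-run tip can be combined into one small helper. This is a minimal sketch, not a perfSONAR tool: the function name `cleanup_regular_testing` and its `--delete` switch are hypothetical, and the default path and 5-day threshold are simply the values from the message above. It also adds `-mindepth 1`, which is not in the original command. Without it, `find` tests the starting directory too, so if `regular_testing` itself had not been modified for more than 5 days, the original command would remove the whole directory.

```shell
# Hypothetical helper (not shipped with perfSONAR).
# Usage: cleanup_regular_testing [DIR] [DAYS] [--delete]
# Without --delete it only lists what would be removed (dry run).
cleanup_regular_testing() {
    # Defaults are the path and threshold from the message; adjust as needed.
    local dir="${1:-/var/lib/perfsonar/regular_testing}"
    local days="${2:-5}"

    # Disk usage first: a few MB to tens of MB is typical.
    du -hs "$dir"

    if [ "$3" = "--delete" ]; then
        # -mindepth 1: never match (and delete) $dir itself.
        # -prune: do not descend into directories that are being removed.
        find "$dir" -mindepth 1 -mtime +"$days" -prune -exec rm -rf {} +
    else
        # Dry run: show the entries that would be deleted.
        find "$dir" -mindepth 1 -mtime +"$days" -prune -ls
    fi
}
```

For example, run `cleanup_regular_testing /var/lib/perfsonar/regular_testing 5` first, check the listing, then repeat it with `--delete` appended.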

Shawn

On Tue, May 19, 2015 at 4:40 AM, Winnie Lacesso <> wrote:
Greetings Perfsonar Gurus!

A few questions.

I believe most sites have 2 boxes, 1 for latency & 1 for bandwidth (BW),
but some sites use one box for both. Is that correct?

Looking at serviceTest/psGraph.cgi for some boxes, e.g. our BW box, some
of the entries are clearly latency boxes, e.g.
        ps-latency.scinet.utoronto.ca (142.150.19.61)
        perfsonar-ps-latency.pic.es (193.109.172.189)

Clicking on either our BW box (on the line paired with the remote latency
box) or the remote latency box pops up a window naming both our BW box &
the (presumably) latency box, but with no data.
On the same page there is also either what is clearly the same site's BW box
        perfsonar-ps-latency.pic.es (193.109.172.189)
or what I guess is the site's BW box (guess: "psb" = BW)
        psb01.pic.es (193.109.172.187)

So I am a bit curious why there would be BW tests from our BW box to a
site's *pair* of perfSONAR boxes, i.e. to both their BW & their latency
box. Is that normal, or a config error? (Especially since there is no
data in the graphs between our BW box & their latency box.)

Next, something happened in the early hours of 27 April. To quite a number
of sites, our BW box appears to have been fine until then. After that,
abruptly nothing.

I'd put 0.7 odds on the cause being an overnight yum auto-update:
Apr 27 04:06:46 Updated: perl-perfSONAR_PS-RegularTesting-3.4.2-4.pSPS.noarch

(Note: our Lat box updated at same time & seems fine)

IIRC, it was noticed in early May that very few graphs show up when
clicking on other hosts on the BW box's graphs page:

http://lcgnetmon02.phy.bris.ac.uk/serviceTest/psGraph.cgi

All the "we are talking to these hosts" entries show, but for most of them,
click on one -> it cycles a bit -> then the box shows no data (or red dots
= "Errors").

Our BW box was rebooted on 6 May in the hope that a fresh restart of
everything would bring back the BW box graphs (with data). But no.

If I choose e.g. t2ps-bandwidth.physics.ox.ac.uk (not too far from Bristol)
or perfsonar-bw.tier2.hep.manchester.ac.uk, there is a graph, but only
red-dot Errors; hitting "1m" makes it pretty clear that tests were OK until
27 April & stopped abruptly in the early hours, to both Oxford & Manchester.

The same is true if one goes to the remote box, e.g.
http://t2ps-bandwidth.physics.ox.ac.uk/serviceTest/psGraph.cgi
(I can't get to Manchester's at the moment), & clicks on
lcgnetmon02.phy.bris.ac.uk: all fine until the wee hours of 27 April.

But on other boxen, e.g. hepsonar1.ph.liv.ac.uk, no graph shows at all &
clicking "Previous 1w" does nothing (i.e. the dates don't change to show
the previous week). Same if I choose e.g. "1m": it doesn't show anything.
(What does it mean that the dates don't change? Are those graphs
"broken"?)

To a *few* boxes out there, all seems well in at least one direction:
psum02.aglt2.org, ps-bandwidth.scinet.utoronto.ca, perfsonar.na.infn.it
But going to the remote box's page & clicking on lcgnetmon02.phy.bris
gives NO DATA.

So our BW box IS OK for a few hosts, in at least one direction.

So I am unsure whether the BW box lcgnetmon02 has, so to speak, "lost its
(or some of its) data" for some connections, or just can't graph it, or
something else. Or does this sound like a firewall issue? (I can't see
how, but...)

I took a look in the regular_testing.log covering the early hours of
27 April, but nothing stands out as "obviously, it went pear-shaped here";
it's full of WARN & ERROR both before 27 April & after.
The owamp_bwctl.log covering the early hours of 27 April shows nothing
different.

In /var/lib/perfsonar/regular_testing are LOTS of bwctl_* directories.
Sorting by time shows lots timestamped 24 April** and then 28 April. None
from 27 April.
** and earlier, back to 29 March. Should delete them... no, wait, shouldn't
the software delete them itself?!

Can anyone advise how to debug what happened to our BW box's graphs/data,
please? Or is this all normal?
(I'm not Looking for Work; our site hasn't been ticketed, so my guess is
it's not completely broken...)


Winnie Lacesso / Bristol University Particle Physics Computing Systems
HH Wills Physics Laboratory, Tyndall Avenue, Bristol, BS8 1TL, UK




