
perfsonar-user - Re: [perfsonar-user] Problem with BW host (I think), & LAT (different dates)

Subject: perfSONAR User Q&A and Other Discussion




  • From: Shawn McKee <>
  • To: "" <>
  • Cc: "" <>, Marian Babik <>, "''" <>
  • Subject: Re: [perfsonar-user] Problem with BW host (I think), & LAT (different dates)
  • Date: Tue, 15 Sep 2015 07:50:53 -0400

Hi Winnie,

Thanks for the report. In many cases the UK sites have chosen to deploy a single host rather than the recommended two perfSONAR instances. If the same box is registered as both a bandwidth and a latency host, we have reconfigured the WLCG (and UK) meshes to run only latency tests for it. This is because bandwidth tests running on the SAME NIC can interfere with the latency tests. So yes, this is a configuration choice by the mesh admins.

The fix would be either to add a new host for bandwidth tests or to reconfigure the existing host to use a different NIC on the same physical box for the bandwidth tests. This dual-service host configuration has been supported since perfSONAR 3.4 came out, and the perfSONAR documentation describes how to set it up properly. In effect, you configure two different IP addresses on different NICs on your single host and register one as the latency host and the other as the bandwidth host.

Let Marian, Duncan and me know if you have follow-on questions. Thanks,

Shawn


On Tue, Sep 15, 2015 at 4:51 AM, Winnie Lacesso <> wrote:
Dear Wise Gurus,

Although both Bristol perfsonar hosts are ok in the monitoring, I
think there are problems.

BW broken Sun 23 Aug?
---------------------
In early May 2015, Andrew Lake pointed out:

> - I was also able to reach your web page which allowed me to look at the
> mesh config file you are using at
> https://myosg.grid.iu.edu/pfmesh/mine/hostname/lcgnetmon02.phy.bris.ac.uk
>
> The way this is configured, you are only running BWCTL throughput tests
> to hosts in the UK. You then have a very large mesh of traceroute tests
> you are running. That means for a whole bunch of those hosts listed on
> the graph page I wouldn't expect any throughput data. My guess is that
> this is expected, but just pointing it out in case it's not.

My guess is that whoever is the perfSONAR expert in the UK set it up like
that for UK sites. Fine. To me this says that, when looking at the graph
pages, I should only look for data between us & other UK sites. Fine.

http://lcgnetmon02/serviceTest/psGraph.cgi

Click on a UK site on the left, e.g. hepsonar1.ph.liv.ac.uk (138.253.60.81),
& it shows no graph; scrolling down shows 1 tiny spike at noon on Friday
(11 Sep). Click on 1m, however, & it shows that the reverse throughput data
stopped on what looks like Sun 23 Aug; on that same date the throughput
data becomes a straight line until about 11 Sep.
For t2ps-bandwidth.physics.ox.ac.uk it's similar, but the straight
throughput line carries right on to now & there are quite a few red-dot
errors too.

For lcgps02.gridpp.rl.ac.uk the dates are the same (something happened on
23 Aug), but this time it's the reverse throughput that carries on in a
straight line, then dips (I think this is because this time our BW box is
the destination, not the source). Similar for pygrid-sonar1.lancs.ac.uk &
heplnx130.pp.rl.ac.uk.
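One plausible reading of the "straight line" (an assumption, not something confirmed in this thread) is that the graph joins the last sample before an outage to the next sample after it, so there are no measurements in between. If the raw results can be exported as (timestamp, value) pairs, a minimal sketch like this would locate such gaps. The function name, the made-up sample series and the one-day threshold are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Sample = Tuple[datetime, float]  # (timestamp, throughput value)


def find_gaps(samples: List[Sample], max_gap: timedelta) -> List[Tuple[datetime, datetime]]:
    """Return (before, after) timestamp pairs where consecutive samples are
    more than max_gap apart. Input need not be sorted."""
    ordered = sorted(samples, key=lambda s: s[0])
    gaps = []
    for (t_prev, _), (t_next, _) in zip(ordered, ordered[1:]):
        if t_next - t_prev > max_gap:
            gaps.append((t_prev, t_next))
    return gaps


# Made-up series: a sample every 6 hours from 20 Aug 00:00 to 23 Aug 00:00,
# then nothing until a single sample at noon on 11 Sep.
samples = [(datetime(2015, 8, 20) + timedelta(hours=6 * i), 900.0) for i in range(13)]
samples.append((datetime(2015, 9, 11, 12), 850.0))

for start, end in find_gaps(samples, max_gap=timedelta(days=1)):
    print(f"no data between {start} and {end}")
# no data between 2015-08-23 00:00:00 and 2015-09-11 12:00:00
```

The threshold should be set comfortably above the configured test interval, so that normal scheduling jitter is not reported as a gap.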

Can anyone advise how to debug & fix this?

For many other UK sites, though, like bham & cam, mancs, dur.scotgrid,
there's no graph at all, & changing the default 1w to 1m shows nothing.
It's a puzzle that there's no data, but no one's saying "your BW box is
broken". Maybe the config does not include tests between our BW box & those
hosts. I thought it used to...

Lat broken Thu 10 Sep just before noon?
--------------------------------------
There seems to be LAT data for most (not all) of the worldwide hosts I
checked, but for all UK hosts (only!) something happened on Thu 10 Sep just
before noon: one of the graph lines abruptly stops. This is not true for
non-UK hosts. Does this indicate a problem with lcgnetmon.phy.bris.ac.uk, &
if so, how can I debug & fix it?
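The observation that every UK peer stopped at about the same moment while non-UK peers carried on is itself a clue: a shared cause (for example a change to the UK mesh configuration, or to this host's set of UK tests) seems more likely than many remote hosts failing independently. Given the time of the last latency sample per remote host, a hypothetical helper like the one below checks whether a group of hosts stopped together. The function, hostnames, times and 30-minute window are invented for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def common_stop_time(last_seen: Dict[str, datetime], suffix: str,
                     window: timedelta) -> Optional[datetime]:
    """If every host whose name ends with `suffix` last reported data within
    `window` of each other, return the earliest of those times; else None."""
    times = [t for host, t in last_seen.items() if host.endswith(suffix)]
    if not times:
        return None
    earliest, latest = min(times), max(times)
    return earliest if latest - earliest <= window else None


# Hypothetical hostnames and last-sample times, for illustration only.
last_seen = {
    "peer-a.example.ac.uk": datetime(2015, 9, 10, 11, 40),
    "peer-b.example.ac.uk": datetime(2015, 9, 10, 11, 45),
    "peer-c.example.org": datetime(2015, 9, 15, 7, 0),
}
print(common_stop_time(last_seen, ".uk", timedelta(minutes=30)))
# 2015-09-10 11:40:00
```

A `None` result means the group did not stop together, which would point back towards per-host problems instead.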

Grateful for advice!
