perfsonar-user - Re: [perfsonar-user] maddash question

Subject: perfSONAR User Q&A and Other Discussion

  • From: Andrew Lake <>
  • To: Brian Candler <>
  • Subject: Re: [perfsonar-user] maddash question
  • Date: Tue, 8 Sep 2015 12:26:21 -0400

Hi Brian,

Looks like you figured out some of this on your own, but adding some more details inline. 

On Tue, Sep 8, 2015 at 1:07 AM, Brian Candler <> wrote:
I am trying to understand some things about maddash configuration.

(1) If you set up a full mesh, then you see something like this:

   A B C

A  . 1 X
B  2 . X
C  X X .

Each intersection box shows two values - for example if you hover over box (1) you see results for A->B and B->A. The box is also split into two colours for the two results.

However at box (2) you also see results for A->B and B->A, *and these may be different*. The grid is not symmetrical.

The only thing I can think of is that one box is showing test results stored in the measurement archive on host A, and the other is showing the test results stored in the measurement archive on host B. Is that correct? How does one know which MA each box is showing?

Generally speaking, the MeshConfig software generates grids as follows:

- For a full mesh, the top of a box is the result from row -> column as reported by the row MA. The bottom is the result from row -> column as reported by the column MA. The reason for this relates to your next question, so I'll answer it below.

- For a disjoint test, the top is the result from row -> column as reported by the row MA, and the bottom is the result from column -> row as reported by the column MA. The reverse direction is in the same box since, by definition, the column host may not be in the row host list, so it's our one shot to capture the reverse result.
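
To make the two rules above concrete, here is a minimal Python sketch. It is only an illustration of the description in this message, not MeshConfig or MaDDash code; the function name box_halves and the "full"/"disjoint" labels are made up for the example.

```python
def box_halves(row, col, mesh_type):
    """Describe the two halves of the grid box at (row, col).

    Each half is ((source, destination), reporting_ma_host).
    Hypothetical helper: it only restates the rules described above.
    """
    if mesh_type == "full":
        # Both halves show row -> column; only the reporting MA differs.
        top = ((row, col), row)
        bottom = ((row, col), col)
    elif mesh_type == "disjoint":
        # The bottom half carries the reverse direction, from the column MA.
        top = ((row, col), row)
        bottom = ((col, row), col)
    else:
        raise ValueError("mesh_type must be 'full' or 'disjoint'")
    return {"top": top, "bottom": bottom}


# Boxes (A, B) and (B, A) of a full mesh describe different tests,
# which is why the grid need not look symmetrical.
print(box_halves("A", "B", "full"))
print(box_halves("B", "A", "full"))
```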

Which MA is which gets even more confusing if you have hosts in your mesh with no_agent (meaning no MA). You can tell the MA being used by clicking on the "Statistics" vertical tab and reading the maURL variable. 

 

(2) Does this also imply that if you set up a full mesh, all the tests will be done twice? For example, host A schedules and stores tests A->B and B->A, whilst host B also schedules and stores tests from A->B and B->A?

Isn't that wasteful, or is it an intentional form of redundancy to make sure a test is being done somewhere?

Most meshes, for whatever reason, use the "force_bidirectional" option. This means both hosts run tests in both directions, which yields redundant results. In the case of groups like ESnet and WLCG this redundancy is intentional, so that if one MA goes down you still have the results in another. As meshes continue to grow, though, I suspect this option might start to be used less, since it effectively doubles the stress on the host. Since it is common, though, the MeshConfig software generates a MaDDash config that tries to capture these redundant results and talks to both MAs (and can go orange if one goes down).
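
As a rough back-of-the-envelope illustration of that doubling, the Python sketch below counts stored results for a full mesh. It assumes a simplified model that is not spelled out in this message: without the option each direction is run and stored once, and with it both endpoints run and store every direction. The function name stored_results is hypothetical.

```python
def stored_results(num_hosts, force_bidirectional):
    """Count stored test results for a full mesh under a simplified model."""
    # Every ordered pair (source, destination) is one test direction.
    directions = num_hosts * (num_hosts - 1)
    # Assumption: the option makes both endpoints run and store each direction.
    copies = 2 if force_bidirectional else 1
    return directions * copies


print(stored_results(30, False))  # 870
print(stored_results(30, True))   # 1740
```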
 

(3) If you look at the sample grid at
http://ps-dashboard.es.net/index.cgi?grid=ESnet%20-%20ESnet%20to%20ESnet%20Packet%20Loss%20Testing
it is 30x30 but not quite symmetrical:

- albq-owamp.es.net appears on the left but not the top
- sdsc-owamp.es.net appears on the top but not the left

Is this actually a disjoint mesh rather than a full mesh? If so, why would it be configured that way? 

(4) Looking at the same example I also note that some tests are visible at the intersection (A,B) but not (B,A). In the second direction the square is white. It is not orange ("Unable to retrieve data") or grey ("Check has not yet run")

e.g.
aofa (left) slac (top) shows values green and red. Clicking on this shows some stored results (although only a single packet loss figure, not the two figures implied by the green and red)

slac (left) aofa (top) is white. Clicking on this does nothing.

I cannot work out what's going on here at all.

Many thanks for any clues.

Answering #3 and #4 together. This is a special configuration called an "ordered mesh". It will be changing in the future. For a long time, ESnet has had a combination of newer and older hardware. The old hardware did not handle running a large mesh + archive very well. As a result, we have this weird setup where newer hosts with good hardware initiate and store most of the tests and the older hardware runs fewer tests. We also have a third group for testers that do both bwctl and owamp out of the same interface. The result is a full mesh with a single result in each direction, but who initiates/stores the result depends on which group the host falls into, with hosts sorted alphabetically within each group (i.e. insanely confusing). We are in the middle of a network-wide upgrade of all our measurement hosts; once that is done, this wacky setup will likely go away.
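
A small Python sketch of how such an ordering can produce the lopsided grid you describe. The exact rule is an assumption made for illustration: hosts are ordered by group, then alphabetically within each group, and for every pair the earlier host is the row that initiates and stores the results. The group contents and host names below are invented, and ordered_mesh_boxes is a hypothetical helper, not part of MeshConfig.

```python
from itertools import combinations


def ordered_mesh_boxes(groups):
    """Return the populated (row, column) boxes for an ordered mesh.

    groups: list of host-name lists, with the group that initiates the
    most tests first (assumed ordering).
    """
    # Order by group, then alphabetically inside each group.
    ordered = [host for group in groups for host in sorted(group)]
    # Each pair of hosts gets one populated box: (earlier host, later host).
    return list(combinations(ordered, 2))


boxes = ordered_mesh_boxes([["new-b", "new-a"], ["old-a"]])
rows = {row for row, _ in boxes}
cols = {col for _, col in boxes}
print(boxes)
print(rows - cols)  # host that appears only on the left
print(cols - rows)  # host that appears only on the top
```

Under this assumed rule the first host in the ordering never shows up as a column, the last never shows up as a row, and the box at (B, A) stays empty whenever (A, B) is populated.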

 

Cheers,

Brian Candler.



