
perfsonar-dev - Re: [pS-dev] Re: Topics for the next meetign in Berkeley

Subject: perfsonar development work


  • From: Jochen Reinwand <>
  • To: "Jeff W. Boote" <>
  • Cc: Nicolas Simar <>, Peter Holleczek <>, "Eric L. Boyd" <>, Szymon Trocha <>, Loukik Kudarimoti <>, Matthias Hamm <>, Mark Yampolskiy <>, "" <>, Klaus Ullmann <>
  • Subject: Re: [pS-dev] Re: Topics for the next meetign in Berkeley
  • Date: Tue, 14 Aug 2007 14:43:10 +0200
  • Organization: DFN Verein

On Tuesday 14 August 2007 03:14, Jeff W. Boote wrote:
> P.S. I'm going to go into some specific details here regarding the
> Hades/AMI-OWAMP solutions for one-way delay. For those of you more
> interested in the big-picture than the details, you might want to tune out
> now. :)

Thanks for providing such a detailed view on the issue!

For the general audience I would like to add that I totally agree with nearly
everything Jeff outlined.

Regarding the owamp/Hades integration, Jeff is absolutely right. I have some
more thoughts on this issue, just to clarify things: two separate interfaces
would be the best solution for running the two systems (Hades and AMI) on the
same host. With only one interface, on-demand tests (via owamp) would not be a
big problem. But if Hades blocks owamp tests while Hades measurements are
running, AMI measurements would be more or less impossible. If both
measurements run concurrently, at least one of them will exhibit quality
degradation. I'm not sure which one, because we are using the realtime
capabilities of the operating system ;-)
Hades uses the POSIX real-time capabilities of the operating system, so on the
one hand it is more sensitive to even slightly degraded performance, but on
the other hand it might preempt services that do not run with elevated
priority.
But using two interfaces also brings in operating-system-related issues.
Hades measurements might be affected by AMI measurements and/or vice versa,
even though they run on different interfaces. We have no reliable knowledge
about this, and it is most likely both hardware and operating system
dependent.
There is perhaps another issue: as far as I know, owamp (like bwctl/iperf)
can only use the default route with the lowest metric. This is why we are
using bwctl on the default interface of the GEANT machines and Hades on the
secondary interface. If neither bwctl nor owamp provides a means to override
the regular routing tables, measurements for those two systems will have to
be carefully scheduled so that they do not overlap. Alternatively, there might
have to be separate measurement boxes for BWCTL and OWAMP, while Hades could
run concurrently with BWCTL or OWAMP measurements within the accuracy
required by the LHC project (when running in parallel with OWAMP, perhaps
even better).
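If the two tools have to share one interface, the scheduling constraint boils down to a simple interval-overlap check. The sketch below is purely illustrative: the `TestWindow` type, the tool labels and the timings are hypothetical and are not taken from any bwctl, owamp or Hades configuration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class TestWindow:
    """Hypothetical test slot on the shared interface."""
    tool: str        # e.g. "bwctl" or "owamp"
    start: float     # seconds since a common reference time
    duration: float  # seconds

    @property
    def end(self) -> float:
        return self.start + self.duration


def overlaps(a: TestWindow, b: TestWindow) -> bool:
    """Half-open intervals [start, end) overlap iff each starts before the other ends."""
    return a.start < b.end and b.start < a.end


def conflicts(windows):
    """All pairs of windows that would run concurrently on the shared interface."""
    return [(a, b) for a, b in combinations(windows, 2) if overlaps(a, b)]


schedule = [
    TestWindow("bwctl", start=0.0, duration=30.0),
    TestWindow("owamp", start=30.0, duration=60.0),   # starts exactly when bwctl ends: OK
    TestWindow("bwctl", start=80.0, duration=30.0),   # starts before owamp ends: conflict
]
for a, b in conflicts(schedule):
    print(f"{a.tool}@{a.start} overlaps {b.tool}@{b.start}")  # owamp@30.0 overlaps bwctl@80.0
```

Back-to-back windows are treated as non-overlapping here; a real scheduler would probably want a safety gap between tests as well.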

We have a software distribution available for the "client stuff" (software
running on measurement points) for Fedora. But, as Jeff mentioned, for Hades
we need this one(!) central server. Installing this server and/or creating
some sort of distribution so that others are able to install it is far more
complex. Also, the clients must be configured correctly before they can be
used. This includes configuring the interface(s), opening the firewall,
enabling ssh access and so on. Of course, none of this is something a
software package should do. But I believe Jeff has more or less the same
problems building a software distribution for AMI ;-)

The test frequency is a freely configurable parameter of Hades. In the LHC
scenario, with its strong hierarchy of tiers, it is possible to move away from
the fully meshed paradigm and, e.g., set up packet trains every second on each
measurement path from tier 0 to tier 1. This gives a much higher measurement
frequency on those paths.
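The gain from dropping the full mesh can be estimated by counting measured paths. This is a back-of-the-envelope sketch, assuming a hypothetical fixed global budget of packet trains per second that the schedule shares round-robin across all paths; the host counts are illustrative only, not the actual LHC deployment.

```python
def full_mesh_paths(n_hosts: int) -> int:
    """Every host measures to every other host (one direction per ordered pair)."""
    return n_hosts * (n_hosts - 1)


def tier0_to_tier1_paths(n_tier0: int, n_tier1: int) -> int:
    """Only the paths from each tier-0 host to each tier-1 host are measured."""
    return n_tier0 * n_tier1


def per_path_interval(n_paths: int, trains_per_second: float) -> float:
    """Seconds between trains on one path, assuming a hypothetical fixed global
    budget of trains per second shared round-robin across all paths."""
    return n_paths / trains_per_second


hosts = 12     # illustrative: 1 tier-0 site plus 11 tier-1 sites
budget = 11.0  # hypothetical number of trains per second the schedule can fit
mesh = full_mesh_paths(hosts)
tree = tier0_to_tier1_paths(1, hosts - 1)
print(mesh, per_path_interval(mesh, budget))  # 132 12.0
print(tree, per_path_interval(tree, budget))  # 11 1.0
```

Under these assumptions the same budget that yields one train every 12 seconds per path in the full mesh yields one train per second per path in the tier-0/tier-1 hierarchy.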

Regarding the usefulness of OWD versus IPDV, it should be pointed out that
IPDV is for the most part a function of OWD. Oversimplifying, we could
consider IPDV to be the derivative of OWD. So a highly precise OWD
measurement eliminates many false positives that would otherwise have to be
removed by applying statistical methods. We strongly believe that such
methods (percentiles, for example) also eliminate real phenomena that a
precise OWD measurement would reveal; without that precision, these
phenomena cannot be distinguished from, e.g., colliding measurement packets.
Such colliding packets exhibit basically the same properties as queueing
events and the resulting IPDV.
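A small numeric sketch of the OWD/IPDV relationship, using hypothetical delay values in microseconds. IPDV is taken here as the difference of consecutive one-way delays (in the spirit of RFC 3393), so a constant clock offset cancels out, while a median (50th percentile) summary hides a single short queueing spike. The drift line at the end is only a linear approximation.

```python
import statistics


def ipdv(owd_us):
    """IPDV of consecutive packets: the difference of their one-way delays."""
    return [b - a for a, b in zip(owd_us, owd_us[1:])]


# Hypothetical one-way delays in microseconds; packet 5 hits a short queueing event.
true_owd = [10_000, 10_000, 10_100, 10_000, 10_000, 14_000, 10_000, 10_100, 10_000, 10_000]

# A constant clock offset between sender and receiver shifts every OWD equally ...
offset_us = 3_000
measured = [d + offset_us for d in true_owd]
# ... and therefore cancels out in the IPDV.
assert ipdv(measured) == ipdv(true_owd)

# The raw series shows the event clearly ...
print(max(ipdv(true_owd)))          # 4000
# ... but a robust summary such as the median hides it completely.
print(statistics.median(true_owd))  # 10000.0

# A relative clock drift adds roughly drift * spacing to each IPDV value,
# e.g. 10 ppm over a 100 ms packet spacing:
drift_ppm, spacing_us = 10, 100_000
print(drift_ppm * spacing_us / 1_000_000)  # 1.0 (microseconds)
```

Integer microseconds are used so that the offset cancellation holds exactly rather than only up to floating-point rounding.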

Jochen

> Generally, I agree with what Jochen said. Integration of the two systems
> does not make a whole lot of sense. They are really set up with very
> different assumptions and goals.
>
> To be honest - I believe the easiest solution for LHC is to include both.
> It would not be too difficult to include two interfaces on the host and run
> both AMI/owamp and Hades. (The AMI code is already part of the LHC bundle
> to support the bwctl regularly scheduled tests anyway. And, owamp is almost
> certainly the easiest solution for the on-demand tests. And, if it is
> running on a different interface - I suspect Hades would be much happier.)
>
> Hades and AMI/owamp really do give you different data. And, I don't see how
> you could run them both on the same interface unless Hades was more
> forgiving of other traffic. I saw that Jochen thought they might be
> forgiving of *some* additional traffic for on-demand tests, but I'm fairly
> certain they would not be as forgiving of the amount of traffic that
> AMI/owamp generally produces.
>
> AMI/owamp runs continuous streams of 10 packets per second from all
> senders to all receivers (with exponentially distributed inter-packet
> gaps). There is no
> real 'schedule' because it is continuous. Basically I don't see a
> reasonable way to come up with a 'schedule' to accommodate both testing
> methodologies on a single interface.
>
> I am going to attempt to outline the differences I see in the
> methodologies. I absolutely admit that I am biased here, and I invite
> Roland, Jochen, Stephen or anyone else from DFN to provide the alternate
> view. I'm also going to say from the beginning that I don't think either
> methodology is incorrect, or generally better than the other. It is more a
> question of what pathologies you are attempting to see and how much
> centralized vs. local control you assume.
>
> AMI/owamp was tuned for trying to see IPDV issues. We are not as concerned
> with the actual 'real' delay between hosts. AMI/owamp does that, but the
> way we collect the data is more tuned toward looking for changes at as fine
> a time resolution as we possibly can without hitting false positives for
> IPDV. This is why we test continuously. (For example, routing flaps are
> often very short-lived events, especially if they are on lower layers, say
> a SONET reroute. These are the kinds of events we are looking for.)
>
> Additionally, we are looking for congestion (queuing events). In this, I
> believe Hades is probably more accurate (when it is actually testing).
> Since they send a 'train' of packets, they will get a more accurate view of
> the distribution caused by queuing. However, because they send those trains
> fairly infrequently - they are unlikely to see many of the queuing events
> that AMI/owamp will see (although with less precision).
>
> For AMI/owamp to give you good data (IPDV as I indicate above) - what you
> really need is a stable clock. It does not actually need to be extremely
> precise. It needs to be reasonably precise because you want to bound the
> drift between systems. (NTP to 4 nearby peers is typically sufficient.)
>
> AMI/owamp has a bit more of a distributed model with regard to deciding
> what test peers to run with, and where to send the data. There is no global
> scheduling required, and multiple meshes can co-exist with each other. For
> example, each Tier-1 could additionally run an AMI/owamp mesh with several
> Tier-2 centers without interfering with the Tier-0/Tier-1 mesh. And, no
> coordination would need to take place.
>
> I believe Hades is superior at seeing the actual real delay between points
> on the network. The boxes are very well tuned for this. Part of the way
> Hades does this is to tightly control the schedule of when packets will be
> sent/received at a particular host to ensure there is no contention.
> Additionally, Hades is almost certainly more precise with regard to IPDV
> during the packet-train.
>
> The price for this precision is the granularity of the test between any two
> test peers. You won't see the events that happen in-between tests. The
> other price for this precision is the need for scheduling at a global
> level. On the other hand, if you want a single operations entity looking
> at the data with NOC alarms and such, I suspect Hades is more mature in
> that area. (I don't know, but it sounds like it.)
>
> As far as distributing the code, I believe Hades and AMI are in similar
> shape. Previously, I have used AMI to run measurements on Abilene and I
> have helped a few other domains install/run the code. I do not yet have a
> real distribution available. (Although, I have promised to do that in
> September...)
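To make the granularity trade-off described in the quoted text concrete, here is a rough probability sketch. It assumes that AMI/owamp probes form a Poisson process at 10 packets per second (exponentially distributed gaps), that a packet train can be treated as a single instantaneous probe sent at a fixed interval, and that an event counts as "seen" if at least one probe falls inside it. The 60-second train interval and the 0.5-second event length are hypothetical values, not Hades or AMI defaults.

```python
import math
import random


def poisson_send_times(rate_pps: float, duration_s: float, seed: int = 0) -> list[float]:
    """Send times of a Poisson probe stream: exponential gaps with mean 1/rate_pps."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_pps)
        if t >= duration_s:
            return times
        times.append(t)


def p_hit_poisson(rate_pps: float, event_s: float) -> float:
    """Probability that at least one Poisson probe lands inside an event of length event_s."""
    return 1.0 - math.exp(-rate_pps * event_s)


def p_hit_periodic(interval_s: float, event_s: float) -> float:
    """Same probability for instantaneous probes sent every interval_s,
    with the event starting at a uniformly random phase."""
    return min(1.0, event_s / interval_s)


sends = poisson_send_times(10.0, 60.0)
print(len(sends))                           # roughly 600 packets in one minute
print(f"{p_hit_poisson(10.0, 0.5):.4f}")    # 0.9933
print(f"{p_hit_periodic(60.0, 0.5):.4f}")   # 0.0083
```

The sketch deliberately ignores the precision side of the trade-off: a train that does hit the event describes the queueing distribution far better than one or two isolated probes.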

--
Jochen Reinwand, DFN-Labor
Friedrich-Alexander-Universität Erlangen-Nürnberg
Regionales RechenZentrum Erlangen (RRZE)
Martensstraße 1, 91058 Erlangen, Germany
Tel. +49 9131 85-28689, -28800, Fax +49 9131 302941

www.win-labor.dfn.de


