
perfsonar-user - Re: [perfsonar-user] OWAMP measurements

Subject: perfSONAR User Q&A and Other Discussion


Re: [perfsonar-user] OWAMP measurements


  • From: Pedro Queirós <>
  • To: Aaron Brown <>
  • Cc: "" <>
  • Subject: Re: [perfsonar-user] OWAMP measurements
  • Date: Wed, 13 Mar 2013 16:27:14 +0000
  • Authentication-results: sfpop-ironport01.merit.edu; dkim=neutral (message not signed) header.i=none

Hi Aaron, 

Thank you for your insight. I'd appreciate a little more help on this subject.
I'm going to go down the path of reading the SQL DB directly and, as such,
I've been reading the schema and trying to figure out the best way to do this.

My requirements are as follows: every 5 minutes, query the database and
extract the minimum, maximum and mean values of the delay and jitter over
the last 5 minutes. Lost, duplicate and out-of-order packet counts would
also be nice.
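For what it's worth, here is a minimal sketch of that 5-minute aggregation in Python. The table and column names below are placeholders standing in for whatever the owamp_database_schema.txt document specifies, and sqlite3 stands in for the actual MySQL backend; adjust both to your deployment.

```python
import sqlite3
import time

# Toy stand-in for the OWAMP measurement DB; the real table and column
# names come from owamp_database_schema.txt and will likely differ.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE DATA (
    start_time REAL, end_time REAL,
    min_delay REAL, max_delay REAL,
    sent INTEGER, lost INTEGER, dups INTEGER)""")

now = time.time()
# Two one-minute rows inside the 5-minute window, one older row outside it.
rows = [
    (now - 120, now - 60,  0.003, 0.009, 60, 1, 0),
    (now - 60,  now,       0.004, 0.012, 60, 0, 2),
    (now - 900, now - 840, 0.003, 0.200, 60, 5, 0),
]
conn.executemany("INSERT INTO DATA VALUES (?,?,?,?,?,?,?)", rows)

# Aggregate the last 5 minutes: overall min/max delay plus packet counters.
cur = conn.execute(
    """SELECT MIN(min_delay), MAX(max_delay),
              SUM(sent), SUM(lost), SUM(dups)
       FROM DATA WHERE end_time > ?""",
    (now - 300,))
min_d, max_d, sent, lost, dups = cur.fetchone()
print(min_d, max_d, sent, lost, dups)
```

Run every 5 minutes from cron, a query along these lines gives the min/max delay and the loss/duplicate counters directly; the mean and jitter are a separate matter, as discussed below in the thread.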

I have two machines running perfSONAR and doing OWAMP measurements
between them. I have set up the tests as follows: a packet rate of 1 packet
per second and a packet size of 1500 bytes (is it possible to send more
than 1 packet per second? According to the GUI, it isn't).

With the current schema, I can get the following from the DATA table:
minimum and maximum delay, and lost and duplicate packets. How can I get
the other values? I find it curious that the mean delay isn't stored in
the DATA table. Why is that?
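(A likely reason the mean isn't stored: the archive keeps delays as a histogram of per-bucket packet counts rather than per-packet samples, so the mean has to be reconstructed from the buckets. A sketch, assuming a hypothetical mapping of bucket delay value in ms to packet count:)

```python
# Hypothetical bucket histogram for one interval: delay (ms) -> packet
# count, as it might be read from the per-interval bucket table.
buckets = {3.0: 40, 4.0: 15, 5.0: 4, 200.0: 1}

total = sum(buckets.values())
# Weighted mean: each bucket's delay value counted once per packet in it.
mean = sum(d * n for d, n in buckets.items()) / total
print(total, round(mean, 3))
```

The same weighted-sum trick gives the variance (and hence a jitter estimate) by summing `n * (d - mean)**2` over the buckets, at the cost of the quantization error of the bucket width.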

I don't know if it's because of the way I've set up the tests, but I can only see
a granularity of one minute (each line in the DATA table has a difference between
start and end timestamps of ~1 minute and, as expected, packets sent states the
correct value of 60). Is it possible to have a finer granularity?

Another question concerns the following sentence: "The maxerr has the
maximum NTP error seen." Can you please elaborate on this?
Which error are you referring to? The error in time synchronization
between the machine and its time reference?

Thank you, once again, for taking the time to look into these questions.

Kind Regards,
Pedro


On Wed, Mar 6, 2013 at 7:42 PM, Aaron Brown <> wrote:
Hi Pedro,

On 3/5/13 6:41 AM, "Pedro Queirós" <> wrote:

>Hello, sorry for reviving an old thread, but I wanted to give an update
>on this.
>
>
>We've updated the hardware on the lower class servers to use SSD disks and
>1Gb RAM. Unfortunately, this didn't eliminate the spikes in the max
>values we're
>having using OWAMP measurements.

Maximum values can often be flukes caused by all sorts of things
going on in the host itself. I wouldn't put too much stock in them.

>Next step is manually collecting the OWAMP values stored in the database
>and computing the 95th percentile. I'm guessing the best approach is
>through PHP / MySQL, but I saw somewhere that people also use XML to
>collect the values from the MA?
>Has anyone done this before, and can you provide your insights on this
>matter (both the collecting method and the 95th percentile calculation)?

You can see some examples of the XML schema used for querying the
measurement archive at
http://anonsvn.internet2.edu/svn/perfSONAR-PS/trunk/perfSONAR_PS-perfSONARBUOY/etc/requests/SetupDataRequest-owamp-6.xml


If you do go down the path of reading the SQL DB directly, there's a
document describing the schema at
http://anonsvn.internet2.edu/svn/perfSONAR-PS/trunk/perfSONAR_PS-perfSONARBUOY/doc/owamp_database_schema.txt


The easiest way I've found to do the 95th percentile calculation is to
take each of the buckets and, using something like Perl's
Statistics::Descriptive, add one value per packet's delay
(e.g. if the 50ms bucket has 30 packets in it, insert 50 thirty
times), and then have it calculate the 95th (or whatever) percentile.
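That bucket-expansion approach can be sketched in Python rather than Perl's Statistics::Descriptive; the bucket values below are made up for illustration:

```python
import math

# Hypothetical buckets: delay (ms) -> packet count for one interval.
buckets = {50.0: 30, 51.0: 10, 52.0: 5, 200.0: 1}

# Expand each bucket into one value per packet, as described above.
delays = sorted(d for d, n in buckets.items() for _ in range(n))

# Nearest-rank 95th percentile: the value at rank ceil(0.95 * N).
rank = math.ceil(0.95 * len(delays)) - 1
p95 = delays[rank]
print(len(delays), p95)
```

Note how the single 200ms outlier, which would dominate a plain maximum, leaves the 95th percentile essentially untouched.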

Cheers,
Aaron

>
>
>Thanks in advance!
>
>
>Pedro
>
>
>On Tue, Dec 11, 2012 at 4:27 PM, Pedro Queirós
><> wrote:
>
>Thank you Gerry and Jim for your replies.
>
>
>I monitor the machines closely, using munin to graph the CPU load
>and network load also. The CPU load is pretty low, around 0.2~0.4,
>and the network load isn't noticeable.
>
>
>Seems like a good option, to try and add more RAM and see if there is
>a change in the results.
>
>
>Kind Regards,
>Pedro
>
>
>On Tue, Dec 11, 2012 at 4:01 PM, Jim Warner
><> wrote:
>
>Make sure you know all the traffic that is raining down on your servers.
>If, for example, you've turned on ssh and exposed it to the public
>internet, you can get the attention of the brute force password guessers
>and they can spike your CPU use.
>
>
>On Tue, Dec 11, 2012 at 6:39 AM, Pedro Queirós
><> wrote:
>
>Hello Aaron,
>
>
>thank you for your reply.
>No, the machines are running only the latest perfSONAR toolkit version,
>installed using netinstall. I've disabled all non-essential services, as
>I'm only interested in OWAMP measurements.
>One of the machines is more recent and considerably more powerful: an
>Intel(R) Xeon(R) CPU E5310 running at 1.60 GHz with 8 cores and 4 GB of
>RAM.
>The other machine is part of the proprietary solution that we're trying
>to replace, and we'd like to reuse that hardware to avoid the cost of
>purchasing newer hardware.
>These older machines use an AMD Sempron 2300+ processor running at
>1.6 GHz with 512 MB of RAM. We're also using a standard IDE disk; we
>recently purchased an IDE SSD, but I have yet to install it on the
>machine and test it out.
>
>
>Should I try using a more powerful machine in place of this older one and
>see how it impacts
>the measurements?
>
>
>Kind Regards,
>Pedro
>
>
>On Tue, Dec 11, 2012 at 1:45 PM, Aaron Brown
><> wrote:
>
>Hey Pedro,
>
>On Dec 10, 2012, at 2:56 PM, Pedro Queirós <>
>wrote:
>
>
>Hello Joe,
>
>
>thank you for your answer. So what you're saying is that
>basically this can happen or not, depending on HW/OS
>combinations, which are not fully understood?
>
>
>I was hoping to replace the proprietary system in use with the
>perfSONAR toolkit, but these findings may not allow it.
>Anyway, I'd appreciate more insight on this issue, such as examples
>of solutions currently in use (I mean hardware and operating systems)
>that do not produce these artifacts.
>
>
>
>
>
>I guess it depends on what you're using the maximum value for. For these
>kinds of measurements, we've found that the 95th percentile gives an
>accurate picture of the jitter/latency seen between hosts without being
>affected by transient host-based issues (generally caused by scheduling
>or I/O).
>
>
>Beyond that, the maximum values that you're seeing seem oddly high (e.g.
>200ms). Do you have anything else running on that host? Especially
>something that does I/O.
>
>
>Cheers,
>Aaron
>
>
>
>
>Kind Regards,
>Pedro Queirós
>
>
>
>
>On Mon, Dec 10, 2012 at 7:35 PM, Joe Metzger
><> wrote:
>
>Pedro,
>I have spent a lot of time on this issue in the past.
>
>There are a number of hardware based solutions that go to great
>lengths to make very accurate NIC to NIC measurements. Most of these
>include carefully engineered strategies to prevent anomalous
>measurements from being taken, and/or filter them out when they happen.
>
>The OWAMP software used in most PerfSONAR implementations takes a
>statistical approach to capture application to application latency.
>It uses general purpose hardware, and general purpose operating systems
>and so the measurements include both OS & network anomalies. It doesn't
>discard obvious anomalies caused by measurement artifacts like the
>other systems I have seen.
>
>There are advantages and disadvantages of each approach, but I think
>it is important to start from a position where you understand these
>differences.
>
>All of the big anomalies that I have tracked down were due to OS
>issues.  It is fairly common to see packets spending 100ms or more
>waiting for interrupts to be serviced. The frequency of packets
>taking this slow path through the kernel grows significantly as
>the load on the boxes grows.  Changing the OS (Linux vs FreeBSD), as
>well as OS version changes, also impacts this, but I haven't seen a
>solution that eliminates it.
>
>--Joe
>
>
>
>
>
>
>On Dec 10, 2012, at 11:00 AM, Pedro Queirós wrote:
>
>> Hi there,
>>
>> we're using OWAMP to measure the link between two sites we have,
>> using two dedicated machines.
>> In another two machines we have a proprietary solution to do the same
>> task.
>>
>> We're finding that the average results are nearly identical, but when
>> using OWAMP, the max delay that's shown in the graph has several
>> spikes that are not displayed when using the proprietary solution. In
>> fact, the proprietary solution only shows max delay values close to
>> the average, that is, around 3~4 ms.
>>
>> I was wondering if anyone has had the chance to run OWAMP side by
>> side with other similar tools and compare the results. We can't find any
>> reason why those spikes appear in the graphs in OWAMP.
>> They also appear more frequently in IPv4 measurements vs IPv6
>> measurements.
>>
>> I'll attach screenshots so you can hopefully understand better what I
>> mean. As you can see, in IPv6 the spikes are much rarer.
>>
>> I'm hoping to find a reason for these spikes, because nothing else
>> (CPU load, network congestion) justifies these spikes in the
>> measurements - and they don't appear in the measurements we have
>> in place using a proprietary solution.
>>
>>
>> Kind Regards,
>> Pedro Queirós
>
>
>>
>><ipv4_backward_max.png><ipv4_forward_max.png><ipv4_min.png><ipv6_backward
>>_max.png><ipv6_forward_max.png><ipv6_min.png>
>
>Joe Metzger





