perfsonar-user - Re: [perfsonar-user] "Real World" point in time test config?

Subject: perfSONAR User Q&A and Other Discussion

From: Eli Dart <>
To: Pol Llovet <>
Cc: Alan Whinery <>, "" <>
Subject: Re: [perfsonar-user] "Real World" point in time test config?
Date: Fri, 13 Mar 2015 15:02:23 -0700

Hi Pol,

On Fri, Mar 13, 2015 at 9:25 AM, Pol Llovet <> wrote:

That was just a hypothetical example. And the response you gave is what I am trying to mitigate. I want to use PS to give me some actionable information when evaluating the health of a connection to the DMZ (where the primary PS box is).

Right. I find this mode of testing to be useful (i.e. make perfSONAR behave on the wire in the same way that the user traffic does).

This is why I answered your question in the way that I did - a clean perfSONAR test tells you some things, and doesn't tell you others.

One of the most common misconceptions we run into is the use of short-distance testing to troubleshoot long-distance behavior. So, in testing long-distance behavior it is necessary to run a long-distance test (unless you have a tool that can faithfully mimic long-distance behavior over short distances).
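
One rough way to see why a short-distance test says little about long-distance behavior is the bandwidth-delay product: the amount of data a single TCP stream must keep in flight to fill the path. The sketch below is only an illustration; the function name and the two RTT values are assumptions chosen for the example, not measurements from this thread.

```python
def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep a path of this rate and RTT full."""
    return bandwidth_bps * rtt_seconds / 8


if __name__ == "__main__":
    gigabit = 1e9
    # RTT values are illustrative assumptions, not measurements.
    for label, rtt in [("campus LAN", 0.0005), ("long-distance WAN", 0.080)]:
        window_kb = bdp_bytes(gigabit, rtt) / 1000
        print(f"{label}: RTT {rtt * 1000:.1f} ms needs about {window_kb:.1f} kB in flight")
```

A host whose buffers are comfortably large for the LAN case can still be far too small for the WAN case, which a LAN-only test will never reveal.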

Clearly, nobody is going to be using PS to do work, so I'm trying to simulate a spectrum of traffic patterns by using bwctl/owamp (which is why they are so configurable, right?). I can use the resulting information to inform our actions.

Right now I am using a bunch of interlaced bwctl and bwping tests (using owamp in place of ping if possible). The primary option that I am varying is the -DSCP option. I am walking through each of them in hopes that they behave differently (that higher priority VOIP-marked traffic will have better performance than low priority traffic).

Depending on how your network is provisioned, this may or may not get you results that map well to user traffic. Some switches only put certain devices or port ranges into the VOIP queue, and the VOIP queue sometimes doesn't have all that much bandwidth associated with it.
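
When walking the DSCP values, it can help to generate the standard code points rather than typing them by hand. The sketch below builds the class selector (CS), assured forwarding (AF), and expedited forwarding (EF) values and the matching TOS byte. The function names are made up for this example. Whether a given test tool expects the six-bit DSCP value or the full TOS byte is something to check in that tool's documentation.

```python
def dscp_to_tos(dscp: int) -> int:
    """DSCP occupies the upper six bits of the IPv4 TOS byte (RFC 2474)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def standard_dscp_classes() -> dict[str, int]:
    """Return the commonly used DSCP names mapped to their six-bit values."""
    classes = {f"CS{n}": n * 8 for n in range(8)}       # class selectors CS0..CS7
    for cls in range(1, 5):                              # AF classes 1..4
        for drop in range(1, 4):                         # drop precedence 1..3
            classes[f"AF{cls}{drop}"] = cls * 8 + drop * 2
    classes["EF"] = 46                                   # expedited forwarding, often used for VoIP
    return classes


if __name__ == "__main__":
    for name, dscp in sorted(standard_dscp_classes().items(), key=lambda kv: kv[1]):
        print(f"{name:5s} dscp={dscp:2d} tos=0x{dscp_to_tos(dscp):02x}")
```

Feeding each value into one test run gives a consistent sweep, but as noted above the results only mean something if the switches actually honor the markings on the ports being tested.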

I also want to create a "standard" portable perfSONAR setup, build a few of them, and send technicians out into the field with them. We can use the standardized results to compare against other areas on campus at different times and detect anomalous network behavior, if it exists, without deploying dozens of PS nodes in every network closet/lab.

This is useful. However, I would encourage you to deploy at least a few test hosts at strategic locations, and run background OWAMP tests between them. This gives you a long-running data set that you can go back to when you spot an anomaly (e.g. "wow - we started getting packet loss after the maintenance last Wednesday at 6PM").
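
As a minimal sketch of how a long-running data set can be used to pin down when loss started: given time-ordered loss samples, find the first run of consecutive windows above a threshold. The sample layout, the function name, and the default threshold are assumptions for illustration; this is not the perfSONAR archive format or API.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Hypothetical sample layout: (window start, packets sent, packets lost)
Sample = Tuple[datetime, int, int]


def first_sustained_loss(
    samples: Iterable[Sample],
    threshold: float = 0.001,
    consecutive: int = 3,
) -> Optional[datetime]:
    """Return the start of the first run of `consecutive` windows whose
    loss rate exceeds `threshold`, or None if no such run exists.
    Assumes `samples` is sorted by time."""
    run_start = None
    run_length = 0
    for start, sent, lost in samples:
        rate = lost / sent if sent else 0.0
        if rate > threshold:
            if run_length == 0:
                run_start = start
            run_length += 1
            if run_length >= consecutive:
                return run_start
        else:
            # A clean window resets the run.
            run_start = None
            run_length = 0
    return None
```

Requiring several consecutive lossy windows keeps a single noisy sample from being reported as the start of a problem; the right values depend on how often the tests run.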

I think this tool could be very useful going forward. Currently, when a user reports a problem, the way they rule out the network is by going to speedtest.net. This is, in my opinion, pretty much useless data. Ping does help a bit. But the raw numbers of their data transfer don't really say anything about the network (since it might be the host or target hardware, operating system, etc). I think PS could be a great diagnostic tool for this purpose (and others).

Well - the tests can tell you something about the portion of the path that is common to the test traffic and the user traffic, provided the test traffic is a good model for the user traffic. It's often quite helpful to know whether to involve the systems people, network people, applications people, or security people since they are often different groups (and sometimes different upper management as well).

Eli

Does that make sense?

thanks,

Pol

On Wed, Mar 11, 2015 at 8:29 PM Eli Dart <> wrote:

On Wednesday, March 11, 2015, Pol Llovet <> wrote:
So you are saying that if the basic tests show ~960 Mb/s sustained, then I can rule out the network. There are no other configuration options (that you know of) that I should be selecting that will give me more relevant information for LAN traffic.

I would say that the portion of the path that you tested is unlikely to be a bottleneck for traffic which uses that portion of the path in the manner in which your test used it.

It's probably worth looking at other factors as you say.

What do the end systems look like?

Eli

Thanks!

There is a second issue, but I will put it in a new email for findability. :)

On Wed, Mar 11, 2015 at 3:49 PM Eli Dart <> wrote:
In general, it should be easy to fill a 1G pipe in the LAN, even in the presence of some packet loss. That same packet loss will almost certainly cause serious performance problems if the cause of the loss is in the path of a long-distance (WAN) transfer.
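
The effect of the same loss rate at different distances can be approximated with the well-known Mathis et al. model, which bounds single-stream TCP throughput by roughly (MSS / RTT) * (C / sqrt(p)). The sketch below is a rough model only (it describes Reno-style congestion avoidance, and modern stacks differ); the MSS, RTT, and loss values are assumptions chosen for illustration.

```python
import math


def mathis_ceiling_bps(mss_bytes: int, rtt_seconds: float, loss_rate: float) -> float:
    """Approximate upper bound on single-stream TCP throughput, in bits/s."""
    if rtt_seconds <= 0 or loss_rate <= 0:
        raise ValueError("rtt_seconds and loss_rate must be positive")
    c = math.sqrt(3 / 2)  # constant from the Mathis et al. derivation
    return (mss_bytes * 8 / rtt_seconds) * (c / math.sqrt(loss_rate))


if __name__ == "__main__":
    mss = 1460    # bytes, typical for a 1500-byte MTU (assumption)
    loss = 1e-4   # 0.01% packet loss (assumption)
    for label, rtt in [("LAN, 0.5 ms RTT", 0.0005), ("WAN, 80 ms RTT", 0.080)]:
        mbps = mathis_ceiling_bps(mss, rtt, loss) / 1e6
        print(f"{label}: ceiling about {mbps:.1f} Mb/s")
```

With these inputs the LAN ceiling is above 1 Gb/s, so the link stays the limit and the loss goes unnoticed, while the WAN ceiling drops to a few tens of Mb/s.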

If you're able to fill the network path with a single TCP stream, and your users are seeing terrible performance over that same path, I would next look at systems and storage (e.g. NFS mounts, single disk spindle somewhere, firewall traversal to get to storage, CPU bound server, filesystem metadata limitations, bla bla bla).

Eli

On Wed, Mar 11, 2015 at 2:39 PM, Pol Llovet <> wrote:
To the best of my knowledge. These are primarily LAN tests, with some WAN tests (depending on specific collaboration efforts). The WAN comes with a huge grain of salt for point in time tests. But the LAN should be a lot more consistent.

If someone says that loading GIS files is unreasonably slow, and I want to rule out the LAN without stuffing perfSONAR nodes in every rack... I bring my perfSONAR laptop over, configured for that subnet, and run an hour of bwctl and owamp tests against the science DMZ perfSONAR node. I am somewhat arbitrarily picking the parameters for the tests, and wanted to see if there were group opinions on the matter.

-p

On Wed, Mar 11, 2015 at 3:21 PM Eli Dart <> wrote:
Hi Pol,

Are your tests traversing the same network path as the data transfers you are trying to emulate?

Eli

On Wed, Mar 11, 2015 at 11:58 AM, Pol Llovet <> wrote:
[apologies for the early send]
... Not convinced that default tests are sufficient to rule out the network.

I was thinking that a battery of different configs for the two tools would be able to give me a more complete picture. However, I'm not sure what those config flags should be.

On Wed, Mar 11, 2015 at 12:55 PM Pol Llovet <> wrote:
Indeed, but I do want to isolate the network in the test (in some cases to remove the variable from the problem space). I'm just not sure (or convinced) that vanilla bwctl and iperf

On Wed, Mar 11, 2015 at 12:40 PM Alan Whinery <> wrote:
On 3/11/2015 8:27 AM, Pol Llovet wrote:
> I have a laptop that I am using to do point in time tests of bandwidth and
> latency in various labs around our campus. This is mostly to have an
> "apples to apples" test (or at least the closest thing to it I can get).
>
> However, my bandwidth tests are coming up at close to the theoretical
> maximum for TCP/IP over the switch port speed. This isn't really what the
> users are seeing. Does anyone know of flags I could send to bwctl or owamp
> to more accurately represent different types of traffic (video streams,
> loading a GIS file, rsyncing a giant image or tgz, I/O on lots of small
> files, etc).
>
> Thanks for your help,
>
> Pol Llovet
>

If you want to emulate what users are seeing, you're probably better off
doing what the users are doing, than trying to emulate it with iperf or
nuttcp, etc. I keep a 2 GB file on a well-connected server, with a
fairly fast disk. For a more "real world" test, I tell people to
download that file, which will be subject to more of the pressures and
influences that "real" traffic is subject to.
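
A minimal sketch of timing such a download, assuming the file is reachable over HTTP. The helper name is made up for this example, and the URL is whatever large file you host yourself; nothing here is specific to perfSONAR.

```python
import sys
import time
from typing import BinaryIO
from urllib.request import urlopen


def timed_read_mbps(stream: BinaryIO, chunk_size: int = 1 << 20) -> tuple[int, float]:
    """Read `stream` to EOF and return (bytes read, megabits per second)."""
    total = 0
    start = time.perf_counter()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    elapsed = time.perf_counter() - start
    mbps = total * 8 / elapsed / 1e6 if elapsed > 0 else float("inf")
    return total, mbps


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: timed_download.py URL")
    # The URL is supplied by you, e.g. a large test file on a well-connected server.
    with urlopen(sys.argv[1]) as response:
        size, rate = timed_read_mbps(response)
    print(f"{size} bytes at {rate:.1f} Mb/s")
```

This version discards the data as it arrives, so it still leaves out the client's disk. Writing each chunk to a local file inside the loop would bring that constraint into the measurement as well.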

Or alternately, if you want to match users, have users do iperf or
nuttcp tests. But memory-to-memory throughput tests are most often not
going to compare easily with user activity. Iperf/nuttcp show you what
the network can do without the other constraints you mentioned. Users
have slow disks, constrained network stacks, and other challenges.

--
Eli Dart, Network Engineer NOC: (510) 486-7600
ESnet Office of the CTO (AS293) (800) 333-7638
Lawrence Berkeley National Laboratory
PGP Key fingerprint = C970 F8D3 CFDD 8FFF 5486 343A 2D31 4478 5F82 B2B3

