
Re: [perfsonar-user] Strange performance results - AT&T VPLS circuit


  • From: Jason Zurawski <>
  • To: Jared Schlemmer <>
  • Cc: "" <>
  • Subject: Re: [perfsonar-user] Strange performance results - AT&T VPLS circuit
  • Date: Mon, 31 Jul 2017 15:01:33 -0400

Hey Jared;



I would run a longer test (30s) to something far away, just so you get an understanding of the macro behavior.  I would be tempted to say that what you see below is buffer-induced packet loss (kicking in at ~8M of window or greater), but I don't know enough about the capabilities of the path (which model of MX, and I have no insight into what is inside a 3930) to guess where it may be caused.  The fact that it runs clean up to that point is a good sign, and probably points to the shorter-latency retransmissions being more related to overzealous sending by someone - and not a symptom of a problem (as long as you are sure the tests to SUNN and CHIC traverse the same infrastructure outbound). 
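
For example, something along these lines (the far-end hostname is a placeholder; -t 30 gives the longer 30-second run and -i 1 keeps the per-second interval reporting):

    iperf3 -c <far-away-host> -t 30 -i 1

The same test can also be driven through pscheduler if you prefer to stay inside the perfSONAR tooling.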

Your MTU observation may be worth looking at, though ... what are the MTU settings on the servers and on the switches/routers that you can see?  1500 or 9000, and does that expectation match reality?  It also goes without saying (but I am saying it) that if there are copper connections involved, verify the duplex settings - just because a link is supposed to auto-negotiate doesn't mean it actually will. 
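
On the Linux hosts, a quick way to sanity-check both (the interface name eth0 is just a placeholder for whatever the perfSONAR boxes actually use):

    ip link show dev eth0        # reported MTU
    ethtool eth0                 # negotiated speed and duplex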

Thanks;

-jason

Jared Schlemmer wrote:
Thanks for the quick responses - the low latency leading to modest buffer requirements makes a lot of sense. I’ll try to answer everyone’s questions below:

- Both perf hosts are directly connected to the routers in Sunnyvale and Monterey Bay by 1GE connections. 
- The path that I have visibility into is Monterey PERFSONAR <—> Juniper MX router <—> AT&T Ciena 3930 switch <—> AT&T “cloud” <—> AT&T Ciena 3930 switch <—> Juniper MX router <—> Sunnyvale PERFSONAR. We manage both perf boxes and both Juniper routers. We connect directly to the Ciena switches, which share the same rack as our routers but are managed by AT&T.
- No errors are on the interfaces at either location, although we do see MTU output errors slowly incrementing on the Monterey interface facing Sunnyvale. I point this out although I think it’s unrelated - it’s incrementing very slowly, and I just ran a couple tests out of Monterey and the output MTU errors didn’t increment at all. I suspect this is some kind of broadcast traffic or something else related to these hosts being connected via VPLS cloud.
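
(A quick way to watch that counter while a test runs - the interface name below is a placeholder - is something like the following on the Juniper side:

    show interfaces ge-0/0/0 extensive | match "MTU|errors"

If it only moves when large frames cross the VPLS, that would hint at an MTU mismatch inside the AT&T cloud rather than anything the test traffic is doing.)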

Here is a 1G test from Monterey to Chicago:

Connecting to host  port 5840
[ 16] local port 37714 connected to port 5840
[ ID] Interval               Transfer         Bitrate               Retr  Cwnd
[ 16]   0.00-1.00   sec  2.36 MBytes  0.02 Gbits/sec    0    386 KBytes       
[ 16]   1.00-2.00   sec  28.6 MBytes  0.24 Gbits/sec    0   3.64 MBytes       
[ 16]   2.00-3.00   sec   101 MBytes  0.85 Gbits/sec    0   7.62 MBytes       
[ 16]   3.00-4.00   sec   106 MBytes  0.89 Gbits/sec   32   4.07 MBytes       
[ 16]   4.00-5.00   sec  63.8 MBytes  0.53 Gbits/sec    0   4.09 MBytes       
[ 16]   5.00-6.00   sec  65.0 MBytes  0.55 Gbits/sec    0   4.18 MBytes       
[ 16]   6.00-7.00   sec  67.5 MBytes  0.57 Gbits/sec    0   4.42 MBytes       
[ 16]   7.00-8.00   sec  72.5 MBytes  0.61 Gbits/sec    0   4.81 MBytes       
[ 16]   8.00-9.00   sec  80.0 MBytes  0.67 Gbits/sec    0   5.34 MBytes       
[ 16]   9.00-10.00  sec  88.8 MBytes  0.74 Gbits/sec    0   6.03 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[ 16]   0.00-10.00  sec   676 MBytes  0.57 Gbits/sec   32             sender
[ 16]   0.00-10.00  sec   666 MBytes  0.56 Gbits/sec                  receiver

I checked the interfaces along this path and bandwidth contention should not be a factor. This test looks like what you would expect - throughput ramping up as the window increases, then a burst of retries and the window backing down. Thanks again,

Jared Schlemmer
Network Engineer, GlobalNOC at Indiana University




On Jul 31, 2017, at 1:53 PM, Matthew J Zekauskas  wrote:

Some thoughts...

I wonder if you could also characterize what you see as "good"?

I would posit that Monterey to Sunnyvale is relatively short, so the 
latency is relatively low, and TCP can recover relatively quickly, and 
maintain throughput in the face of modest loss.  ~500K may well be 
sufficient buffer to keep this path filled.
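
As a rough sanity check (the ~4 ms round-trip time here is an assumption for a Monterey-Sunnyvale path, not a measured value), the bandwidth-delay product works out to roughly:

    1 Gbit/s x 0.004 s = 4 Mbit ≈ 500 KBytes

so a window in the ~500 KByte range is about what it takes to keep a 1GE path of that length full, which lines up with the Cwnd values in the iperf output below.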

Are the endpoints 1GE connected? (so they would not be likely to overrun 
the connection in the middle).
Could it be that there is existing traffic so you are congesting in one 
direction but not the other?
Do you see any other indications of loss - errors or drops on interfaces?

When you ask about "real world impact" -- are you talking about the 
tests themselves which will saturate the path and could adversely affect 
user performance, or the presence of some loss, which might affect user 
performance elsewhere, depending on the application and distance from 
the user?

--Matt


On 7/31/17 1:40 PM, Jared Schlemmer wrote:
We just turned up a new network endpoint that connects to an existing aggregation site via a 1 Gb AT&T VPLS connection, and I’m seeing some interesting performance results. The sites are Monterey Bay and Sunnyvale, CA. Tests from Sunnyvale to Monterey Bay are good, but in the reverse direction, Monterey Bay toward Sunnyvale, I see this:

Connecting to host port 5332
[ 16] local port 58534 connected to port 5332
[ ID] Interval                 Transfer        Bitrate              Retr  Cwnd
[ 16]   0.00-1.00   sec   110 MBytes  0.92 Gbits/sec    0   1.16 MBytes
[ 16]   1.00-2.00   sec   113 MBytes  0.95 Gbits/sec   64    553 KBytes
[ 16]   2.00-3.00   sec   111 MBytes  0.93 Gbits/sec   32    498 KBytes
[ 16]   3.00-4.00   sec   112 MBytes  0.94 Gbits/sec   32    434 KBytes
[ 16]   4.00-5.00   sec   112 MBytes  0.94 Gbits/sec   32    362 KBytes
[ 16]   5.00-6.00   sec   112 MBytes  0.94 Gbits/sec    0    669 KBytes
[ 16]   6.00-7.00   sec   112 MBytes  0.94 Gbits/sec   32    622 KBytes
[ 16]   7.00-8.00   sec   111 MBytes  0.93 Gbits/sec   32    574 KBytes
[ 16]   8.00-9.00   sec   112 MBytes  0.94 Gbits/sec   32    519 KBytes
[ 16]   9.00-10.00  sec   112 MBytes  0.94 Gbits/sec   32    458 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[ 16]   0.00-10.00  sec  1.09 GBytes  0.94 Gbits/sec  288             sender
[ 16]   0.00-10.00  sec  1.09 GBytes  0.93 Gbits/sec                  receiver

My questions are: a) how is it that we see retries and such a small window size yet still get near line-rate throughput, and b) what is the real-world impact of a test like this? Users at the Monterey site are reporting wildly varying performance out to the internet.

There are likely a lot of factors going on here, but I wanted to focus just on the testing between these two sites through the AT&T cloud. Any insights, theories or suggestions would be much appreciated. Thanks,


Jared Schlemmer
Network Engineer, GlobalNOC at Indiana University








