Connecting to host port 5864
[ 16] local port 48756 connected to port 5864
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 16] 0.00-1.00 sec 2.61 MBytes 0.02 Gbits/sec 0 444 KBytes
[ 16] 1.00-2.00 sec 37.7 MBytes 0.32 Gbits/sec 0 5.01 MBytes
[ 16] 2.00-3.00 sec 112 MBytes 0.94 Gbits/sec 0 7.86 MBytes
[ 16] 3.00-4.00 sec 87.5 MBytes 0.73 Gbits/sec 32 4.07 MBytes
[ 16] 4.00-5.00 sec 63.8 MBytes 0.53 Gbits/sec 0 4.10 MBytes
[ 16] 5.00-6.00 sec 66.2 MBytes 0.56 Gbits/sec 0 4.24 MBytes
[ 16] 6.00-7.00 sec 68.8 MBytes 0.58 Gbits/sec 0 4.54 MBytes
[ 16] 7.00-8.00 sec 73.8 MBytes 0.62 Gbits/sec 0 4.98 MBytes
[ 16] 8.00-9.00 sec 81.2 MBytes 0.68 Gbits/sec 0 5.56 MBytes
[ 16] 9.00-10.00 sec 93.8 MBytes 0.79 Gbits/sec 0 6.31 MBytes
[ 16] 10.00-11.00 sec 105 MBytes 0.88 Gbits/sec 0 7.21 MBytes
[ 16] 11.00-12.00 sec 111 MBytes 0.93 Gbits/sec 0 7.97 MBytes
[ 16] 12.00-13.00 sec 75.0 MBytes 0.63 Gbits/sec 32 4.06 MBytes
[ 16] 13.00-14.00 sec 63.8 MBytes 0.53 Gbits/sec 0 4.11 MBytes
[ 16] 14.00-15.00 sec 66.2 MBytes 0.56 Gbits/sec 0 4.29 MBytes
[ 16] 15.00-16.00 sec 70.0 MBytes 0.59 Gbits/sec 0 4.62 MBytes
[ 16] 16.00-17.00 sec 75.0 MBytes 0.63 Gbits/sec 0 5.09 MBytes
[ 16] 17.00-18.00 sec 85.0 MBytes 0.71 Gbits/sec 0 5.72 MBytes
[ 16] 18.00-19.00 sec 95.0 MBytes 0.80 Gbits/sec 0 6.51 MBytes
[ 16] 19.00-20.00 sec 108 MBytes 0.90 Gbits/sec 0 7.44 MBytes
[ 16] 20.00-21.00 sec 111 MBytes 0.93 Gbits/sec 32 6.42 MBytes
[ 16] 21.00-22.00 sec 101 MBytes 0.85 Gbits/sec 0 6.43 MBytes
[ 16] 22.00-23.00 sec 101 MBytes 0.85 Gbits/sec 0 6.46 MBytes
[ 16] 23.00-24.00 sec 101 MBytes 0.85 Gbits/sec 0 6.55 MBytes
[ 16] 24.00-25.00 sec 104 MBytes 0.87 Gbits/sec 0 6.70 MBytes
[ 16] 25.00-26.00 sec 106 MBytes 0.89 Gbits/sec 0 6.92 MBytes
[ 16] 26.00-27.00 sec 110 MBytes 0.92 Gbits/sec 0 7.21 MBytes
[ 16] 27.00-28.00 sec 111 MBytes 0.93 Gbits/sec 0 7.55 MBytes
[ 16] 28.00-29.00 sec 111 MBytes 0.93 Gbits/sec 0 7.94 MBytes
[ 16] 29.00-30.00 sec 111 MBytes 0.93 Gbits/sec 32 6.50 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 16] 0.00-30.00 sec 2.55 GBytes 0.73 Gbits/sec 128 sender
[ 16] 0.00-30.00 sec 2.54 GBytes 0.73 Gbits/sec receiver
The retransmits are suspiciously consistent in their timing. Maybe that is more support for your switch buffer theory? I should also note that a 30-second 1G test from Sunnyvale to Chicago shows no retransmits, and UDP testing between Monterey Bay and Sunnyvale shows no loss. Are retransmits with no UDP loss indicative of packets being received out of order?
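If reordering is a possibility, one quick check (a sketch, assuming the receiving perfSONAR host is Linux; the counter wording varies a bit by kernel) is:
netstat -s | grep -i reorder
on the receive side, which reports how often the kernel has detected reordering on incoming TCP data. Thanks,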
Jared Schlemmer
Network Engineer, GlobalNOC at Indiana University
On Jul 31, 2017, at 3:01 PM, Jason Zurawski <> wrote:
Hey Jared;
I would run a longer test to stuff far away (30s) just so you get an understanding of macro behavior. I would be tempted to say that what you see below is buffer-induced (~8M or greater) packet loss, but I don't know enough about the capabilities of the path (which model of MX, and I have no insight into what is inside of a 3930) or have any guesses as to where it may be caused. The fact that it runs clean up to that point is a good sign, and probably points to the shorter-latency retransmissions as being more related to overzealous sending by someone - and not some symptom of a problem (as long as you are sure the tests to SUNN and CHIC traverse the same infrastructure outbound).
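For a concrete sketch (the host name below is a placeholder for whatever distant perfSONAR node you pick), a plain 30-second run with 1-second reports is all I mean:
iperf3 -c <far-away-host> -t 30 -i 1
where -t sets the test length in seconds and -i the reporting interval.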
Your MTU observation may be worth looking at though ... what are the MTU settings of the servers and switches/routers that you can see? 1500 or 9000, and does that expectation match reality? It also goes without saying (but I am saying it) that if there are copper connections involved, verify the duplex settings, e.g. just because it is supposed to auto-negotiate doesn't mean it will.
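As a rough sketch of what I mean, on the Linux hosts (the interface and host names here are placeholders):
ip link show eth0                # configured interface MTU
ethtool eth0                     # negotiated speed/duplex on a copper port
ping -M do -s 8972 <far-host>    # do 9000-byte frames really pass unfragmented? (8972 = 9000 minus 28 bytes of IP/ICMP headers)
tracepath <far-host>             # reports the path MTU hop by hop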
Thanks;
-jason
Jared Schlemmer wrote:
Thanks for the quick responses - the low latency leading to modest buffer requirements makes a lot of sense. I’ll try to answer everyone’s questions below:
- Both perf hosts are directly connected to the routers in Sunnyvale and Monterey Bay by 1GE connections.
- The path that I have visibility into is Monterey PERFSONAR <—> Juniper MX router <—> AT&T Ciena 3930 switch <—> AT&T “cloud” <—> AT&T Ciena 3930 switch <—> Juniper MX router <—> Sunnyvale PERFSONAR. We manage both perf boxes and both Juniper routers. We connect directly to the Ciena switches, which share a rack with our routers but are managed by AT&T.
- There are no errors on the interfaces at either location, although we do see MTU output errors slowly incrementing on the Monterey interface facing Sunnyvale. I point this out although I think it’s unrelated: it’s incrementing very slowly, and I just ran a couple of tests out of Monterey and the output MTU error counter didn’t increment at all (a quick way to keep an eye on that counter is sketched just below this list). I suspect this is some kind of broadcast traffic or something else related to these hosts being connected via a VPLS cloud.
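For reference, one way to watch that counter on the MX (a sketch; the interface name is a placeholder):
show interfaces ge-0/0/0 extensive | match "MTU errors"
which filters the MTU error counters out of the extensive interface statistics.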
Here is a 1G test from Monterey to Chicago:
Connecting to host port 5840
[ 16] local port 37714 connected to port 5840
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 16] 0.00-1.00 sec 2.36 MBytes 0.02 Gbits/sec 0 386 KBytes
[ 16] 1.00-2.00 sec 28.6 MBytes 0.24 Gbits/sec 0 3.64 MBytes
[ 16] 2.00-3.00 sec 101 MBytes 0.85 Gbits/sec 0 7.62 MBytes
[ 16] 3.00-4.00 sec 106 MBytes 0.89 Gbits/sec 32 4.07 MBytes
[ 16] 4.00-5.00 sec 63.8 MBytes 0.53 Gbits/sec 0 4.09 MBytes
[ 16] 5.00-6.00 sec 65.0 MBytes 0.55 Gbits/sec 0 4.18 MBytes
[ 16] 6.00-7.00 sec 67.5 MBytes 0.57 Gbits/sec 0 4.42 MBytes
[ 16] 7.00-8.00 sec 72.5 MBytes 0.61 Gbits/sec 0 4.81 MBytes
[ 16] 8.00-9.00 sec 80.0 MBytes 0.67 Gbits/sec 0 5.34 MBytes
[ 16] 9.00-10.00 sec 88.8 MBytes 0.74 Gbits/sec 0 6.03 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 16] 0.00-10.00 sec 676 MBytes 0.57 Gbits/sec 32 sender
[ 16] 0.00-10.00 sec 666 MBytes 0.56 Gbits/sec receiver
I checked interfaces along this path and bandwidth contention should not be contributing. This test looks like what you would expect: throughput ramping up as the window increases, then a period of retries and the window backing off again. Thanks again,
Jared Schlemmer
Network Engineer, GlobalNOC at Indiana University
On Jul 31, 2017, at 1:53 PM, Matthew J Zekauskas wrote:
Some thoughts...
I wonder if you could also characterize what you see as "good"?
I would posit that Monterey to Sunnyvale is relatively short, so the latency is low, TCP can recover quickly, and it can maintain throughput in the face of modest loss. ~500K may well be sufficient buffer to keep this path filled.
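As a back-of-the-envelope check (assuming the Monterey Bay to Sunnyvale RTT is on the order of 4 ms, which is a guess on my part, not a measurement): bandwidth-delay product = 1 Gbit/s x 0.004 s = 4 Mbit, or roughly 500 KBytes, so congestion windows in the few-hundred-KByte range, like the ones in your output, are about what it takes to keep a 1G path of that length full.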
Are the endpoints 1GE connected? (so they would not be likely to overrun
the connection in the middle).
Could it be that there is existing traffic so you are congesting in one
direction but not the other?
Do you see any other indications of loss - errors or drops on interfaces?
When you ask about "real world impact" -- are you talking about the
tests themselves which will saturate the path and could adversely affect
user performance, or the presence of some loss, which might affect user
performance elsewhere, depending on the application and distance from
the user?
--Matt
On 7/31/17 1:40 PM, Jared Schlemmer wrote:
We just turned up a new network endpoint that connects to an existing aggregation site via a 1 Gb AT&T VPLS connection and I’m seeing some interesting performance results. The sites are Monterey Bay and Sunnyvale, CA. Tests from Sunnyvale to Monterey Bay are good, but in the reverse direction, Monterey Bay toward Sunnyvale, I see this:
Connecting to host port 5332
[ 16] local port 58534 connected to port 5332
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 16] 0.00-1.00 sec 110 MBytes 0.92 Gbits/sec 0 1.16 MBytes
[ 16] 1.00-2.00 sec 113 MBytes 0.95 Gbits/sec 64 553 KBytes
[ 16] 2.00-3.00 sec 111 MBytes 0.93 Gbits/sec 32 498 KBytes
[ 16] 3.00-4.00 sec 112 MBytes 0.94 Gbits/sec 32 434 KBytes
[ 16] 4.00-5.00 sec 112 MBytes 0.94 Gbits/sec 32 362 KBytes
[ 16] 5.00-6.00 sec 112 MBytes 0.94 Gbits/sec 0 669 KBytes
[ 16] 6.00-7.00 sec 112 MBytes 0.94 Gbits/sec 32 622 KBytes
[ 16] 7.00-8.00 sec 111 MBytes 0.93 Gbits/sec 32 574 KBytes
[ 16] 8.00-9.00 sec 112 MBytes 0.94 Gbits/sec 32 519 KBytes
[ 16] 9.00-10.00 sec 112 MBytes 0.94 Gbits/sec 32 458 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 16] 0.00-10.00 sec 1.09 GBytes 0.94 Gbits/sec 288 sender
[ 16] 0.00-10.00 sec 1.09 GBytes 0.93 Gbits/sec receiver
My questions are, a) how is it that we see retries and such a small window size and yet still get near line-rate throughput, and b) what is the real world impact of a test like this? Users at the Monterey site are reporting wildly varying performance out to the internet.
There are likely a lot of factors going on here, but I wanted to focus just on the testing between these two sites through the AT&T cloud. Any insights, theories or suggestions would be much appreciated. Thanks,
Jared Schlemmer
Network Engineer, GlobalNOC at Indiana University