help diagnosing transfer stalls


  • From: Young Hyun <>
  • To:
  • Subject: help diagnosing transfer stalls
  • Date: Fri, 11 Sep 2009 17:56:26 -0700

I'm trying to diagnose a difficult connectivity problem and was hoping more experienced users of NDT might shed some light. The main problem I see is SCP transfers stalling regularly until the transfer rate drops to a crawl (a few kb/s). As far as we can tell, there are no auto-negotiation or duplex problems on either end.
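
(For concreteness, the sort of check I mean is the usual ethtool/mii-tool query on each host; eth0 below is just a stand-in for the actual interface name:)

(eze-ar) $ ethtool eth0      # reports speed, duplex, and auto-negotiation state
(eze-ar) $ mii-tool -v eth0  # older alternative; shows link partner advertisements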

I'm running the command-line NDT client v3.5.0 on a Debian Linux box (2.6.18-6 kernel) located in Argentina. This box seems to have general connectivity problems, whereas the other endpoint, located in San Diego, has good connectivity (around 80 Mb/s to NDT servers). According to NDT, the bandwidth of the Argentina box is highly asymmetric: 685 kb/s to 3.36 Mb/s in the client-to-server direction but only 37 kb/s to 68 kb/s from server to client (depending on the NDT server).
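
(The client is the stock web100clt binary from the 3.5.0 release; a typical invocation is just

(eze-ar) $ web100clt -n ndt.sdsc.edu

where -n names the NDT server. The full output of one run is included at the bottom of this message.)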

I have no idea why there is such a dramatic asymmetry in the bandwidth. The traceroute paths between the Argentina and San Diego boxes are nearly identical in the two directions (please see paths below). I've checked the path MTU in both directions, and it is at least 1500 bytes each way (so no excessive hidden fragmentation in one direction). And according to ping, the packet loss rate is only 2 packets out of 119 (1.7%), which should be acceptable. So everything seems to check out, yet the problem persists, and has persisted for more than a year (I'm working with the remote site and the RedCLARA transit network to resolve the issue).
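
(For the MTU check, I mean don't-fragment probes of this sort; on Linux, -M do sets the DF bit, and 1472 bytes of ICMP payload plus 28 bytes of headers makes a full 1500-byte packet:

(eze-ar) $ tracepath ndt.sdsc.edu               # reports the discovered path MTU
(eze-ar) $ ping -M do -s 1472 -c 5 ndt.sdsc.edu

If the path MTU were below 1500, these pings would come back with "Frag needed and DF set".)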

I include below a summary of the NDT runs from the Argentina box (named eze-ar) to several NDT servers. I also include the traceroute paths taken at the same time. The ndt.sdsc.edu server is located at our site, and the name of the box used for testing at SDSC is nibbler. I also include the full output of NDT for one of the runs in case my summary is missing some crucial information.

By the way, I've run NDT from other boxes located worldwide whose paths cross nearly the same Internet2 segments before reaching the target NDT server, and those runs suggest Internet2 is unlikely to be the culprit.

--Young


============================================================================
eze-ar and netspeed.stanford.edu:
----------------------------------------------------------------------------

NDT shows 1.11 Mb/s client-to-server (C2S) and 41.69 kb/s server-to-client (S2C)

Information: Other network traffic is congesting the link
Information [S2C]: Packet queuing detected: 75.04% (local buffers)

Web100 reports the Round trip time = 279.76 msec; the Packet size = 1448 Bytes; and
There were 22 packets retransmitted, 17 duplicate acks received, and 20 SACK blocks received
Packets arrived out-of-order 41.46% of the time.
The connection stalled 6 times due to packet loss.
The connection was idle 3.76 seconds (34.18%) of the time.
This connection is sender limited 15.47% of the time.
Increasing the current send buffer (256.00 KB) will improve performance
This connection is network limited 84.53% of the time.
Excessive packet loss is impacting your performance, check the auto-negotiate function on your local PC and network switch

traceroute to ndt2.stanford.edu (171.66.6.38), 30 hops max, 40 byte packets
1 157.92.44.62 (157.92.44.62) 0.447 ms
2 157.92.47.2 (157.92.47.2) 1.044 ms
3 rnoc8-ruba1-34M.BUENOS-AIRES.retina.ar (199.248.144.245) 3.246 ms
4 retina-ar-bsas.core.redclara.net (200.0.204.145) 2.616 ms
5 buenosaires-saopaulo.core.redclara.net (200.0.204.38) 29.269 ms
6 198.32.11.105 (198.32.11.105) 198.721 ms
7 64.57.28.6 (64.57.28.6) 213.719 ms
8 so-3-2-0.0.rtr.hous.net.internet2.edu (64.57.28.43) 240.424 ms
9 so-3-0-0.0.rtr.losa.net.internet2.edu (64.57.28.44) 285.857 ms
10 hpr-lax-hpr--i2-newnet.cenic.net (137.164.26.132) 267.818 ms
11 svl-hpr--lax-hpr-10ge.cenic.net (137.164.25.13) 276.451 ms
12 oak-hpr--svl-hpr-10ge.cenic.net (137.164.25.9) 276.401 ms
13 hpr-stan-ge--oak-hpr.cenic.net (137.164.27.158) 278.123 ms
14 bbra-rtr.Stanford.EDU (171.64.1.134) 401.651 ms
15 ndt2.Stanford.EDU (171.66.6.38) 277.252 ms


============================================================================
eze-ar and ndt.switch.ch:
----------------------------------------------------------------------------

NDT shows 685.00 kb/s client-to-server (C2S) and 43.21 kb/s server-to-client (S2C)

Information: Other network traffic is congesting the link
Information [S2C]: Packet queuing detected: 99.57% (local buffers)

Web100 reports the Round trip time = 278.55 msec; the Packet size = 1448 Bytes; and
There were 15 packets retransmitted, 16 duplicate acks received, and 22 SACK blocks received
Packets arrived out-of-order 43.24% of the time.
The connection stalled 2 times due to packet loss.
The connection was idle 1.02 seconds (9.27%) of the time.
This connection is network limited 99.79% of the time.
Excessive packet loss is impacting your performance, check the auto-negotiate function on your local PC and network switch

traceroute to atitlan.switch.ch (130.59.31.2), 30 hops max, 40 byte packets
1 157.92.44.62 (157.92.44.62) 0.458 ms
2 157.92.47.2 (157.92.47.2) 1.085 ms
3 rnoc8-ruba1-34M.BUENOS-AIRES.retina.ar (199.248.144.245) 2.704 ms
4 retina-ar-bsas.core.redclara.net (200.0.204.145) 2.928 ms
5 buenosaires-saopaulo.core.redclara.net (200.0.204.38) 30.001 ms
6 200.0.204.42 (200.0.204.42) 139.444 ms
7 clara.rt1.mad.es.geant2.net (62.40.124.61) 253.282 ms
8 so-7-2-0.rt1.gen.ch.geant2.net (62.40.112.25) 275.382 ms
9 swiCE2-10GE-1-1.switch.ch (62.40.124.22) 274.798 ms
10 atitlan.switch.ch (130.59.31.2) 275.201 ms


============================================================================
eze-ar and speedtest.oeutelecom.com:
----------------------------------------------------------------------------

NDT shows 3.36 Mb/s client-to-server (C2S) and 67.92 kb/s server-to-client (S2C)

Information: Other network traffic is congesting the link
Information [S2C]: Packet queuing detected: 99.65% (local buffers)

Web100 reports the Round trip time = 168.88 msec; the Packet size = 1448 Bytes; and
There were 23 packets retransmitted, 24 duplicate acks received, and 33 SACK blocks received
Packets arrived out-of-order 37.50% of the time.
The connection stalled 3 times due to packet loss.
The connection was idle 1.12 seconds (10.18%) of the time.
This connection is network limited 99.36% of the time.
Excessive packet loss is impacting your performance, check the auto-negotiate function on your local PC and network switch

traceroute to 216.255.240.2 (216.255.240.2), 30 hops max, 40 byte packets
1 157.92.44.62 (157.92.44.62) 0.804 ms
2 157.92.47.1 (157.92.47.1) 0.808 ms
3 *
4 ngryrt12-lan1.telmex.net.ar (200.80.245.3) 4.720 ms
5 GigabitEthernet2-15.ar3.EZE1.gblx.net (64.215.185.81) 3.154 ms
6 COX-COM-INC.TenGigabitEthernet4-1.ar3.DAL2.gblx.net (64.215.187.2) 224.279 ms
7 68.1.2.140 (68.1.2.140) 190.832 ms
8 24.136.46.70 (24.136.46.70) 195.603 ms
9 wsip-70-169-186-6.ga.at.cox.net (70.169.186.6) 195.854 ms
10 wsip-70-169-187-14.ga.at.cox.net (70.169.187.14) 171.844 ms
11 speedtest.oeutelecom.com (216.255.240.2) 167.835 ms


============================================================================
eze-ar and ndt.sdsc.edu:
----------------------------------------------------------------------------

NDT shows 450.00 kb/s client-to-server (C2S) and 37.05 kb/s server-to-client (S2C)

Information: Other network traffic is congesting the link
Information [S2C]: Packet queuing detected: 99.78% (local buffers)

Web100 reports the Round trip time = 275.48 msec; the Packet size = 1448 Bytes; and
There were 12 packets retransmitted, 13 duplicate acks received, and 19 SACK blocks received
Packets arrived out-of-order 39.39% of the time.
The connection stalled 2 times due to packet loss.
The connection was idle 1.01 seconds (9.18%) of the time.
This connection is network limited 99.93% of the time.
Excessive packet loss is impacting your performance, check the auto-negotiate function on your local PC and network switch

(eze-ar) src$ traceroute -I -q 1 ndt.sdsc.edu
traceroute to bwctl1.sdsc.edu (192.12.207.1), 30 hops max, 40 byte packets
1 157.92.44.62 (157.92.44.62) 0.485 ms
2 157.92.47.2 (157.92.47.2) 1.128 ms
3 rnoc8-ruba1-34M.BUENOS-AIRES.retina.ar (199.248.144.245) 2.342 ms
4 retina-ar-bsas.core.redclara.net (200.0.204.145) 2.978 ms
5 buenosaires-saopaulo.core.redclara.net (200.0.204.38) 30.410 ms
6 198.32.11.105 (198.32.11.105) 199.274 ms
7 64.57.28.6 (64.57.28.6) 214.177 ms
8 so-3-2-0.0.rtr.hous.net.internet2.edu (64.57.28.43) 237.722 ms
9 so-3-0-0.0.rtr.losa.net.internet2.edu (64.57.28.44) 465.549 ms
10 hpr-lax-hpr--i2-newnet.cenic.net (137.164.26.132) 270.552 ms
11 riv-hpr--lax-hpr-10ge.cenic.net (137.164.25.5) 271.849 ms
12 hpr-sdsc-sdsc1--riv-hpr-ge.cenic.net (137.164.27.50) 271.895 ms
13 lightning.sdsc.edu (132.249.31.6) 271.889 ms
14 bwctl1.sdsc.edu (192.12.207.1) 275.894 ms

(nibbler) $ traceroute -q 1 eze-ar.ark.caida.uba.ar
traceroute to 157.92.44.18 (157.92.44.18), 64 hops max, 40 byte packets
1 pinot-g1-0-0 (192.172.226.1) 0.773 ms
2 dolphin.sdsc.edu (198.17.46.17) 0.563 ms
3 hpr-lax-hpr--sdsc-10ge.cenic.net (137.164.26.33) 5.688 ms
4 hpr-i2-newnet--lax-hpr.cenic.net (137.164.26.146) 5.470 ms
5 so-0-0-0.0.rtr.hous.net.internet2.edu (64.57.28.45) 200.872 ms
6 64.57.28.42 (64.57.28.42) 60.878 ms
7 64.57.28.7 (64.57.28.7) 74.465 ms
8 198.32.11.106 (198.32.11.106) 242.644 ms
9 saopaulo-buenosaires.core.redclara.net (200.0.204.37) 269.090 ms
10 retina-ar-bsas.peer.redclara.net (200.0.204.146) 270.412 ms
11 ruba1-rnoc8-34M.BUENOS-AIRES.retina.ar (199.248.144.246) 271.078 ms
12 157.92.47.11 (157.92.47.11) 273.176 ms
13 eze-ar.ark.caida.uba.ar (157.92.44.18) 273.548 ms

(eze-ar) src$ ping ndt.sdsc.edu
--- bwctl1.sdsc.edu ping statistics ---
119 packets transmitted, 117 received, 1% packet loss, time 118415ms
rtt min/avg/max/mdev = 271.249/273.276/316.024/4.304 ms


============================================================================
Full NDT client output: eze-ar to netspeed.stanford.edu
----------------------------------------------------------------------------
Testing network path for configuration and performance problems -- Using IPv4 address
Requesting test suite:
> Middlebox test
> Simple firewall test
> C2S throughput test
> S2C throughput test
WARNING: NDT server has different version number (3.4.4a)
<-- Middlebox test -->
-- port: 3003
Checking for Middleboxes . . . . . . . . . . . . . . . . . . Done
<-------------------->
<-- Simple firewall test -->
checking for firewalls . . . . . . . . . . . . . . . . . . .
-- port: 40206
-- time: 1
-- oport: 1874
Simple firewall test: no connection for 1 seconds
SIGALRM was caught
Unable to create connect socket.
Done
<-------------------------->
<-- C2S throughput test -->
-- port: 3002
running 10s outbound test (client to server) . . . . . 1.11 Mb/s
<------------------------->
<-- S2C throughput test -->
-- port: 3003
running 10s inbound test (server to client) . . . . . . 41.69 kb/s
<------------------------->
The slowest link in the end-to-end path is a 100 Mbps Full duplex Fast Ethernet subnet
Information: Other network traffic is congesting the link
Information: The receive buffer should be 3415 kbytes to maximize throughput
Information [S2C]: Packet queuing detected: 75.04% (local buffers)
Server 'netspeed.stanford.edu' is probably behind a firewall. [Connection to the ephemeral port failed]
Client is probably behind a firewall. [Connection to the ephemeral port failed]

------ Web100 Detailed Analysis ------

Web100 reports the Round trip time = 279.76 msec; the Packet size = 1448 Bytes; and
There were 22 packets retransmitted, 17 duplicate acks received, and 20 SACK blocks received
Packets arrived out-of-order 41.46% of the time.
The connection stalled 6 times due to packet loss.
The connection was idle 3.76 seconds (34.18%) of the time.
This connection is sender limited 15.47% of the time.
Increasing the current send buffer (256.00 KB) will improve performance
This connection is network limited 84.53% of the time.
Excessive packet loss is impacting your performance, check the auto-negotiate function on your local PC and network switch

Web100 reports TCP negotiated the optional Performance Settings to:
RFC 2018 Selective Acknowledgment: ON
RFC 896 Nagle Algorithm: ON
RFC 3168 Explicit Congestion Notification: OFF
RFC 1323 Time Stamping: ON
RFC 1323 Window Scaling: OFF
The theoretical network limit is 0.12 Mbps
The NDT server has a 256 KByte buffer which limits the throughput to 7.15 Mbps
Your PC/Workstation has a 63 KByte buffer which limits the throughput to 1.75 Mbps
The network based flow control limits the throughput to 0.51 Mbps

Client Data reports link is ' 5', Client Acks report link is ' 5'
Server Data reports link is ' 1', Server Acks report link is ' 1'
Packet size is preserved End-to-End
Server IP addresses are preserved End-to-End
Client IP addresses are preserved End-to-End


