
ndt-users - Re: gigabit slower than fast-e

Subject: ndt-users list created

  • From: Richard Carlson <>
  • To: Bill Abbott <>, Matthew J Zekauskas <>
  • Cc:
  • Subject: Re: gigabit slower than fast-e
  • Date: Fri, 07 Mar 2008 02:02:13 -0500

Hi Bill;

What servers are you running locally (NDT, NPAD, ...)?

Looking at the NEWY NDT/NPAD server logs I see 4 tests from Rutgers hosts. Two of them are from the pdb-a-linux-3 host. Both of these tests asked whether the local infrastructure (from Rutgers, through MagPI, to the Internet2 PoP in New York) would support a 150 Mbps flow over a 90 msec RTT path. The answer in both cases was no: the loss rate was too high to support this flow.

Looking at the loss statistics, the measured loss rate was .000343%, but you needed a loss rate roughly 10x lower (.000036%). The question is, where is this loss occurring?
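
One way to see why such a small loss rate matters on this path is the widely used Mathis et al. approximation for steady-state TCP throughput, rate ≈ (MSS/RTT) · C/sqrt(p). The sketch below inverts that model for the 150 Mbps, 90 msec target. The MSS of 1448 bytes and the constant C = 0.7 are assumptions chosen for illustration, not values taken from the NPAD report, and the function names are hypothetical.

```python
import math

MSS_BYTES = 1448   # assumed: 1500-byte MTU with TCP timestamps enabled
C = 0.7            # assumed: a conservative value for the model constant


def max_loss_for_rate(target_bps, rtt_s, mss_bytes=MSS_BYTES, c=C):
    """Largest loss probability that still allows target_bps under the model."""
    return (c * mss_bytes * 8 / (rtt_s * target_bps)) ** 2


def rate_for_loss(loss, rtt_s, mss_bytes=MSS_BYTES, c=C):
    """Model throughput in bits per second for a given loss probability."""
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss)


if __name__ == "__main__":
    rtt = 0.090                    # 90 msec RTT from the test request
    needed = max_loss_for_rate(150e6, rtt)
    measured = 0.000343 / 100      # measured loss rate, converted from percent
    print(f"loss budget for 150 Mbps: {needed * 100:.6f}%")
    print(f"model rate at measured loss: {rate_for_loss(measured, rtt) / 1e6:.1f} Mbps")
```

Under these assumptions the loss budget lands in the same range as the .000036% figure quoted above, and the measured loss would cap a single flow well below 150 Mbps.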

The traceroute from the NDT server to the Rutgers host is:
[rcarlson@nms-rexp ~]$ traceroute pdb-a-linux-3.rutgers.edu
traceroute to pdb-a-linux-3.rutgers.edu (128.6.239.28), 30 hops max, 40 byte packets
1 64.57.17.65 (64.57.17.65) 0.318 ms 0.357 ms 0.474 ms
2 local.internet2.magpi.net (216.27.100.53) 2.202 ms 2.239 ms 2.267 ms
3 remote.rutgers.magpi.net (216.27.98.94) 6.161 ms 6.265 ms 6.282 ms
4 198.151.130.133 (198.151.130.133) 6.247 ms 6.256 ms 6.352 ms
5 pdb-a-linux-3.rutgers.edu (128.6.239.28) 7.101 ms 7.202 ms 7.317 ms

I've sent an email to the IU NOC staff to find out what happened to the NEWY NDT server. The process is running, but no tests are being performed (port blocking?).

Here are a couple of options to dig into this further.

1) Run tests to a local NDT/NPAD server. If you don't have a local NPAD server, the simplest thing to do is to burn the latest NPToolkit ISO image onto a CD-ROM and bring a server up in your lab: http://e2epi.internet2.edu/network-performance-toolkit/network-performance-toolkit.iso This would allow you to test the local switch/infrastructure. (You can also download the NPAD tar file from the PSC site at http://www.psc.edu/networking/projects/pathdiag/npad-1.4.tar.gz )

2) Run a test to the Chicago NDT/NPAD server (ndt.chic.net.internet2.edu port 7123 for NDT and port 8000 for NPAD). I checked earlier today and both the NDT and NPAD servers are operating on the Chicago server. The disadvantage of this approach is that traceroute shows the same path once I get through the Internet2 core (into New York, where we peer with MagPI).

3) It would be great if we could test into the MagPI core. I'll check with them to see if they have a server.

So the first task is to find out where this loss is coming from. Is there a dirty fiber in the path somewhere? Are any of the routers/switches reporting a small loss rate? The best way to do this is to measure the various paths by getting NDT/NPAD servers inside Rutgers & MagPI. Right now, I'd say that you need to focus on this part of the path. I'll get the Internet2 NOC involved to get the Los Angeles NDT/NPAD server restarted. This would allow you to test the local LA part of the path in parallel (to check that the local infrastructure on both ends is working properly).

Rich


At 05:38 PM 3/6/2008, Bill Abbott wrote:
We have a server here at Rutgers, connected to a Cisco Catalyst gig switch, and a server in San Diego, connected to a gig switch (not sure of model). The server and switch port are both set to autoneg gigabit, and show that they are in fact 1000. If I change the switch port (at Rutgers) to 100 full, then the server will negotiate to 100 full and the network performance goes to 85-90 Mbps, which is what I would expect out of a 100 Mbit connection with properly tuned servers across the country.

I can't force 1000 on the server; it'll only accept auto. If I test with a local ndt server, I get 900 Mbps.

The 155 Mbps link is our connection out to internet2. Not sure what the slowest link is on the San Diego side, but it's at least 155.

I get the same packet loss and out-of-order messages when connecting from Rutgers to nitro.ucsc.edu, and from San Diego to a local ndt server here (not mine).

I can't connect to ndt.newy.net.internet2.edu using web100clt on the command line. Do you have a server in the area that I can reach?

For NPAD, the url is

ServerData/pdb-a-linux-3.rutgers.edu:2008-03-06-21:34:33.html

for ndt.newy.net.internet2.edu

ndt.internet2.edu gave a protocol error, bad handshake.

no luck connecting to ndt.losa.net.internet2.edu with NPAD


The systems on both ends are Linux servers, 2.6 kernels, with tcp max buffer tuned to 1 MB or greater.
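
For reference, the bandwidth-delay product gives the amount of data that must be in flight to fill a path, and so a lower bound on the TCP buffer a single flow needs. The sketch below uses the 155 Mbps and 90 ms figures from this thread; the function names are illustrative only, and the 1 MB window is taken as 2**20 bytes, which is an assumption.

```python
def bdp_bytes(bottleneck_bps, rtt_s):
    """Bytes that must be in flight to keep the bottleneck link full."""
    return bottleneck_bps * rtt_s / 8


def window_limited_rate_bps(window_bytes, rtt_s):
    """Best-case throughput when at most window_bytes can be in flight per RTT."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    rtt = 0.090  # 90 ms round-trip time
    print(f"BDP at 155 Mbps: {bdp_bytes(155e6, rtt) / 1e6:.2f} MB")
    print(f"1 MB window limit: {window_limited_rate_bps(2**20, rtt) / 1e6:.1f} Mbps")
```

With these figures the product is about 1.7 MB, so a buffer of exactly 1 MB would by itself hold a single flow to roughly 90 Mbps, even with no loss at all.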

Bill


Matthew J Zekauskas wrote:
On 3/6/2008 12:04 AM,

wrote:
I'm troubleshooting data transfers between New Jersey and California over internet2. The slowest link is 155 Mbps, rtt is 90 ms, servers and switches are gigabit.

Tuning tcp buffers only gets the throughput to 11.9 Mbits/sec. If I change the switch port to 100 Full, the throughput increases to 84.4 Mbits/sec.

ndt reports packet loss and duplicate acks when the connection is gigabit; the 100 full connection has no packet loss.

The relevant ndt output is below. iperf also backs up these results.

Is there a known problem with gig switch negotiation? No tcp buffer setting improves the performance when gigabit.
Rich is away traveling, and might have a better answer to your question, but let me try...
So, when you "change the switch port to 100 full": is this at the last hop, between a switch and a "PC"? If so, are you forcing both sides (the switch and the PC) to 100 full?
If not, then I think that there may be a problem with autonegotiation using your hardware.
If so... the bottleneck link is 155 Mbps. When the gigabit-connected PC ramps up, it will start sending longer and longer trains of back-to-back packets at 1000 Mbps, which the device feeding in to the bottleneck link (and intermediate devices) would have to queue. It's possible that one of these devices cannot handle the number of packets that get blasted at it back-to-back. An inexpensive gigabit switch might be such a device.
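
The queueing argument here can be made concrete with a rough estimate. In a simple fluid approximation, a burst of N back-to-back packets that arrives at the sender's line rate and drains at the bottleneck rate leaves about N · (1 − drain/arrival) packets queued when the burst ends. The sketch below assumes equal-sized packets and ignores framing overhead; the function name and the 1000-packet burst are hypothetical.

```python
def peak_queue_packets(burst_packets, arrival_bps, drain_bps):
    """Approximate queue depth at the end of a back-to-back burst.

    Returns 0.0 when the arrival rate does not exceed the drain rate,
    because in that case no standing queue builds in this model.
    """
    if arrival_bps <= drain_bps:
        return 0.0
    return burst_packets * (1.0 - drain_bps / arrival_bps)


if __name__ == "__main__":
    burst = 1000  # hypothetical burst length in packets
    for sender_bps in (1000e6, 100e6):
        depth = peak_queue_packets(burst, sender_bps, 155e6)
        print(f"sender at {sender_bps / 1e6:.0f} Mbps: about {depth:.0f} packets queued")
```

With the sender at 100 Mbps the arrival rate is below the 155 Mbps bottleneck, so no queue builds in this model. That is consistent with the loss-free 100 full results reported in this thread, although it does not by itself show which device is dropping packets.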
Looking at detailed NDT results might give us some clues.
Do you know if the 155 Mbps link is on your side or the California side?
There are also NDT servers located within Internet2; if you test to one of them I might be able to look at the results (but I don't want to get your hopes up too much :).
<http://ndt.newy.net.internet2.edu:7123/> is the closest NDT server within Internet2 to Rutgers.
You might also try an NPAD test, which does complementary testing to NDT, looking to see if a device in the path has enough buffer space. There is one that I hope is configured correctly at <http://ndt.newy.net.internet2.edu:8000/>. (Ignore the fact that it claims to be in Chicago.) Use 70 ms for the target RTT (basically cross-US) and 150 Mbps for the target speed, and let's see what that says. That one gives a URL that has the results; send it to me.
Also try the one at the Internet2 office: <http://ndt.internet2.edu:8200/>
If the California school is connected via CENIC, the closest one there is <http://ndt.losa.net.internet2.edu:7123/>; you can test "the other way", from the California side to the center.
It would also be helpful if you could tell us the devices at both sides; are they PCs? Do you have the operating system and version?
Thanks,
--Matt


Thanks

ndt:

gigabit connection:

running 10s outbound test (client to server) . . . . . 13.67 Mb/s
running 10s inbound test (server to client) . . . . . . 14.30 Mb/s

There were 7 packets retransmitted, 178 duplicate acks received, and 245 SACK blocks received
Packets arrived out-of-order 2.64% of the time.
The connection stalled 1 times due to packet loss.
The connection was idle 0.30 seconds (3.00%) of the time.


100 full connection:

running 10s outbound test (client to server) . . . . . 11.80 Mb/s
running 10s inbound test (server to client) . . . . . . 20.46 Mb/s

No packet loss was observed.


iperf:

gigabit:

[ 3] 0.0-60.1 sec 85.3 MBytes 11.9 Mbits/sec


100 full:

[ 3] 0.0-60.2 sec 606 MBytes 84.4 Mbits/sec



------------------------------------



Richard A. Carlson e-mail:

Network Engineer phone: (734) 352-7043
Internet2 fax: (734) 913-4255
1000 Oakbrook Dr; Suite 300
Ann Arbor, MI 48104



