
ndt-users - Re: gigabit slower than fast-e

  • From: Matt Mathis <>
  • To: Richard Carlson <>
  • Cc: Bill Abbott <>, Matthew J Zekauskas <>,
  • Subject: Re: gigabit slower than fast-e
  • Date: Fri, 7 Mar 2008 14:08:44 -0500 (EST)

Exactly what is this path? It looks pretty sick....

http://ndt.newy.net.internet2.edu:8000/ServerData/pdb-a-linux-3.rutgers.edu:2008-03-06-21:34:33.html

It is fairly clean for window sizes smaller than about 120 packets (corresponding to 266 Mb/s), but beyond that it goes crazy: high loss and reduced performance. 266 Mb/s is a rather odd rate. Can somebody shed some light on the topology?
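For reference, the 120-packet / 266 Mb/s pairing is just window-over-RTT arithmetic. A minimal sketch in Python, assuming 1500-byte packets; the RTT it implies for the hop to the NPAD server is derived from those two numbers, not measured:

# Back-of-envelope: rate = window / RTT.
MSS_BYTES = 1500                       # assumed full-size Ethernet payload
window_packets = 120
rate_bps = 266e6

window_bits = window_packets * MSS_BYTES * 8
implied_rtt_s = window_bits / rate_bps
print("implied RTT to the NPAD server: %.1f ms" % (implied_rtt_s * 1e3))
# ~5.4 ms, consistent with a short Rutgers-to-ndt.newy path rather than
# the 90 ms end-to-end RTT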

There is definitely something going on here... Does this happen to include the heavily loaded main link into campus? If so, this is consistent with some sort of buffer starvation in the router at the campus edge. Where is it relative to the 155 Mb/s link?

Thanks,
--MM--
-------------------------------------------
Matt Mathis http://staff.psc.edu/mathis
Work:412.268.3319 Home/Cell:412.654.7529
-------------------------------------------
Evil is defined by mortals who think they know
"The Truth" and use force to apply it to others.

On Fri, 7 Mar 2008, Richard Carlson wrote:

Matt;

http://ndt.newy.net.internet2.edu:8000/ServerData/pdb-a-linux-3.rutgers.edu:2008-03-06-21:34:33.html has the data. I can send you any of the server files if you want/need to look at something.

I've sent an email to the MAGPI director asking if/when they will install a server so we can test that portion of the path. It would also be great if the Rutgers folks had a local campus server as well. Note that you can easily stand up a test server using the Knoppix-based NPToolkit ISO image. Just download, burn, and boot the image found at http://e2epi.internet2.edu/network-performance-toolkit/network-performance-toolkit.iso to get started.

Rich

At 08:05 AM 3/7/2008, Matt Mathis wrote:
What is the full URL of the NPAD results?

This sort of symptom (raising performance by adding a nearby, lower-rate bottleneck) is typical of some piece of gear in the path that either cannot queue packets or cannot deliver sustained back-to-back packets. When the 100 Mb/s bottleneck is present, it guarantees idle time at the other potential bottleneck.
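A rough way to see the idle-time argument, using the nominal rates from this thread (a sketch only; the per-packet size is an assumption):

# Compare how fast packets arrive at the 155 Mb/s device with how fast it
# can drain them.
PKT_BITS = 1500 * 8                      # assumed packet size

def queue_growth_pps(in_rate_bps, out_rate_bps):
    """Net queue growth, in packets per second, at the slower device."""
    return (in_rate_bps - out_rate_bps) / PKT_BITS

print(queue_growth_pps(1000e6, 155e6))   # ~ +70,000 pkt/s of backlog while a burst lasts
print(queue_growth_pps(100e6, 155e6))    # negative: the queue drains and the link idles

With the 100 Mb/s bottleneck upstream, the 155 Mb/s device never has to hold more than a packet or two; remove it and the device must either absorb the difference or drop.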

Examples include:

ATM with conventional telco-style short-queue switches plus tiny cross traffic. If you can actually create a standing packet queue at the SAR, ATM will potentially have zero tolerance for any cross traffic that comes in through a different SAR.

Mis-clocked conventional CSU/DSU pairs that are smart enough to handle clock slips only by inserting/deleting idle symbols. If there is no idle, they trash periodic frames. The fun thing about this one is that the symptoms change as the temperature of the machine rooms changes....

There are others too.

The problem has to be at the entrance to the 155 Mb/s span or in the span itself. NPAD does not have a specific test for this one (since it has been several years since I have actually seen a problem like this).

Can you arrange to run NPAD end-to-end through the problematic span? It may look normal, except for having a very short queue. Point me at the report, and I may be able to see something in the raw data.

Thanks,
--MM--
-------------------------------------------
Matt Mathis http://staff.psc.edu/mathis
Work:412.268.3319 Home/Cell:412.654.7529
-------------------------------------------
Evil is defined by mortals who think they know
"The Truth" and use force to apply it to others.

On Thu, 6 Mar 2008, Bill Abbott wrote:

We have a server here at Rutgers, connected to a Cisco Catalyst gig switch, and a server in San Diego, connected to a gig switch (not sure of model). The server and switch port are both set to autoneg gigabit, and show that they are in fact 1000. If I change the switch port (at Rutgers) to 100 full, then the server will negotiate to 100 full and the network performance goes to 85-90 Mbps, which is what I would expect out of a 100 Mbit connection with properly tuned servers across the country.

I can't force 1000 on the server; it will only accept auto. If I test with a local NDT server, I get 900 Mbps.

The 155 Mbps link is our connection out to internet2. Not sure what the slowest link is on the San Diego side, but it's at least 155.

I get the same packet loss and out-of-order messages when connecting from Rutgers to nitro.ucsc.edu, and from San Diego to a local ndt server here (not mine).

I can't connect to ndt.newy.net.internet2.edu using web100clt on the command line; do you have a server in the area that I can use?

For NPAD, the URL on ndt.newy.net.internet2.edu is

ServerData/pdb-a-linux-3.rutgers.edu:2008-03-06-21:34:33.html

ndt.internet2.edu gave a protocol error, bad handshake.

No luck connecting to ndt.losa.net.internet2.edu with NPAD.


The systems on both ends are Linux servers running 2.6 kernels, with the TCP max buffer tuned to 1 MB or greater.
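For what it's worth, here is the usual bandwidth-delay-product arithmetic for the rates and RTT quoted in this thread (a sketch only; whether the actual sysctl values clear these numbers depends on the "or greater" part):

# Buffer needed to keep a single stream filled over a long path.
def bdp_bytes(rate_bps, rtt_s):
    """Bandwidth-delay product: bytes in flight to sustain rate_bps at rtt_s."""
    return rate_bps * rtt_s / 8

print(bdp_bytes(155e6, 0.090) / 1e6)   # ~1.74 MB to fill the 155 Mb/s bottleneck at 90 ms
print(bdp_bytes(100e6, 0.090) / 1e6)   # ~1.13 MB to fill 100 Mb/s at 90 ms
# A hard 1 MB cap would limit a single stream to roughly 90 Mb/s over this RTT.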

Bill


Matthew J Zekauskas wrote:
On 3/6/2008 12:04 AM, <> wrote:
I'm troubleshooting data transfers between New Jersey and California over Internet2. The slowest link is 155 Mbps, the RTT is 90 ms, and the servers and switches are gigabit.
Tuning TCP buffers only gets the throughput to 11.9 Mbits/sec. If I change the switch port to 100 Full, the throughput increases to 84.4 Mbits/sec.
NDT reports packet loss and duplicate ACKs when the connection is gigabit; the 100-full connection has no packet loss.
The relevant NDT output is below. iperf also backs up these results.
Is there a known problem with gig switch negotiation? No TCP buffer setting improves the performance when gigabit.
Rich is away traveling, and might have a better answer to your question, but let me try...
So, when you "change the switch port to 100 full": is this at the last hop, between a switch and a "PC"? If so, are you forcing both sides (the switch and the PC) to 100 full?
If not, then I think that there may be a problem with autonegotiation using your hardware.
If so... the bottleneck link is 155 Mbps. When the gigabit-connected PC ramps up, it will start sending longer and longer trains of back-to-back packets at 1000 Mbps, which the device feeding into the bottleneck link (and intermediate devices) would have to queue. It's possible that one of these devices cannot handle the number of packets that get blasted at it back-to-back. An inexpensive gigabit switch might be such a device.
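To put rough numbers on that queuing (a sketch using the rates and RTT from this thread; the packet size, and the assumption that a whole train arrives back to back, are simplifications):

# Peak backlog at the device feeding the 155 Mb/s link when a burst
# arrives back to back at 1000 Mb/s.
PKT_BITS = 1500 * 8                      # assumed packet size

def queue_needed_pkts(burst_pkts, in_rate_bps=1000e6, out_rate_bps=155e6):
    """Packets left sitting in the queue after a back-to-back burst."""
    return burst_pkts * (1 - out_rate_bps / in_rate_bps)

full_window = 155e6 * 0.090 / PKT_BITS   # window that fills 155 Mb/s at 90 ms
print(full_window)                       # ~1160 packets
print(queue_needed_pkts(full_window))    # ~980 packets would have to queue

That is far more buffering than a small switch or a short-queue interface typically has, which fits the "works at 100, breaks at 1000" behavior.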
Looking at detailed NDT results might give us some clues.
Do you know if the 155 Mbps link is on your side or the California side?
There are also NDT servers located within Internet2; if you test to one of them I might be able to look at the results (but I don't want to get your hopes up too much :).
<http://ndt.newy.net.internet2.edu:7123/> is the closest NDT server within Internet2 to Rutgers.
You might also try an NPAD test, which does complementary testing to NDT, looking to see if a device in the path has enough buffer space. There is one that I hope is configured correctly at <http://ndt.newy.net.internet2.edu:8000/>. (Ignore the fact that it claims to be in Chicago.) Use 70 ms for the target RTT (basically cross-US) and 150 Mbps for the target speed, and let's see what that says. That one gives a URL that has the results; send it to me.
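Roughly speaking, those targets correspond to the window the test will try to sustain; back-of-envelope only, not a description of NPAD's internals:

# What "70 ms at 150 Mbps" amounts to as a window / queue-headroom target.
target_rate_bps = 150e6
target_rtt_s = 0.070
mss_bytes = 1500                        # assumed packet size

window_bytes = target_rate_bps * target_rtt_s / 8
print(window_bytes / 1e6)               # ~1.3 MB of data in flight
print(window_bytes / mss_bytes)         # ~875 packets per RTT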
Also try the one at the Internet2 office: <http://ndt.internet2.edu:8200/>
If the California school is connected via CENIC, the closest one there is <http://ndt.losa.net.internet2.edu:7123/>; you can test "the other way": from the California side toward the center.
It would also be helpful if you could tell us the devices at both sides; are they PCs? Do you have the operating system and version?
Thanks,
--Matt

Thanks
ndt:
gigabit connection:
running 10s outbound test (client to server) . . . . . 13.67 Mb/s
running 10s inbound test (server to client) . . . . . . 14.30 Mb/s
There were 7 packets retransmitted, 178 duplicate acks received, and 245 SACK blocks received
Packets arrived out-of-order 2.64% of the time.
The connection stalled 1 times due to packet loss.
The connection was idle 0.30 seconds (3.00%) of the time.

100 full connection:
running 10s outbound test (client to server) . . . . . 11.80 Mb/s
running 10s inbound test (server to client) . . . . . . 20.46 Mb/s
No packet loss was observed.

iperf:
gigabit:
[ 3] 0.0-60.1 sec 85.3 MBytes 11.9 Mbits/sec

100 full:
[ 3] 0.0-60.2 sec 606 MBytes 84.4 Mbits/sec


------------------------------------



Richard A. Carlson
Network Engineer, Internet2
phone: (734) 352-7043   fax: (734) 913-4255
1000 Oakbrook Dr; Suite 300
Ann Arbor, MI 48104


