ndt-users - Re: Using NDT with 10 gigabit interfaces

Subject: ndt-users list created

  • From: Richard Carlson <>
  • To: , Byron Hicks <>
  • Subject: Re: Using NDT with 10 gigabit interfaces
  • Date: Wed, 15 May 2013 22:33:07 -0400
  • Authentication-results: sfpop-ironport01.merit.edu; dkim=pass (signature verified [TEST])

Byron;

As you and Nat have noted, the NDT server seems to have a problem with 10 Gbps links. I have noticed this behavior before, but I do not have any 10 Gbps nodes to play with.

If you're willing to do some investigating, maybe we can figure out what's going on.

The questions I would start with are:

1) are you using the web client or the command line client? (I think it's the command line client.)

2) what is the CPU load on the server and client while the tests are running? You can either run top with a short refresh time or turn the CPU monitoring flag on. (I don't expect the CPU is overloaded, but let's find out.)

3) the NDT server captures the analysis data during the Server-to-Client test, so what does the more details page say? (Run the command line tool with the -ll option to extract this data, or look at the web100srv.log file.)

However, looking at the code and the packet queuing message, I think we have a probable answer.

The packet queuing detected message is generated on the client during the analysis phase. Both the client and the server measure the transfer speed (bytes transferred/time). The client reports the value it calculated. After the tests complete, the server sends the speed it calculated back to the client as part of the larger test results, and the client prints it. The formula is ((s2cspd-spdin)/s2cspd)*100, which is the server's calculation minus the client's calculation, expressed as a percentage of the server's value. So your 80.16% says the client's figure was about 80% below the server's (a rough guess says the server calculated the speed around 7.4 Gbps).
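
As a rough check of that arithmetic, here is a minimal Python sketch. It assumes the 80.16% message belongs to the Houston run that reported 1434.11 Mb/s inbound (the thread does not say which run produced it), and the function names are made up for illustration; they are not taken from the NDT source. Only the formula itself comes from the message above.

```python
def queuing_percent(s2cspd, spdin):
    """Percent by which the client-measured speed (spdin) falls short of
    the server-measured speed (s2cspd), per the formula quoted above."""
    return (s2cspd - spdin) / s2cspd * 100.0


def implied_server_speed(spdin, percent):
    """Invert the formula: solve for s2cspd given spdin and the percent."""
    return spdin / (1.0 - percent / 100.0)


# Assumption: the 80.16% message goes with the Houston inbound result.
client_mbps = 1434.11
reported_percent = 80.16

server_mbps = implied_server_speed(client_mbps, reported_percent)
print(f"implied server-side speed: {server_mbps:.0f} Mb/s")
print(f"check: {queuing_percent(server_mbps, client_mbps):.2f}%")
```

Under that assumption the implied server-side figure comes out at roughly 7.2 Gb/s, which is in the same range as the rough guess above.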

Given all this I'd look at the command line client's read data loop. It's not terminating properly and in this case ran about 50 seconds instead of 10.
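
The 50-second figure can be sanity-checked the same way. This sketch assumes both ends counted the same number of bytes and that the server timed its send at exactly 10 seconds; neither is confirmed in the thread, and the function name is hypothetical. If speed is bytes/time on both sides, the ratio of the two speeds is the inverse ratio of the two durations.

```python
def implied_client_seconds(server_seconds, percent):
    """If both ends saw the same byte count, then
    client_time / server_time = s2cspd / spdin = 1 / (1 - percent/100)."""
    return server_seconds / (1.0 - percent / 100.0)


# Assumptions: a 10 s server-side test and the reported 80.16% difference.
elapsed = implied_client_seconds(10.0, 80.16)
print(f"implied client read-loop duration: {elapsed:.1f} s")
```

That works out to a little over 50 seconds, which is where the estimate above comes from.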

Try running the command line tool with some -d flags to print out debugging information. That might tell you more about what's going on.

Rich

On 05/15/2013 04:25 PM, Byron Hicks wrote:
I'm reasonably certain that I have a clean path.

Both boxes are running NDT, and I get the same result in both directions:

Houston:

running 10s outbound test (client to server) . . . . . 9123.91 Mb/s
running 10s inbound test (server to client) . . . . . . 1434.11 Mb/s

Dallas:

running 10s outbound test (client to server) . . . . . 8953.05 Mb/s
running 10s inbound test (server to client) . . . . . . 1440.18 Mb/s

If it were a traffic loss issue, I would expect that the
outbound/inbound numbers would flip, with the lower number being on the
"leg" of the duplex path that had the traffic loss.

But I'm not seeing that. Client to Server is 9Gb/s and Server to Client is 1.4Gb/s,
regardless of which NDT server I'm testing from/to. And considering
that I'm getting 9Gb/s on a 10Gb/s link using iperf in both directions,
I'm pretty sure packet loss is not a factor.

How do I interpret the following:

Information [S2C]: Packet queuing detected: 80.16% (remote buffers)

Where is the packet queuing happening?


On 05/15/2013 01:37 PM, Brian Tierney wrote:
Another possibility is that I've seen cases where, on a path with packet
loss, different clients seem to trigger different loss patterns.

For example, here is a test on a clean path:


web100clt -n ps-lax-10g.cenic.net -b 33554432
running 10s outbound test (client to server) . . . . . 2321.11 Mb/s
running 10s inbound test (server to client) . . . . . . 2802.95 Mb/s

vs bwctl:

bwctl -c ps-lax-10g.cenic.net -fm
bwctl: Using tool: iperf
[ 14] local 137.164.28.105 port 5001 connected with 198.129.254.98 port 5001
[ ID] Interval Transfer Bandwidth
[ 14] 0.0-10.0 sec 2984 MBytes 2496 Mbits/sec

performance is similar.

----------

And here are the results for a path with packet loss:

web100clt -n ps-lax-10g.cenic.net -b 33554432
running 10s outbound test (client to server) . . . . . 18.06 Mb/s
running 10s inbound test (server to client) . . . . . . 2492.69 Mb/s

bwctl -c ps-lax-10g.cenic.net -fm
[ 14] local 137.164.28.105 port 5001 connected with 198.129.254.150 port 5001
[ ID] Interval Transfer Bandwidth
[ 14] 0.0-10.3 sec 552 MBytes 450 Mbits/sec

Here iperf does about 25x better than NDT (450 vs. 18.06 Mb/s), and btw, nuttcp
results agree with the NDT results in this case.

My guess is that different tools have different burst characteristics, and
these trigger different amounts of packet loss.




