ndt-users - Re: Using NDT with 10 gigabit interfaces
  • From: "Nichole K. Boscia" <>
  • To: Richard Carlson <>
  • Cc: "" <>, Byron Hicks <>
  • Subject: Re: Using NDT with 10 gigabit interfaces
  • Date: Wed, 15 May 2013 20:12:39 -0700 (PDT)


Hey folks, just wanted to chime in that I've had the same problem for years. One thing you may want to check is your CPU affinity. It made a huge difference in rates (from 700 Mbps to >3 Gbps) on the server-to-client test. Try setting the affinity to the core handling your NIC's interrupts (for me, eth2 is core 2: 'taskset -c 2 <ndtd>').

It's still not nearly as good as the c->s direction (50%), but that seems to be CPU-bound. There's a lot of utilization going to servicing the soft IRQs, and that combined with the system/user time leaves the CPU 100% utilized.

More:
http://fasterdata.es.net/host-tuning/interrupt-binding/
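For anyone who'd rather do this from a wrapper than taskset, here's a minimal sketch using Python's Linux-only scheduling calls. The core number is an assumption (it should be whatever core services your NIC's IRQs, which you can find by grepping /proc/interrupts for the interface name); pin_to_core is my name, not anything from NDT:

```python
import os

# Hypothetical core for eth2's IRQs -- check /proc/interrupts on your box.
NIC_CORE = 2

def pin_to_core(pid, core):
    """Restrict `pid` (0 = the calling process) to a single CPU core.

    Python equivalent of `taskset -c <core> <cmd>`; Linux-only.
    """
    os.sched_setaffinity(pid, {core})
    return os.sched_getaffinity(pid)  # confirm the new mask
```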

I'm not really a systems person, but maybe that info will be helpful to someone. :) I, too, would like to see the performance tests more symmetric in results (especially just on a LAN link).

Cheers!
-nik

-------------------------------------------
Nichole K. Boscia
Senior Network Engineer, CSC
NASA Advanced Supercomputing Division
Ames Research Center, Moffett Field, CA 94035

On Wed, 15 May 2013, Richard Carlson wrote:

Date: Wed, 15 May 2013 21:33:07 -0500
From: Richard Carlson <>
To: "" <>, Byron Hicks <>
Subject: Re: Using NDT with 10 gigabit interfaces

Byron;

As you and Nat have noted, the NDT server seems to have a problem with
10 Gbps links. I have noticed this behavior before, but I do not have
any 10 Gbps nodes to play with.

If you're willing to do some investigating, maybe we can figure out what's
going on.

The questions I would start with are:

1) are you using the web client or the command line client? (I think
it's the command line client.)

2) what is the CPU load on the server and client while the tests are
running? You can either run top with a short refresh time or turn the
CPU monitoring flag on. (I don't expect the CPU is overloaded, but
let's find out.)

3) the NDT server captures the analysis data during the Server-to-Client
test, so what does the more details page say? (Run the command line tool
with the -ll option to extract this data, or look at the web100srv.log file.)

However, looking at the code and the packet queuing message, I think we
have a probable answer.

The packet queuing detected message is generated on the client during
the analysis phase. Both the client and the server measure the transfer
speed (bytes transferred / time). The client reports the value it
calculated; after the tests complete, the server sends its own
calculation back to the client as part of the larger test results, for
printing. The formula is ((s2cspd-spdin)/s2cspd)*100, the server's
calculation minus the client's, expressed as a percentage of the
server's figure. So your 80.16% says the client's number was about 80%
below the server's (a rough guess puts the server's calculated speed
around 7.2 Gbps).
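To make the arithmetic concrete, here's a small sketch of that formula and its inverse, plugging in Byron's client-side inbound reading of 1434.11 Mb/s and the reported 80.16% (the function names are mine, not NDT's):

```python
def queuing_pct(s2cspd, spdin):
    # NDT's formula: server-measured speed minus client-measured speed,
    # as a percentage of the server's figure.
    return (s2cspd - spdin) / s2cspd * 100.0

def server_speed(spdin, pct):
    # Invert the formula to recover the server-side measurement.
    return spdin / (1.0 - pct / 100.0)

est = server_speed(1434.11, 80.16)
print(round(est))  # roughly 7228 Mb/s, i.e. about 7.2 Gbps
```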

Given all this, I'd look at the command line client's read data loop.
It's not terminating properly; in this case it ran for about 50 seconds
instead of 10.

Try running the command line tool with some -d flags to print out
debugging information. That might tell you more about what's going on.

Rich

On 05/15/2013 04:25 PM, Byron Hicks wrote:
I'm reasonably certain that I have a clean path.

Both boxes are running NDT, and I get the same result in both directions:

Houston:

running 10s outbound test (client to server) . . . . . 9123.91 Mb/s
running 10s inbound test (server to client) . . . . . . 1434.11 Mb/s

Dallas:

running 10s outbound test (client to server) . . . . . 8953.05 Mb/s
running 10s inbound test (server to client) . . . . . . 1440.18 Mb/s

If it were a traffic loss issue, I would expect the outbound/inbound
numbers to flip, with the lower number on the "leg" of the duplex path
that had the traffic loss.

But they don't flip: Client to Server is 9 Gb/s and Server to Client is
1.4 Gb/s, regardless of which NDT server I'm testing from/to. And
considering that I'm getting 9 Gb/s on a 10 Gb/s link using iperf in
both directions, I'm pretty sure packet loss is not a factor.

How do I interpret the following:

Information [S2C]: Packet queuing detected: 80.16% (remote buffers)

Where is the packet queuing happening?


On 05/15/2013 01:37 PM, Brian Tierney wrote:
Another possibility is that I've seen cases where, on a path with packet
loss, different clients seem to trigger different loss patterns.

For example, here is a run on a clean path:


web100clt -n ps-lax-10g.cenic.net -b 33554432
running 10s outbound test (client to server) . . . . . 2321.11 Mb/s
running 10s inbound test (server to client) . . . . . . 2802.95 Mb/s

vs bwctl:

bwctl -c ps-lax-10g.cenic.net -fm
bwctl: Using tool: iperf
[ 14] local 137.164.28.105 port 5001 connected with 198.129.254.98 port 5001
[ ID] Interval Transfer Bandwidth
[ 14] 0.0-10.0 sec 2984 MBytes 2496 Mbits/sec

performance is similar.

----------

And here are the results for a path with packet loss:

web100clt -n ps-lax-10g.cenic.net -b 33554432
running 10s outbound test (client to server) . . . . . 18.06 Mb/s
running 10s inbound test (server to client) . . . . . . 2492.69 Mb/s

bwctl -c ps-lax-10g.cenic.net -fm
[ 14] local 137.164.28.105 port 5001 connected with 198.129.254.150 port 5001
[ ID] Interval Transfer Bandwidth
[ 14] 0.0-10.3 sec 552 MBytes 450 Mbits/sec

Here iperf does about 25x better than NDT (and, btw, nuttcp results
agree with the NDT results in this case).

My guess is that different tools have different burst characteristics, and
these trigger different amounts of packet loss.





