ndt-users - Re: Using NDT with 10 gigabit interfaces
- From: Brian Tierney <>
- To: Matt Mathis <>
- Cc: NDT users <>
- Subject: Re: Using NDT with 10 gigabit interfaces
- Date: Tue, 31 May 2011 20:11:05 -0700
On May 31, 2011, at 8:10 AM, Matt Mathis wrote:
in my last email): using the web100clt tool between 2 nearby 10G NDT hosts (RTT = 0.02 ms) I consistently see results similar to this:

running 10s outbound test (client to server) . . . . . 7748.44 Mb/s
running 10s inbound test (server to client) . . . . . . 425.89 Mb/s

while iperf is consistently around 8.3 Gbps in both directions (results are the same if I swap client and server hosts, btw).

vmstat output from the server during 'client to server' testing:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd    free   buff  cache   si   so    bi    bo     in     cs us sy id wa st
 1  0      0 3000756 153796 284476    0    0     0     0 275956 106682  2 27 71  0  0
 3  0      0 3000012 153796 284476    0    0     0   184 278421 125647  3 29 69  0  0
 2  0      0 3000016 153796 284492    0    0     0     0 281350 102942  2 27 71  0  0
 2  0      0 2999024 153796 284492    0    0     0     0 281674 103412  2 28 70  0  0
 2  0      0 2999768 153796 284492    0    0     0     0 281432 103257  2 27 71  0  0
 2  0      0 2999148 153796 284492    0    0     0     0 281082 102463  2 28 70  0  0
 2  0      0 2999148 153796 284492    0    0     0    56 281413 102872  2 27 71  0  0
 1  0      0 3001616 153796 284492    0    0     0    64 218677 114352  2 20 78  0  0

vmstat output on the server during 'server to client' testing:

 1  0      0 3002236 153796 284492    0    0     0     0 193199 142030  2 16 83  0  0
 0  0      0 3002484 153796 284492    0    0     0     0 193191 142068  2 15 83  0  0
 1  0      0 2999880 153796 284492    0    0     0   240 193065 142319  2 16 82  0  0
 1  0      0 2994672 153796 284492    0    0     0     0 193231 142132  2 16 83  0  0
 1  0      0 2993316 153796 284492    0    0     0    64 193451 142211  1 16 82  0  0
 1  0      0 2996664 153796 284492    0    0     0     0 191818 145425  2 15 83  0  0
 0  0      0 2996420 153796 284496    0    0     0     0 189887 143033  2 15 83  0  0

Note the very high context-switches-per-second values (cs), particularly while sending, and compare with vmstat output during an iperf run:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd    free   buff  cache   si   so    bi    bo     in     cs us sy id wa st
 1  0      0 3024664 153856 286472    0    0     0     0 213989   5348  0 11 89  0  0
 0  0      0 3024416 153856 286472    0    0     0     0 213440   4019  0 11 89  0  0
 0  0      0 3024168 153856 286472    0    0     0     0 213908   3239  0 11 89  0  0
 1  0      0 3023796 153856 286472    0    0     0     0 213721   2613  0 11 89  0  0
 2  0      0 3023548 153856 286472    0    0     0    48 213933   2113  0 11 89  0  0
 0  0      0 3022804 153856 286472    0    0     0     0 213921   1758  0 11 89  0  0
 0  0      0 3022432 153856 286472    0    0     0     0 213864   1531  0 12 88  0  0
 0  0      0 3021936 153856 286472    0    0     0   240 213558   1331  0 11 89  0  0
 2  0      0 3021564 153856 286472    0    0     0     0 213885   1202  0 11 89  0  0

That is a dramatic difference in context switches (as expected, due to the web100 calls).

These hosts have 6 "Intel(R) Core(TM) i7 CPU X 980 @ 3.33GHz" CPUs. Using mpstat we see CPU load on 2 processors, and some additional interrupts on a 3rd:

06:43:38 PM  CPU  %user  %nice   %sys  %iowait  %irq  %soft  %steal   %idle    intr/s
06:43:40 PM    0   8.00   0.00  19.50     0.00  4.50  40.00    0.00   28.00 191739.50
06:43:40 PM    1   0.00   0.00   0.00     0.00  0.00   0.00    0.00  100.00      0.00
06:43:40 PM    2   2.01   0.00  18.59     0.00  0.00   7.54    0.00   71.86      0.00
06:43:40 PM    3   0.00   0.00   0.00     0.00  0.00   0.00    0.00  100.00   1001.00
06:43:40 PM    4   0.00   0.00   0.00     0.00  0.00   0.00    0.00  100.00      0.00
06:43:40 PM    5   0.00   0.00   0.00     0.00  0.00   0.00    0.00  100.00      0.00
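Nearly all of those interrupts land on CPU 0, which is what pointed me at the NIC's interrupt coalescing settings. As a minimal sketch of how coalescing can be inspected and raised with ethtool (not necessarily how it was changed on these hosts), assuming the 10G interface is eth2 and that the myri10ge driver maps its coalescing delay onto the standard rx-usecs parameter; the interface name and value below are illustrative assumptions:

    # show the NIC's current interrupt coalescing settings
    ethtool -c eth2

    # raise the receive coalescing delay (rx-usecs is in microseconds) so the
    # NIC batches several packets per interrupt instead of interrupting per packet
    ethtool -C eth2 rx-usecs 100

With the coalescing delay at 0, a 10G stream can generate interrupts at nearly per-packet rates, which is consistent with the ~190K intr/s shown on CPU 0 above.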
So then I tried increasing the interrupt coalescing to 100ms (it was set to 0), and this made a big difference:

running 10s outbound test (client to server) . . . . . 9394.79 Mb/s
running 10s inbound test (server to client) . . . . . . 2523.48 Mb/s

It also brought the number of intr/sec down by around 20x:

08:06:53 PM  CPU  %user  %nice   %sys  %iowait  %irq  %soft  %steal   %idle   intr/s
08:06:55 PM    0   5.47   0.00  14.43     0.00  1.49  22.39    0.00   56.22  9907.96
08:06:55 PM    1   3.00   0.00  31.00     0.00  0.00   6.50    0.00   59.50     0.00
08:06:55 PM    2   0.00   0.00   0.00     0.00  0.00   0.00    0.00  100.00     0.00
08:06:55 PM    3   0.00   0.00   0.00     0.00  0.00   0.00    0.00  100.00   996.02
08:06:55 PM    4   0.00   0.00   0.00     0.00  0.00   0.00    0.00  100.00     0.00
08:06:55 PM    5   0.00   0.00   0.00     0.00  0.00   0.00    0.00  100.00     0.00

But inbound is still 4x slower than outbound (iperf is now 9.6 Gbps in both directions). Does anyone know of other Myricom tuning knobs to try? Or is the conclusion to all this: "doing NDT/web100 at 10G requires a web10G kernel"?
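For anyone who wants to compare numbers or poke at the remaining generic (rather than Myricom-specific) knobs, here is a hedged sketch of the commands involved, again assuming the 10G interface is eth2; the interface name and the ring size shown are illustrative assumptions, not values taken from these hosts:

    # context switches (cs) and interrupts (in) per interval during a test
    vmstat 2

    # per-CPU utilization and interrupt rates every 2 seconds
    mpstat -P ALL 2

    # which CPUs are servicing the NIC's interrupt lines
    grep eth2 /proc/interrupts

    # current vs. maximum ring buffer sizes; a larger RX ring can help absorb bursts at 10G
    ethtool -g eth2
    ethtool -G eth2 rx 4096

    # offload settings (TSO, GSO, LRO, checksumming) that affect per-packet CPU cost
    ethtool -k eth2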
- Using NDT with 10 gigabit interfaces, ntstoddard, 05/02/2011
- Re: Using NDT with 10 gigabit interfaces, Rich Carlson, 05/02/2011
- Re: Using NDT with 10 gigabit interfaces, Brian Tierney, 05/28/2011
- Re: Using NDT with 10 gigabit interfaces, Matt Mathis, 05/31/2011
- Re: Using NDT with 10 gigabit interfaces, Brian Tierney, 05/31/2011
- Re: Using NDT with 10 gigabit interfaces, Aaron Brown, 05/31/2011
- Re: Using NDT with 10 gigabit interfaces, Nat Stoddard, 05/31/2011
- Re: Using NDT with 10 gigabit interfaces, Matt Mathis, 05/31/2011
- Re: Using NDT with 10 gigabit interfaces, Brian Tierney, 05/28/2011