ndt-users - Re: bandwidth asymmetry on a 10G link

  • From: Brian Tierney <>
  • To: "Gholmieh, Nathalie" <>
  • Cc: "'Rich Carlson'" <>, "''" <>
  • Subject: Re: bandwidth asymmetry on a 10G link
  • Date: Fri, 21 May 2010 09:09:18 -0700


I'd guess that this is a Cisco configuration issue. See:

http://fasterdata.es.net/cisco.html



On Apr 19, 2010, at 9:59 AM, Gholmieh, Nathalie wrote:

> Hi Rich-
>
> I am sorry my email was not clear.
>
> The two servers are located in different parts of campus and they
> communicate over a 10G fiber link through two Cisco Layer 3 switches.
> The two servers are running
> - linux kernel 2.6.30 with web100 patch
> - Myricom 10G-PCIE2-8B2-2S+E
>
> NDT tests run on the two linux machines show the same asymmetry in the
> bandwidth. In summary:
> Web100clt run on machine 1 with machine 2 as server returns:
> C2S bandwidth ~ 10Gbps
> S2C bandwidth < 3Gbps
> Web100clt run on machine 2 with machine 1 as server returns:
> C2S bandwidth ~ 10Gbps
> S2C bandwidth < 3Gbps
>
> I have set the Myricom NICs on both machines to use jumbo frames.
> Write combining is enabled on the NICs
> I have modified the network buffer sizes as sent in my first email to the
> values recommended by Myricom for better performance
> I have also played with the congestion control algorithm: I have read that
> cubic might enhance performance, so I set net.ipv4.tcp_congestion_control =
> cubic. I still get the same results.
>
> txqueuelen:1000 on both servers
>
> here are the results of NDT tests run on the two servers:
>
> --------
> M is the client and T is the server:
> --------
>
> [root@M ~]# web100clt -n T.ucsd.edu -lll
> Testing network path for configuration and performance problems -- Using
> IPv4 address
> Checking for Middleboxes . . . . . . . . . . . . . . . . . . Done
> checking for firewalls . . . . . . . . . . . . . . . . . . . Done
> running 10s outbound test (client to server) . . . . . 9564.84 Mb/s
> running 10s inbound test (server to client) . . . . . . 2901.84 Mb/s
> The slowest link in the end-to-end path is a 10 Gbps 10 Gigabit
> Ethernet/OC-192 subnet
> Information [S2C]: Packet queuing detected: 70.32% (local buffers)
> Server 'T.ucsd.edu' is not behind a firewall. [Connection to the ephemeral
> port was successful]
> Client is not behind a firewall. [Connection to the ephemeral port was
> successful]
>
> ------ Web100 Detailed Analysis ------
>
> Web100 reports the Round trip time = 6.85 msec;the Packet size = 8960
> Bytes; and
> There were 96 packets retransmitted, 6893 duplicate acks received, and 7002
> SACK blocks received
> Packets arrived out-of-order 3.00% of the time.
> This connection is sender limited 81.76% of the time.
> This connection is network limited 17.63% of the time.
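The summary percentages NDT prints here can be reproduced from the Web100 SndLimTime* counters listed further down in this dump (time, in microseconds, that the sender spent limited by the receive window, the congestion window, or its own resources). A quick sketch of that arithmetic, not NDT's actual code:

```python
# Derive NDT's "connection is X limited" percentages from the Web100
# SndLimTime* counters in the detailed dump (microseconds spent in each
# sender state during the 10 s test). Values are from this M -> T run.
rwin, cwnd, sender = 61464, 1773632, 8227091  # SndLimTimeRwin/Cwnd/Sender
total = rwin + cwnd + sender

print(f"receiver limited {100 * rwin / total:.2f}% of the time")
print(f"network limited  {100 * cwnd / total:.2f}% of the time")
print(f"sender limited   {100 * sender / total:.2f}% of the time")
```

This matches the 81.76% and 17.63% figures above; the receive-window share (about 0.61%, the `rwintime` field) is small enough that NDT leaves it out of the summary.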
>
> Web100 reports TCP negotiated the optional Performance Settings to:
> RFC 2018 Selective Acknowledgment: ON
> RFC 896 Nagle Algorithm: ON
> RFC 3168 Explicit Congestion Notification: OFF
> RFC 1323 Time Stamping: OFF
> RFC 1323 Window Scaling: ON; Scaling Factors - Server=9, Client=9
> The theoretical network limit is 1970.18 Mbps
> The NDT server has a 16384 KByte buffer which limits the throughput to
> 18688.86 Mbps
> Your PC/Workstation has a 12282 KByte buffer which limits the throughput to
> 14009.23 Mbps
> The network based flow control limits the throughput to 14053.15 Mbps
>
> Client Data reports link is ' 9', Client Acks report link is ' 9'
> Server Data reports link is ' 9', Server Acks report link is ' 9'
> Packet size is preserved End-to-End
> Information: Network Address Translation (NAT) box is modifying the
> Server's IP address
> Server says [<IP1>] but Client says [ T.ucsd.edu]
> Information: Network Address Translation (NAT) box is modifying the
> Client's IP address,
> Server says [<IP2>] but Client says [ M.ucsd.edu]
> CurMSS: 8960
> X_Rcvbuf: 87380
> X_Sndbuf: 16777216
> AckPktsIn: 230148
> AckPktsOut: 0
> BytesRetrans: 843008
> CongAvoid: 0
> CongestionOverCount: 0
> CongestionSignals: 35
> CountRTT: 223133
> CurCwnd: 9309440
> CurRTO: 208
> CurRwinRcvd: 12481024
> CurRwinSent: 17920
> CurSsthresh: 8296960
> DSACKDups: 0
> DataBytesIn: 0
> DataBytesOut: -660222984
> DataPktsIn: 0
> DataPktsOut: 1363901
> DupAcksIn: 6893
> ECNEnabled: 0
> FastRetran: 35
> MaxCwnd: 12615680
> MaxMSS: 8960
> MaxRTO: 211
> MaxRTT: 11
> MaxRwinRcvd: 12576256
> MaxRwinSent: 17920
> MaxSsthresh: 9551360
> MinMSS: 8960
> MinRTO: 201
> MinRTT: 0
> MinRwinRcvd: 17920
> MinRwinSent: 17920
> NagleEnabled: 1
> OtherReductions: 144
> PktsIn: 230148
> PktsOut: 1363901
> PktsRetrans: 96
> RcvWinScale: 9
> SACKEnabled: 3
> SACKsRcvd: 7002
> SendStall: 0
> SlowStart: 0
> SampleRTT: 8
> SmoothedRTT: 8
> SndWinScale: 9
> SndLimTimeRwin: 61464
> SndLimTimeCwnd: 1773632
> SndLimTimeSender: 8227091
> SndLimTransRwin: 544
> SndLimTransCwnd: 28627
> SndLimTransSender: 29117
> SndLimBytesRwin: 41668440
> SndLimBytesCwnd: -1770059740
> SndLimBytesSender: 1068168316
> SubsequentTimeouts: 0
> SumRTT: 1528314
> Timeouts: 0
> TimestampsEnabled: 0
> WinScaleRcvd: 9
> WinScaleSent: 9
> DupAcksOut: 0
> StartTimeUsec: 903832
> Duration: 10062211
> c2sData: 9
> c2sAck: 9
> s2cData: 9
> s2cAck: 9
> half_duplex: 0
> link: 100
> congestion: 0
> bad_cable: 0
> mismatch: 0
> spd: -524.91
> bw: 1970.18
> loss: 0.000025662
> avgrtt: 6.85
> waitsec: 0.00
> timesec: 10.00
> order: 0.0300
> rwintime: 0.0061
> sendtime: 0.8176
> cwndtime: 0.1763
> rwin: 95.9492
> swin: 128.0000
> cwin: 96.2500
> rttsec: 0.006849
> Sndbuf: 16777216
> aspd: 0.00000
> CWND-Limited: 115198.00
> minCWNDpeak: 8960
> maxCWNDpeak: 12615680
> CWNDpeaks: 26
> [root@M ~]#
>
> -----------------------
> T is the client and M is the server (I had restarted M so some of the
> values were reset to the default):
> -----------------------
>
> [root@T ~]# web100clt -n M.ucsd.edu -lll
> Testing network path for configuration and performance problems -- Using
> IPv4 address
> Checking for Middleboxes . . . . . . . . . . . . . . . . . . Done
> checking for firewalls . . . . . . . . . . . . . . . . . . . Done
> running 10s outbound test (client to server) . . . . . 9687.49 Mb/s
> running 10s inbound test (server to client) . . . . . . 1919.56 Mb/s
> The slowest link in the end-to-end path is a 10 Gbps 10 Gigabit
> Ethernet/OC-192 subnet
> Information [S2C]: Packet queuing detected: 78.17% (local buffers)
> Server 'M.ucsd.edu' is not behind a firewall. [Connection to the ephemeral
> port was successful]
> Client is not behind a firewall. [Connection to the ephemeral port was
> successful]
>
> ------ Web100 Detailed Analysis ------
>
> Web100 reports the Round trip time = 0.79 msec;the Packet size = 8960
> Bytes; and
> No packet loss was observed.
> This connection is sender limited 99.62% of the time.
>
> Web100 reports TCP negotiated the optional Performance Settings to:
> RFC 2018 Selective Acknowledgment: ON
> RFC 896 Nagle Algorithm: ON
> RFC 3168 Explicit Congestion Notification: OFF
> RFC 1323 Time Stamping: OFF
> RFC 1323 Window Scaling: ON; Scaling Factors - Server=9, Client=9
> The theoretical network limit is 8658517.00 Mbps
> The NDT server has a 16384 KByte buffer which limits the throughput to
> 162025.31 Mbps
> Your PC/Workstation has a 12288 KByte buffer which limits the throughput to
> 121518.98 Mbps
> The network based flow control limits the throughput to 121835.44 Mbps
>
> Client Data reports link is ' 9', Client Acks report link is ' 9'
> Server Data reports link is ' 9', Server Acks report link is ' 9'
> Packet size is preserved End-to-End
> Information: Network Address Translation (NAT) box is modifying the
> Server's IP address
> Server says [<IP2>] but Client says [ M.ucsd.edu]
> Information: Network Address Translation (NAT) box is modifying the
> Client's IP address
> Server says [<IP1>] but Client says [ T]
> CurMSS: 8960
> X_Rcvbuf: 87380
> X_Sndbuf: 16777216
> AckPktsIn: 224114
> AckPktsOut: 0
> BytesRetrans: 0
> CongAvoid: 0
> CongestionOverCount: 0
> CongestionSignals: 0
> CountRTT: 224114
> CurCwnd: 12615680
> CurRTO: 201
> CurRwinRcvd: 12558848
> CurRwinSent: 17920
> CurSsthresh: -256
> DSACKDups: 0
> DataBytesIn: 0
> DataBytesOut: -1871234080
> DataPktsIn: 0
> DataPktsOut: 1237850
> DupAcksIn: 0
> ECNEnabled: 0
> FastRetran: 0
> MaxCwnd: 12615680
> MaxMSS: 8960
> MaxRTO: 211
> MaxRTT: 11
> MaxRwinRcvd: 12582912
> MaxRwinSent: 17920
> MaxSsthresh: 0
> MinMSS: 8960
> MinRTO: 201
> MinRTT: 0
> MinRwinRcvd: 17920
> MinRwinSent: 17920
> NagleEnabled: 1
> OtherReductions: 0
> PktsIn: 224114
> PktsOut: 1237850
> PktsRetrans: 0
> RcvWinScale: 9
> SACKEnabled: 3
> SACKsRcvd: 0
> SendStall: 0
> SlowStart: 0
> SampleRTT: 0
> SmoothedRTT: 1
> SndWinScale: 9
> SndLimTimeRwin: 29301
> SndLimTimeCwnd: 8996
> SndLimTimeSender: 10054035
> SndLimTransRwin: 602
> SndLimTransCwnd: 99
> SndLimTransSender: 701
> SndLimBytesRwin: 49860420
> SndLimBytesCwnd: 14403440
> SndLimBytesSender: -1935497940
> SubsequentTimeouts: 0
> SumRTT: 176939
> Timeouts: 0
> TimestampsEnabled: 0
> WinScaleRcvd: 9
> WinScaleSent: 9
> DupAcksOut: 0
> StartTimeUsec: 614339
> Duration: 10094355
> c2sData: 9
> c2sAck: 9
> s2cData: 9
> s2cAck: 9
> half_duplex: 0
> link: 100
> congestion: 0
> bad_cable: 0
> mismatch: 0
> spd: -1483.29
> bw: 8658516.76
> loss: 0.000000000
> avgrtt: 0.79
> waitsec: 0.00
> timesec: 10.00
> order: 0.0000
> rwintime: 0.0029
> sendtime: 0.9962
> cwndtime: 0.0009
> rwin: 96.0000
> swin: 128.0000
> cwin: 96.2500
> rttsec: 0.000790
> Sndbuf: 16777216
> aspd: 0.00000
> CWND-Limited: 121894.00
> minCWNDpeak: -1
> maxCWNDpeak: -1
> CWNDpeaks: -1
> [root@T ~]#
>
> -------------------------------
>
> thanks!
>
>
> Nathalie~
>
>
>
> -----Original Message-----
> From: Rich Carlson [mailto:]
> Sent: Thursday, April 15, 2010 11:17 AM
> To: Gholmieh, Nathalie
> Cc: ''
> Subject: Re: bandwidth asymmetry on a 10G link
>
> Hi Nathalie;
>
> I'm not sure I understand the configuration and the problem, so let me
> ask for clarification.
>
> You have 2 hosts connected back-to-back with a cross-over cable (fiber
> or copper?) You have installed an NDT server on both nodes and from
> either node you get asymmetric results as shown below. If this is not
> correct, then please clarify.
>
> A couple of questions.
> 1) What Linux kernel version are you using?
> 2) What Myricom driver version are you using?
> 3) Have you tuned any of the Myricom parameters?
>
> more comments in-line
>
> On 4/15/2010 1:27 PM, Gholmieh, Nathalie wrote:
>> Hi-
>>
>> I have setup 2 NDT servers interconnected with a 10G link, both using
>> Myricom 10G NICs, on our local network. the two servers have the same
>> versions of NDT 3.6.1.
>>
>> when running NDT tests between the 2 servers, I get a C2S bandwidth of
>> approximately 10Gbps, but the S2C bandwidth is not exceeding 3 Gbps, and
>> that is on BOTH machines:
>>
>> [root@M ~]# web100clt -n T -4 -l
>>
>> Testing network path for configuration and performance problems -- Using
>> IPv4 address
>>
>> Checking for Middleboxes . . . . . . . . . . . . . . . . . . Done
>>
>> checking for firewalls . . . . . . . . . . . . . . . . . . . Done
>>
>> *running 10s outbound test (client to server) . . . . . 9351.25 Mb/s*
>>
>> *running 10s inbound test (server to client) . . . . . . 2605.19 Mb/s*
>
> If the results in 1 direction showed these rates and a test in the
> opposite direction showed an inverted state (c2s lower than s2c) then I
> would suspect a problem with pacing or flow control in 1 direction or a
> configuration problem on 1 node. However, if both nodes report the same
> results (c2s is always greater than s2c) then it is either (1) a problem
> with my code in the xmit loop, or (2) an unknown problem.
>
>> The slowest link in the end-to-end path is a 10 Gbps 10 Gigabit
>> Ethernet/OC-192 subnet
>>
>> *Information [S2C]: Packet queuing detected: 72.52% (local buffers)*
>>
>> Server 'T' is not behind a firewall. [Connection to the ephemeral port
>> was successful]
>>
>> Client is not behind a firewall. [Connection to the ephemeral port was
>> successful]
>>
>> ------ Web100 Detailed Analysis ------
>>
>> Web100 reports the Round trip time = 4.09 msec;the Packet size = 8960
>> Bytes; and
>
> The RTT includes host queuing time (~5.6 MB in queue) using jumbo
> frames. What is the txqueuelen value for this interface (ifconfig command)?
>
>> There were 337 packets retransmitted, 9749 duplicate acks received, and
>> 10089 SACK blocks received
>>
>> Packets arrived out-of-order 4.32% of the time.
>
> Packets are being reordered. This is probably due to the pkt processing
> by multiple cores.
>
>> The connection stalled 1 times due to packet loss.
>>
>> The connection was idle 0.20 seconds (2.00%) of the time.
>
> The sending node went through at least 1 timeout. Add a 2nd -l to the
> command line and look at the last 3 variables that get reported (*CWND*)
> this will tell you the number of times TCP invoked the CA algorithm and
> what the high and low watermarks were.
>
>> This connection is receiver limited 2.41% of the time.
>>
>> This connection is sender limited 76.06% of the time.
>
> This is saying that the sender has limited resources, probably
> txqueuelen limits that prevent it from sending more data. Note with
> jumbo frames 4 msec is about 5.6 MB and 624 packets.
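Rich's queue estimate is a bandwidth-delay product. A quick sketch using the 4.09 ms RTT from the quoted run (his round figures assume a slightly larger RTT, so they come out a bit higher):

```python
# Bandwidth-delay product at 10 Gb/s: how much data can be in flight (or
# queued) per RTT. RTT and MSS are the Web100 values from the quoted run.
rate_bps = 10e9     # nominal line rate, bits/s
rtt_s = 0.00409     # 4.09 ms average RTT reported by Web100
mss = 8960          # jumbo-frame MSS, bytes

bdp_bytes = rate_bps * rtt_s / 8
print(f"BDP ~ {bdp_bytes / 1e6:.1f} MB ~ {bdp_bytes / mss:.0f} packets")
```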
>
>> This connection is network limited 21.54% of the time.
>>
>> Web100 reports TCP negotiated the optional Performance Settings to:
>>
>> RFC 2018 Selective Acknowledgment: ON
>>
>> RFC 896 Nagle Algorithm: ON
>>
>> RFC 3168 Explicit Congestion Notification: OFF
>>
>> RFC 1323 Time Stamping: OFF
>>
>> RFC 1323 Window Scaling: ON; Scaling Factors - Server=9, Client=9
>>
>> The theoretical network limit is 2148.73 Mbps
>
> This is from the Mathis equation ((pkt-size)/(rtt*sqrt(loss))). This is
> about the same as the measured rate so this is the limiting factor.
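Plugging the Web100 values from the first M-to-T run above (rttsec 0.006849, loss 0.000025662, MSS 8960) into this formula reproduces the 1970.18 Mbps "theoretical network limit" that run reported. A sketch of the arithmetic; the division by 2**20 to get "Mbps" is an assumption about NDT's unit convention that matches its printed figure:

```python
import math

# Mathis formula: throughput limit ~ MSS / (RTT * sqrt(loss)).
# Inputs are Web100 values from the first -lll run above; loss is
# CongestionSignals / DataPktsOut = 35 / 1363901.
mss = 8960              # bytes (CurMSS)
rtt = 0.006849          # seconds (rttsec)
loss = 0.000025662      # loss rate

bw_mbps = mss / (rtt * math.sqrt(loss)) * 8 / 2**20
print(f"theoretical network limit ~ {bw_mbps:.2f} Mbps")  # NDT reported 1970.18
```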
>
>> The NDT server has a 16384 KByte buffer which limits the throughput to
>> 31295.84 Mbps
>>
>> Your PC/Workstation has a 12282 KByte buffer which limits the throughput
>> to 23459.46 Mbps
>>
>> The network based flow control limits the throughput to 23533.01 Mbps
>
> Buffer space, from your tuning parms below, is adequate.
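The "buffer ... limits the throughput" lines above are each window converted to megabits and divided by the RTT. A sketch with this run's 4.09 ms RTT; the window byte counts are back-solved from the printed KByte figures (MaxRwinRcvd is an assumption taken from the later -lll dump), and the 2**20 divisor again mirrors NDT's printed numbers:

```python
# Reproduce NDT's "buffer limits the throughput" lines: each window in
# megabits (w * 8 / 2**20) divided by the RTT in seconds.
rtt = 0.00409            # 4.09 ms average RTT from this run
sndbuf = 16777216        # server send buffer, bytes (the "16384 KByte buffer")
max_rwin = 12576256      # client max receive window, bytes ("12282 KByte")

swin_limit = sndbuf * 8 / 2**20 / rtt    # server send-buffer limit, Mbps
rwin_limit = max_rwin * 8 / 2**20 / rtt  # client receive-window limit, Mbps
print(f"server buffer limit ~ {swin_limit:.2f} Mbps")  # NDT: 31295.84
print(f"client buffer limit ~ {rwin_limit:.2f} Mbps")  # NDT: 23459.46
```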
>
>> Client Data reports link is ' 9', Client Acks report link is ' 9'
>>
>> Server Data reports link is ' 9', Server Acks report link is ' 9'
>>
>> Packet size is preserved End-to-End
>>
>> Information: Network Address Translation (NAT) box is modifying the
>> Server's IP address
>>
>> Server says [<IP>] but Client says [ T]
>>
>> Information: Network Address Translation (NAT) box is modifying the
>> Client's IP address
>>
>> Server says [<IP2>] but Client says [M]
>>
>> [root@M ~]#
>>
>> I have these sysctl values set on both servers:
>>
>> net.core.rmem_max = 16777216
>>
>> net.core.wmem_max = 16777216
>>
>> net.ipv4.tcp_wmem = 4096 65536 16777216
>>
>> net.ipv4.tcp_rmem = 4096 87380 16777216
>>
>> net.core.netdev_max_backlog = 250000
>>
>> net.ipv4.tcp_no_metrics_save = 1
>
> Run the ifconfig command and report the txqueuelen value.
>
>> I have also noticed that same asymmetry in the bandwidth while
>> transferring an FTP file back and forth on the same link between the two
>> servers.
>>
>> Note that the traffic both ways is using the same path.
>>
>> I am wondering why there is a difference between the sent and received
>> bandwidth, and what parameters I should tune to use the full 10G both ways.
>>
>> Any ideas are very appreciated.
>>
>> Thanks!
>
> I don't have a good clue right now. Check the things listed
> (txqueuelen, version info, NIC tuning) and also run with more logging
> (-ll instead of -l). Turning on flow control may help. Also consider
> running an NPAD test. The NPAD system probes for pkt queues and other
> system configuration settings and it may point out more details that can
> help you understand what is going on here.
>
> Rich
>
>> Nathalie~
>>


