ndt-users - RE: bandwidth asymmetry on a 10G link
- From: "Gholmieh, Nathalie" <>
- To: 'Rich Carlson' <>
- Cc: "''" <>
- Subject: RE: bandwidth asymmetry on a 10G link
- Date: Mon, 19 Apr 2010 09:59:16 -0700
- Accept-language: en-US
Hi Rich-
I am sorry my email was not clear.
The two servers are located in different parts of campus and communicate
over a 10G fiber link through two Cisco Layer 3 switches.
The two servers are running:
- Linux kernel 2.6.30 with the Web100 patch
- Myricom 10G-PCIE2-8B2-2S+E NICs
NDT tests run on the two Linux machines show the same asymmetry in
bandwidth. In summary:
Web100clt run on machine 1 with machine 2 as server returns:
C2S bandwidth ~ 10Gbps
S2C bandwidth < 3Gbps
Web100clt run on machine 2 with machine 1 as server returns:
C2S bandwidth ~ 10Gbps
S2C bandwidth < 3Gbps
On both machines I have:
- set the Myricom NICs to use jumbo frames
- enabled write combining on the NICs
- set the network buffer sizes to the values recommended by Myricom for
better performance, as sent in my first email
- tried a different congestion control algorithm: I have read that cubic
might enhance performance, so I set net.ipv4.tcp_congestion_control =
cubic, but I still get the same results
- txqueuelen is 1000 on both servers
(A rough sketch of the corresponding commands is below.)
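These are not the exact command lines, just a sketch of the settings above;
eth2 stands in for the actual Myricom interface name:

   # enable jumbo frames on the 10G interface (MTU 9000)
   ifconfig eth2 mtu 9000

   # switch the congestion control algorithm to cubic
   sysctl -w net.ipv4.tcp_congestion_control=cubic

   # confirm the MTU and transmit queue length (shows txqueuelen:1000)
   ifconfig eth2

(Write combining and the buffer sizes were set per the Myricom tuning
notes, as in my first email.)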
Here are the results of the NDT tests run on the two servers:
--------
M is the client and T is the server:
--------
[root@M ~]# web100clt -n T.ucsd.edu -lll
Testing network path for configuration and performance problems -- Using
IPv4 address
Checking for Middleboxes . . . . . . . . . . . . . . . . . . Done
checking for firewalls . . . . . . . . . . . . . . . . . . . Done
running 10s outbound test (client to server) . . . . . 9564.84 Mb/s
running 10s inbound test (server to client) . . . . . . 2901.84 Mb/s
The slowest link in the end-to-end path is a 10 Gbps 10 Gigabit
Ethernet/OC-192 subnet
Information [S2C]: Packet queuing detected: 70.32% (local buffers)
Server 'T.ucsd.edu' is not behind a firewall. [Connection to the ephemeral
port was successful]
Client is not behind a firewall. [Connection to the ephemeral port was
successful]
------ Web100 Detailed Analysis ------
Web100 reports the Round trip time = 6.85 msec; the Packet size = 8960
Bytes; and
There were 96 packets retransmitted, 6893 duplicate acks received, and 7002
SACK blocks received
Packets arrived out-of-order 3.00% of the time.
This connection is sender limited 81.76% of the time.
This connection is network limited 17.63% of the time.
Web100 reports TCP negotiated the optional Performance Settings to:
RFC 2018 Selective Acknowledgment: ON
RFC 896 Nagle Algorithm: ON
RFC 3168 Explicit Congestion Notification: OFF
RFC 1323 Time Stamping: OFF
RFC 1323 Window Scaling: ON; Scaling Factors - Server=9, Client=9
The theoretical network limit is 1970.18 Mbps
The NDT server has a 16384 KByte buffer which limits the throughput to
18688.86 Mbps
Your PC/Workstation has a 12282 KByte buffer which limits the throughput to
14009.23 Mbps
The network based flow control limits the throughput to 14053.15 Mbps
Client Data reports link is ' 9', Client Acks report link is ' 9'
Server Data reports link is ' 9', Server Acks report link is ' 9'
Packet size is preserved End-to-End
Information: Network Address Translation (NAT) box is modifying the Server's
IP address
Server says [<IP1>] but Client says [ T.ucsd.edu]
Information: Network Address Translation (NAT) box is modifying the Client's
IP address,
Server says [<IP2>] but Client says [ M.ucsd.edu]
CurMSS: 8960
X_Rcvbuf: 87380
X_Sndbuf: 16777216
AckPktsIn: 230148
AckPktsOut: 0
BytesRetrans: 843008
CongAvoid: 0
CongestionOverCount: 0
CongestionSignals: 35
CountRTT: 223133
CurCwnd: 9309440
CurRTO: 208
CurRwinRcvd: 12481024
CurRwinSent: 17920
CurSsthresh: 8296960
DSACKDups: 0
DataBytesIn: 0
DataBytesOut: -660222984
DataPktsIn: 0
DataPktsOut: 1363901
DupAcksIn: 6893
ECNEnabled: 0
FastRetran: 35
MaxCwnd: 12615680
MaxMSS: 8960
MaxRTO: 211
MaxRTT: 11
MaxRwinRcvd: 12576256
MaxRwinSent: 17920
MaxSsthresh: 9551360
MinMSS: 8960
MinRTO: 201
MinRTT: 0
MinRwinRcvd: 17920
MinRwinSent: 17920
NagleEnabled: 1
OtherReductions: 144
PktsIn: 230148
PktsOut: 1363901
PktsRetrans: 96
RcvWinScale: 9
SACKEnabled: 3
SACKsRcvd: 7002
SendStall: 0
SlowStart: 0
SampleRTT: 8
SmoothedRTT: 8
SndWinScale: 9
SndLimTimeRwin: 61464
SndLimTimeCwnd: 1773632
SndLimTimeSender: 8227091
SndLimTransRwin: 544
SndLimTransCwnd: 28627
SndLimTransSender: 29117
SndLimBytesRwin: 41668440
SndLimBytesCwnd: -1770059740
SndLimBytesSender: 1068168316
SubsequentTimeouts: 0
SumRTT: 1528314
Timeouts: 0
TimestampsEnabled: 0
WinScaleRcvd: 9
WinScaleSent: 9
DupAcksOut: 0
StartTimeUsec: 903832
Duration: 10062211
c2sData: 9
c2sAck: 9
s2cData: 9
s2cAck: 9
half_duplex: 0
link: 100
congestion: 0
bad_cable: 0
mismatch: 0
spd: -524.91
bw: 1970.18
loss: 0.000025662
avgrtt: 6.85
waitsec: 0.00
timesec: 10.00
order: 0.0300
rwintime: 0.0061
sendtime: 0.8176
cwndtime: 0.1763
rwin: 95.9492
swin: 128.0000
cwin: 96.2500
rttsec: 0.006849
Sndbuf: 16777216
aspd: 0.00000
CWND-Limited: 115198.00
minCWNDpeak: 8960
maxCWNDpeak: 12615680
CWNDpeaks: 26
[root@M ~]#
-----------------------
T is the client and M is the server (I had restarted M so some of the values
were reset to the default):
-----------------------
[root@T ~]# web100clt -n M.ucsd.edu -lll
Testing network path for configuration and performance problems -- Using
IPv4 address
Checking for Middleboxes . . . . . . . . . . . . . . . . . . Done
checking for firewalls . . . . . . . . . . . . . . . . . . . Done
running 10s outbound test (client to server) . . . . . 9687.49 Mb/s
running 10s inbound test (server to client) . . . . . . 1919.56 Mb/s
The slowest link in the end-to-end path is a 10 Gbps 10 Gigabit
Ethernet/OC-192 subnet
Information [S2C]: Packet queuing detected: 78.17% (local buffers)
Server 'M.ucsd.edu' is not behind a firewall. [Connection to the ephemeral
port was successful]
Client is not behind a firewall. [Connection to the ephemeral port was
successful]
------ Web100 Detailed Analysis ------
Web100 reports the Round trip time = 0.79 msec; the Packet size = 8960
Bytes; and
No packet loss was observed.
This connection is sender limited 99.62% of the time.
Web100 reports TCP negotiated the optional Performance Settings to:
RFC 2018 Selective Acknowledgment: ON
RFC 896 Nagle Algorithm: ON
RFC 3168 Explicit Congestion Notification: OFF
RFC 1323 Time Stamping: OFF
RFC 1323 Window Scaling: ON; Scaling Factors - Server=9, Client=9
The theoretical network limit is 8658517.00 Mbps
The NDT server has a 16384 KByte buffer which limits the throughput to
162025.31 Mbps
Your PC/Workstation has a 12288 KByte buffer which limits the throughput to
121518.98 Mbps
The network based flow control limits the throughput to 121835.44 Mbps
Client Data reports link is ' 9', Client Acks report link is ' 9'
Server Data reports link is ' 9', Server Acks report link is ' 9'
Packet size is preserved End-to-End
Information: Network Address Translation (NAT) box is modifying the Server's
IP address
Server says [<IP2>] but Client says [ M.ucsd.edu]
Information: Network Address Translation (NAT) box is modifying the Client's
IP address
Server says [<IP1>] but Client says [ T]
CurMSS: 8960
X_Rcvbuf: 87380
X_Sndbuf: 16777216
AckPktsIn: 224114
AckPktsOut: 0
BytesRetrans: 0
CongAvoid: 0
CongestionOverCount: 0
CongestionSignals: 0
CountRTT: 224114
CurCwnd: 12615680
CurRTO: 201
CurRwinRcvd: 12558848
CurRwinSent: 17920
CurSsthresh: -256
DSACKDups: 0
DataBytesIn: 0
DataBytesOut: -1871234080
DataPktsIn: 0
DataPktsOut: 1237850
DupAcksIn: 0
ECNEnabled: 0
FastRetran: 0
MaxCwnd: 12615680
MaxMSS: 8960
MaxRTO: 211
MaxRTT: 11
MaxRwinRcvd: 12582912
MaxRwinSent: 17920
MaxSsthresh: 0
MinMSS: 8960
MinRTO: 201
MinRTT: 0
MinRwinRcvd: 17920
MinRwinSent: 17920
NagleEnabled: 1
OtherReductions: 0
PktsIn: 224114
PktsOut: 1237850
PktsRetrans: 0
RcvWinScale: 9
SACKEnabled: 3
SACKsRcvd: 0
SendStall: 0
SlowStart: 0
SampleRTT: 0
SmoothedRTT: 1
SndWinScale: 9
SndLimTimeRwin: 29301
SndLimTimeCwnd: 8996
SndLimTimeSender: 10054035
SndLimTransRwin: 602
SndLimTransCwnd: 99
SndLimTransSender: 701
SndLimBytesRwin: 49860420
SndLimBytesCwnd: 14403440
SndLimBytesSender: -1935497940
SubsequentTimeouts: 0
SumRTT: 176939
Timeouts: 0
TimestampsEnabled: 0
WinScaleRcvd: 9
WinScaleSent: 9
DupAcksOut: 0
StartTimeUsec: 614339
Duration: 10094355
c2sData: 9
c2sAck: 9
s2cData: 9
s2cAck: 9
half_duplex: 0
link: 100
congestion: 0
bad_cable: 0
mismatch: 0
spd: -1483.29
bw: 8658516.76
loss: 0.000000000
avgrtt: 0.79
waitsec: 0.00
timesec: 10.00
order: 0.0000
rwintime: 0.0029
sendtime: 0.9962
cwndtime: 0.0009
rwin: 96.0000
swin: 128.0000
cwin: 96.2500
rttsec: 0.000790
Sndbuf: 16777216
aspd: 0.00000
CWND-Limited: 121894.00
minCWNDpeak: -1
maxCWNDpeak: -1
CWNDpeaks: -1
[root@T ~]#
-------------------------------
Thanks!
Nathalie~
-----Original Message-----
From: Rich Carlson [mailto:]
Sent: Thursday, April 15, 2010 11:17 AM
To: Gholmieh, Nathalie
Cc: ''
Subject: Re: bandwidth asymmetry on a 10G link
Hi Nathalie;
I'm not sure I understand the configuration and the problem, so let me
ask for clarification.
You have 2 hosts connected back-to-back with a cross-over cable (fiber
or copper?). You have installed an NDT server on both nodes, and from
either node you get asymmetric results as shown below. If this is not
correct, then please clarify.
A couple of questions.
1) What Linux kernel version are you using?
2) What Myricom driver version are you using?
3) Have you tuned any of the Myricom parameters?
More comments in-line.
On 4/15/2010 1:27 PM, Gholmieh, Nathalie wrote:
> Hi-
>
> I have set up 2 NDT servers interconnected with a 10G link, both using
> Myricom 10G NICs, on our local network. The two servers run the same
> version of NDT, 3.6.1.
>
> When running NDT tests between the 2 servers, I get a C2S bandwidth of
> approximately 10 Gbps, but the S2C bandwidth does not exceed 3 Gbps, and
> that is on BOTH machines:
>
> [root@M ~]# web100clt -n T -4 -l
>
> Testing network path for configuration and performance problems -- Using
> IPv4 address
>
> Checking for Middleboxes . . . . . . . . . . . . . . . . . . Done
>
> checking for firewalls . . . . . . . . . . . . . . . . . . . Done
>
> *running 10s outbound test (client to server) . . . . . 9351.25 Mb/s*
>
> *running 10s inbound test (server to client) . . . . . . 2605.19 Mb/s*
If the results in one direction showed these rates and a test in the
opposite direction showed the inverted state (c2s lower than s2c), then I
would suspect a problem with pacing or flow control in one direction, or a
configuration problem on one node. However, if both nodes report the same
results (c2s is always greater than s2c), then it is either (1) a problem
with my code in the xmit loop, or (2) an unknown problem.
> The slowest link in the end-to-end path is a 10 Gbps 10 Gigabit
> Ethernet/OC-192 subnet
>
> *Information [S2C]: Packet queuing detected: 72.52% (local buffers)*
>
> Server 'T' is not behind a firewall. [Connection to the ephemeral port
> was successful]
>
> Client is not behind a firewall. [Connection to the ephemeral port was
> successful]
>
> ------ Web100 Detailed Analysis ------
>
> Web100 reports the Round trip time = 4.09 msec; the Packet size = 8960
> Bytes; and
The RTT includes host queuing time (~5.6 MB in queue) using jumbo
frames. What is the txqueuelen value for this interface (ifconfig command)?
> There were 337 packets retransmitted, 9749 duplicate acks received, and
> 10089 SACK blocks received
>
> Packets arrived out-of-order 4.32% of the time.
Packets are being reordered. This is probably due to packet processing
being spread across multiple cores.
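If you want to confirm that, look at how the NIC's receive interrupts are
spread across CPUs. This is just a sketch, assuming the Myricom driver
registers multiple MSI-X vectors and eth2 is the interface name:

   # per-CPU interrupt counts for each vector the NIC registered
   grep eth2 /proc/interrupts

   # which CPUs may service a given vector (substitute a real IRQ number
   # from the output above for NN)
   cat /proc/irq/NN/smp_affinity

If several vectors are active on different cores, out-of-order delivery
like this is a plausible side effect.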
> The connection stalled 1 times due to packet loss.
>
> The connection was idle 0.20 seconds (2.00%) of the time.
The sending node went through at least 1 timeout. Add a 2nd -l to the
command line and look at the last 3 variables that get reported (*CWND*);
these will tell you the number of times TCP invoked the congestion
avoidance (CA) algorithm and what the high and low watermarks were.
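For example:

   web100clt -n <server> -ll

and look at the minCWNDpeak, maxCWNDpeak, and CWNDpeaks lines at the end
of the output.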
> This connection is receiver limited 2.41% of the time.
>
> This connection is sender limited 76.06% of the time.
This is saying that the sender has limited resources, probably
txqueuelen limits, that prevent it from sending more data. Note that with
jumbo frames, ~4 msec at 10 Gbps is about 5.6 MB of data in flight, i.e.
roughly 624 packets (624 x 8960 bytes ~= 5.6 MB).
> This connection is network limited 21.54% of the time.
>
> Web100 reports TCP negotiated the optional Performance Settings to:
>
> RFC 2018 Selective Acknowledgment: ON
>
> RFC 896 Nagle Algorithm: ON
>
> RFC 3168 Explicit Congestion Notification: OFF
>
> RFC 1323 Time Stamping: OFF
>
> RFC 1323 Window Scaling: ON; Scaling Factors - Server=9, Client=9
>
> The theoretical network limit is 2148.73 Mbps
This is from the Mathis equation ((pkt-size)/(rtt*sqrt(loss))). This is
about the same as the measured rate, so this is the limiting factor.
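To put rough numbers on it, using the values from your output:

   rate ~= (8960 bytes * 8 bits) / (0.00409 sec * sqrt(loss))

Working backwards from the reported 2148.73 Mbps limit, that implies a
loss rate of roughly 6.7e-5, i.e. about one segment lost per 15,000.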
> The NDT server has a 16384 KByte buffer which limits the throughput to
> 31295.84 Mbps
>
> Your PC/Workstation has a 12282 KByte buffer which limits the throughput
> to 23459.46 Mbps
>
> The network based flow control limits the throughput to 23533.01 Mbps
Buffer space, from your tuning parms below, is adequate.
> Client Data reports link is ' 9', Client Acks report link is ' 9'
>
> Server Data reports link is ' 9', Server Acks report link is ' 9'
>
> Packet size is preserved End-to-End
>
> Information: Network Address Translation (NAT) box is modifying the
> Server's IP address
>
> Server says [<IP>] but Client says [ T]
>
> Information: Network Address Translation (NAT) box is modifying the
> Client's IP address
>
> Server says [<IP2>] but Client says [M]
>
> [root@M ~]#
>
> I have these sysctl values set on both servers:
>
> net.core.rmem_max = 16777216
>
> net.core.wmem_max = 16777216
>
> net.ipv4.tcp_wmem = 4096 65536 16777216
>
> net.ipv4.tcp_rmem = 4096 87380 16777216
>
> net.core.netdev_max_backlog = 250000
>
> net.ipv4.tcp_no_metrics_save = 1
Run the ifconfig command and report the txqueuelen value.
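Something along these lines, with eth2 standing in for your Myricom
interface:

   # report the interface settings, including txqueuelen
   ifconfig eth2

   # if it is still at the default 1000, a larger transmit queue is worth
   # trying for 10G with jumbo frames (10000 is just a starting point,
   # not a recommendation for your specific setup)
   ifconfig eth2 txqueuelen 10000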
> I have also noticed the same asymmetry in the bandwidth while
> transferring an FTP file back and forth on the same link between the two
> servers.
>
> Note that the traffic both ways is using the same path.
>
> I am wondering why there is a difference between the sent and received
> bandwidth, and what parameters I should tune to use the full 10G both ways.
>
> Any ideas are much appreciated.
>
> Thanks!
I don't have a good clue right now. Check the things listed
(txqueuelen, version info, NIC tuning) and also run with more logging
(-ll instead of -l). Turning on flow control may help (see the ethtool
sketch below). Also consider running an NPAD test. The NPAD system probes
for packet queues and other system configuration settings, and it may
point out more details that can help you understand what is going on here.
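For the flow control check, ethtool can show and set the pause-frame
settings. Again eth2 is a placeholder, and the switch ports need pause
enabled as well for this to do anything:

   # show the current pause (flow control) parameters
   ethtool -a eth2

   # enable rx and tx pause frames
   ethtool -A eth2 rx on tx on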
Rich
> Nathalie~
>