perfsonar-user - [perfsonar-user] iperf3 performance tuning for small packets
- From: Casey Russell <>
- To: "" <>
- Subject: [perfsonar-user] iperf3 performance tuning for small packets
- Date: Mon, 13 Feb 2017 18:50:00 -0600
Group,
I've recently needed to stress-test some new firewalls and intended to use iperf3 to do it. I'm pushing iperf3 harder than I've personally pushed it before and running into some challenges. I know this isn't an iperf3 list specifically, but many of you will have used the tool pretty extensively, so if you'll indulge me...
Interestingly (at least to me), I'm having trouble generating a full 1G of traffic in a "worst case" scenario through the firewalls, or even directly box to box. By worst case, I mean 64-byte (or near-64-byte) UDP packets. Fortunately (or unfortunately, depending on your love for IPsec), the circuit between the firewalls is going to be an IPsec tunnel, so there is a massive amount of protocol overhead at these small packet sizes. I only need to generate something less than 500 Mb/s of iperf3 traffic to fill that pipe; see the quick math below.
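For context: at this packet size, per-packet overhead dominates even before the tunnel is involved. With -l 64, each datagram carries a 64-byte payload, but on the wire it occupies 64 + 8 (UDP) + 20 (IP) + 14 (Ethernet) + 4 (FCS) + 20 (preamble/inter-frame gap) = 130 bytes, so a saturated gigabit link works out to roughly 961 kpps and only about 490 Mb/s of iperf3-visible payload:

    # back-of-the-envelope: line-rate pps and iperf3-visible goodput
    # for 64-byte UDP payloads on gigabit Ethernet (no tunnel overhead)
    awk 'BEGIN {
      wire_bits = 130 * 8              # bits per packet on the wire
      pps       = 1e9 / wire_bits      # ~961,500 packets/sec at line rate
      goodput   = pps * 64 * 8 / 1e6   # payload Mb/s that iperf3 reports
      printf "%.0f pps, ~%.0f Mb/s goodput\n", pps, goodput
    }'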
The servers I'm using are a few years old, but I didn't expect this much trouble getting to near-gigabit speeds. When I push the machines at all, one of two things happens.
If I keep the setup simple, iperf3 simply refuses to push the requested bandwidth. For instance, what should have been an 800 Mb/s test runs at 240 Mb/s with no, or very little, loss.
Because I can see, in that scenario, that a single CPU core is getting hammered, I create multiple sender and receiver processes on both ends and use the -A flag to assign each a different CPU affinity. This achieves a bit more bandwidth (500-600 Mb/s) but with massive loss (20-40% or more). In particular, I notice the loss arrives in spades any time I'm running multiple sender (or receiver) processes on the same physical socket (even on different logical cores).
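(For anyone reproducing this: the socket/core layout that the -A numbers refer to can be listed with lscpu, which is how you can tell whether two pinned processes share a physical socket.)

    # show which socket and physical core each logical CPU belongs to,
    # so -A values can be chosen to share (or avoid) a physical socket
    lscpu --extended=CPU,SOCKET,CORE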
As an example, I've experimented with running as many as 6 servers and 6 clients on each physical host, using the -A flag to set CPU affinity since each host has 16+ logical CPUs, the -w flag to increase the receive buffer to 1M or larger, and the -Z (zero-copy) flag to reduce CPU load as much as I can. A sketch of that setup follows.
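The setup looks roughly like this (the ports, core numbers, and per-client rate below are illustrative, not my exact values):

    # receiver side: one iperf3 server per port, each pinned to its own core
    for i in 1 2 3 4 5 6; do
      iperf3 -s -p $((5200 + i)) -A $i &
    done

    # sender side: one client per server; -A n,m pins the local process
    # to core n and asks the remote server to pin itself to core m
    for i in 1 2 3 4 5 6; do
      iperf3 -u -b 100M -l 64 -t 30 -Z -w 1M -A $i,$i \
             -p $((5200 + i)) -c 10.18.49.10 &
    done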
I've also experimented with fewer send/receive processes and a larger -P value to send more parallel streams per process. Either way, I'm finding it near impossible to get more than about 600 Mb/s between the two hosts, even with them connected back to back via a Cat6 cable.
My question for the group is: does this sound like a "meh, that sounds about right" scenario, or should I definitely be able to squeeze more performance out of these boxes and I'm just missing a tuning option somewhere? I've followed the ESnet host tuning guide on the CentOS 6 host: https://fasterdata.es.net/host-tuning/linux/
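For reference, the UDP-relevant part of that tuning boils down to raising the socket-buffer ceilings (so the -w 1M request actually takes effect) and the driver input queue, roughly along these lines; the exact values should be checked against the current fasterdata page:

    # let applications request large socket buffers (needed for -w 1M+)
    sysctl -w net.core.rmem_max=67108864
    sysctl -w net.core.wmem_max=67108864
    # allow the kernel to queue more received packets per NIC interrupt
    sysctl -w net.core.netdev_max_backlog=30000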
I realize that's a brief overview, but I don't want to drown you in an even bigger wall of text. I'll be happy to provide more specifics, on or off list, if requested. You'll find an example run below, as well as my system specs.
My system specs are as follows:
Host1 (sender) (pci express NIC)
CentOS release 6.8
Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
2 physical CPUs (sockets), 4 cores/socket, 2 threads/core (16 CPUs)
126G RAM (DDR3 1600)
Intel Corporation 82576 Gigabit Network Connection (rev 01)
igb 0000:05:00.0: Intel(R) Gigabit Ethernet Network Connection
igb 0000:05:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:1b:21:8e:63:08
Host2 (receiver) (onboard NIC)
CentOS Linux release 7.3.1611
Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
2 physical CPUs (sockets), 6 cores/socket, 2 threads/core (24 CPUs)
12G RAM (DDR3 1333)
Broadcom Limited NetXtreme II BCM5716 Gigabit Ethernet (rev 20)
bnx2: QLogic bnx2 Gigabit Ethernet Driver v2.2.6 (January 29, 2014)
bnx2 0000:01:00.0 eth0: Broadcom NetXtreme II BCM5716 1000Base-T (C0) PCI Express found ....
TEST RUN
You'll notice I requested 90 Mb/s per stream and ran 4 streams (360 Mb/s total), but only got a total of 224 Mb/s. I'm certain that's because CPU 7 is railed on both ends, since I only have a single process on each box for this test.
[sender]
Cpu7 : 11.0%us, 89.0%sy,
[receiver]
Cpu7 : 11.4%us, 88.2%sy,
[crussell@localhost ~]$ iperf3 -i 10 -u -b 90M -l 64 -t 30 -Z -P 4 -w 1M -A 7,7 -p 5195 -c 10.18.49.10
Connecting to host 10.18.49.10, port 5195
[ 4] local 10.18.48.10 port 41757 connected to 10.18.49.10 port 5195
[ 6] local 10.18.48.10 port 56923 connected to 10.18.49.10 port 5195
[ 8] local 10.18.48.10 port 38524 connected to 10.18.49.10 port 5195
[ 10] local 10.18.48.10 port 58086 connected to 10.18.49.10 port 5195
[ ID] Interval Transfer Bandwidth Total Datagrams
(middle intervals redacted for brevity)
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 4] 0.00-30.00 sec 200 MBytes 55.9 Mbits/sec 0.004 ms 0/3275660 (0%)
[ 4] Sent 3275660 datagrams
[ 6] 0.00-30.00 sec 200 MBytes 55.9 Mbits/sec 0.003 ms 0/3275660 (0%)
[ 6] Sent 3275660 datagrams
[ 8] 0.00-30.00 sec 200 MBytes 55.9 Mbits/sec 0.003 ms 0/3275660 (0%)
[ 8] Sent 3275660 datagrams
[ 10] 0.00-30.00 sec 200 MBytes 55.9 Mbits/sec 0.004 ms 0/3275660 (0%)
[ 10] Sent 3275660 datagrams
[SUM] 0.00-30.00 sec 800 MBytes 224 Mbits/sec 0.003 ms 0/13102640 (0%)
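One more diagnostic that helps with runs like this, assuming the sysstat package is installed: a per-core breakdown separates syscall cost from kernel packet processing, which indicates whether the railed core is spending its time in iperf3's send loop or in softirq handling.

    # per-core usage in 5-second samples while a test runs; mostly %sys
    # suggests per-packet syscall overhead, while high %soft points at
    # softirq/NIC interrupt processing instead
    mpstat -P ALL 5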