
Re: [perfsonar-user] 10gbits perfsonar: how to interpret this iperf3 result between 10gbits hosts ?


  • From: Jason Zurawski <>
  • To: SCHAER Frederic <>
  • Cc: perfsonar-user <>
  • Subject: Re: [perfsonar-user] 10gbits perfsonar: how to interpret this iperf3 result between 10gbits hosts ?
  • Date: Wed, 05 Oct 2016 08:36:50 -0600

Greetings Frederic;

One point of order w/ iperf3 - note that it is single threaded, so use of the '-P' flag for parallel streams can leave the test CPU bound.  There is some guidance on this page about that (look under "iperf3 thread model"):

https://fasterdata.es.net/performance-testing/network-troubleshooting-tools/iperf-and-iperf3/

That could be causing issues by itself - also, 30 parallel streams could 'step' on each other significantly.
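
One way to work around the single-thread limit (a rough sketch; the port numbers and core IDs below are just placeholders) is to run several independent iperf3 processes instead of a single '-P' run, pinning each one to its own core with iperf3's '-A' affinity option:

# on the receiving host: one server per port
iperf3 -s -p 5201 &
iperf3 -s -p 5202 &

# on the sending host: one client per port, each pinned to a different core
iperf3 -c <destination> -p 5201 -A 2 -t 30 &
iperf3 -c <destination> -p 5202 -A 4 -t 30 &

Summing the per-process results then gives the aggregate throughput without any one iperf3 process becoming the CPU bottleneck.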

Thanks;

-jason

SCHAER Frederic wrote:

Hi,

 

I'm trying to determine whether we can really use all the available bandwidth on our paths (and whether the IPv6 bandwidth is equivalent to the IPv4 bandwidth, but never mind that for now).

I tried to run some transfers between my site and a few others (and I tried 3rd party transfers too), using the LHCONE network.

 

According to the network traffic graphs here, the links are far from overloaded: of the 20 Gbit/s available, only 6 to 8 Gbit/s are used.

My own connection is only 10 Gbit/s.

 

I set up a perfSONAR host with a 10 Gbit/s network card and plugged it into a Force10 switch, which is directly connected via 2x40 Gbit/s links to the main switch, itself connected to the (dedicated) router.

So, that's: PERFSONAR 10 Gbit/s => SWITCH => 80 Gbit/s => SWITCH => 10 Gbit/s => Router => LHCONE + internet

 

With this setup, and with a relatively free network, I usually cannot reach more than 3 or 4 Gbit/s with a bwctl/iperf3 test, even using as many as 80 parallel streams. With a single stream, bandwidth can be as low as 700 Mbit/s, and I'm seeing TCP retransmits in all cases.

Here is a summary of the iperf3 output:

 

[ 71]   0.00-30.00  sec   504 MBytes   141 Mbits/sec  334             sender

[ 71]   0.00-30.00  sec   504 MBytes   141 Mbits/sec                  receiver

[ 73]   0.00-30.00  sec   514 MBytes   144 Mbits/sec  373             sender

[ 73]   0.00-30.00  sec   513 MBytes   144 Mbits/sec                  receiver

[SUM]   0.00-30.00  sec  16.0 GBytes  4570 Mbits/sec  11105             sender

[SUM]   0.00-30.00  sec  15.9 GBytes  4566 Mbits/sec                  receiver

CPU Utilization: local/sender 85.2% (4.3%u/80.9%s), remote/receiver 71.1% (2.9%u/68.1%s)

 

As you can see, there were 11K+ retransmits during the 30-second transfer.
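
(As a rough back-of-the-envelope check, assuming a ~1448-byte MSS and the ~6.4 ms RTT quoted further down: the [71] stream carried about 504 MB / 1448 B ≈ 365,000 segments with 334 retransmits, i.e. a loss rate of roughly 0.1%. The Mathis estimate 1.22 * MSS / (RTT * sqrt(loss)) then gives on the order of 70 Mbit/s for a Reno-like stream, the same ballpark as the 141 Mbit/s observed with htcp. At this RTT, even a ~0.1% loss rate keeps a single TCP stream far below line rate, regardless of buffer sizes.)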

The command was:

bwctl -4 -v -r -s <source> -c <destination>  -t 30 -i 1 -T iperf3 -P 30

(in that case, the source was my host)

 

I'm therefore wondering where I could possibly be going wrong.

I tried to optimize the kernel parameters according to the ESnet tuning guides, but this did not change much.

 

The destination host seems quite close, thanks to LHCONE:

rtt min/avg/max/mdev = 6.361/6.384/6.409/0.015 ms
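
(For reference, the bandwidth-delay product at that RTT is roughly 10 Gbit/s * 6.4 ms ≈ 8 MB, so the 64 MB tcp_rmem/tcp_wmem maxima listed below leave plenty of headroom for a single stream's window.)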

 

The sysctl params are as follows (a quick way to verify them is sketched just after the list):

net.core.rmem_max=134217728

net.core.wmem_max=134217728

net.ipv4.tcp_rmem=4096  87380   67108864

net.ipv4.tcp_wmem=4096  65536   67108864

net.core.netdev_max_backlog=250000

net.ipv4.tcp_no_metrics_save=1

net.ipv4.tcp_congestion_control=htcp

net.ipv4.conf.all.arp_ignore=1

net.ipv4.conf.all.arp_announce=2

net.ipv4.conf.default.arp_filter=1

net.ipv4.conf.all.arp_filter=1

net.ipv4.tcp_max_syn_backlog=30000

net.ipv4.conf.all.accept_redirects=0

net.ipv4.udp_rmem_min=8192

net.ipv4.tcp_tw_recycle=1

net.core.rmem_default=67108864

net.ipv4.tcp_tw_reuse=1

net.core.optmem_max=134217728

net.ipv4.tcp_slow_start_after_idle=0

net.core.wmem_default=67108864

net.ipv4.conf.all.send_redirects=0

net.ipv4.conf.all.accept_source_route=0

net.ipv4.tcp_mtu_probing=1

net.core.somaxconn=1024

net.ipv4.tcp_max_tw_buckets=2000000

vm.vfs_cache_pressure=1

net.ipv4.tcp_fin_timeout=10

net.ipv4.udp_wmem_min=8192
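
A quick way to double-check that these values are actually in effect at test time (the file path below is only an example of where such settings might live):

sysctl net.ipv4.tcp_congestion_control net.ipv4.tcp_rmem net.ipv4.tcp_wmem
sysctl -p /etc/sysctl.d/90-net-tuning.conf   # re-apply a settings file without rebooting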

 

Any idea why the iperf3 transfers do not reach high bandwidth, even with a single stream?

Of course, I don't know where the destination perfSONAR hosts sit behind their routers (far or near, behind loaded networks or not…), and I know the 20 Gbit/s link is actually a 2x10, but even with that in mind, shouldn't a two-stream transfer be able to use much more bandwidth than just 4 Gbit/s in the best case?

Also, why are there TCP retransmits when the links aren't loaded (according to the network graphs; I don't have access to the NOC interface counters ;) )?
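
(One way to watch the retransmits as they happen, assuming the usual iproute2/net-tools utilities are installed on the test host:

ss -ti dst <destination>          # per-connection cwnd, rtt and retransmit counters
netstat -s | grep -i retrans      # host-wide TCP retransmission totals

Running these on the sending host during a bwctl test would at least show how the retransmits are spread across the parallel streams.)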

 

Ideas?

 

Regards




