Re: [perfsonar-user] 10gbits perfsonar: how to interpret this iperf3 result between 10gbits hosts ?


  • From: Shawn McKee <>
  • To: SCHAER Frederic <>
  • Cc: perfsonar-user <>
  • Subject: Re: [perfsonar-user] 10gbits perfsonar: how to interpret this iperf3 result between 10gbits hosts ?
  • Date: Wed, 5 Oct 2016 10:37:40 -0400

Hi Frederic,

Do you have another 10G box in your LAN to test against? It would be good to verify what your system and card are capable of. The CPU use shown is a bit high (older system?).

For item 2), this could be happening where traffic steps down from 40G to 10G (the hop after your 2x40G hop). If you have access to counters on that switch, it would be good to check there.

Shawn

On Wed, Oct 5, 2016 at 10:31 AM, SCHAER Frederic <> wrote:

Hi Shawn,

 

Thanks for the suggestions. I’ll try to dig up some owamp commands I might have already run a long time ago ;) (would that be as simple as “owping ps-latency-server”?)
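
(For reference, a minimal owping run could look like the sketch below; the host name is only a placeholder for a node running owampd, and the packet count and spacing are arbitrary:

owping -c 10000 -i 0.01 ps-latency.example.org

That should send 10000 test packets at roughly 10 ms spacing in each direction and report one-way delay and lost packets per direction, which is the loss measurement being asked about.)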

 

Concerning local LAN issues, I indeed faced some, and I think I got rid of them when I disabled flow control on the 40 Gbit/s links between the switches; at least the local iperf3 stopped reporting TCP retransmits on the LAN (last time I looked)...

That flow control was apparently throttling the 40 Gbit/s links, and unfortunately the throttling counters aren’t exposed in the SNMP ifTable, so my counter graphs stayed at 0… anyway… even after the local losses stopped, the TCP retransmits remained towards the “remote” sites.

 

I think we can exclude 5) on our NREN, hopefully (?!)

 

For 1-4, well… wish me luck ;)

 

Fred

 

From: Shawn McKee [mailto:]
Sent: Wednesday, October 5, 2016 4:14 PM
To: SCHAER Frederic <fr>
Cc: perfsonar-user <>
Subject: Re: [perfsonar-user] 10gbits perfsonar: how to interpret this iperf3 result between 10gbits hosts ?

 

Hi Frederic,


It is the packet losses that are killing your performance. Have you run latency tests along the same path (just to verify we are seeing packet losses)?

 

As for why you are getting packet losses (causing your retransmits), I can think of a few reasons:

 

1) Bad cable or dirty fiber along the path

2) "Microbursts" (short-timescale bursts of packets) that cause buffer overflow and packet loss.   These are tricky because they could be happening on timescales much less than your typical measurements on devices (< 1 sec).    

3) Real congestion along the path (some hop in your end-to-end path has lots of traffic, bringing it close to full... your traffic puts it over the top "sometimes")

4) Mis-configuration of a device or devices along the path

5) Some device on the path shaping or altering the traffic passing through (firewall, NID, shaper/policer, etc.)

 

A couple of things you can check. If you have ethtool, you can grab stats before and after running iperf3:

 

ethtool -S eth0   > test_start.stats

iperf3 ....

ethtool -S eth0  > test_end.stats

 

Then 'diff test_start.stats test_end.stats' and look at which counters changed, making sure nothing unexpected is showing up (packets could be lost on your local system before ever getting onto the network).
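
For example (just a sketch; the exact counter names vary by NIC driver), filtering the diff for the usual suspects can help:

diff test_start.stats test_end.stats | grep -Ei 'drop|err|discard|fifo|miss'

If any of those counters increase during the test, the loss is happening on the local host or NIC rather than somewhere out on the path.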

 

You can also ask iperf3 for finer-grained reporting to see if there is any pattern/structure to the retransmits.
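
For instance, assuming a recent iperf3 that accepts fractional reporting intervals, running it directly with something like

iperf3 -c <destination> -t 30 -i 0.1 -P 1

would report every 100 ms and make it easier to see whether the retransmits arrive in bursts or are spread evenly across the test.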

 

Latency tests along the path you are testing, as well as along other paths, will show what level of packet loss you have and whether it is correlated with specific paths.

 

I am sure others on the list can provide additional suggestions.  

 

Shawn

 

On Wed, Oct 5, 2016 at 9:56 AM, SCHAER Frederic <> wrote:

Hi,

 

I’m trying to determine whether we really can use all the available bandwidth on our paths (and whether the v6 bandwidth is equivalent to that of the v4, but never mind).

I tried to run some transfers between my site and a few others (and I tried 3rd party transfers too), using the LHCONE network.

 

According to the network traffic graphs here, the links are far from being overloaded: of the 20 Gbit/s available, 6 to 8 Gbit/s are used.

My own connection is only 10 Gbit/s.

 

I set up a perfSONAR host with a 10 Gbit/s network card and plugged it into a Force10 switch, which is directly connected via 2x40 Gbit/s links to the main switch, itself connected to the (dedicated) router.

So that’s: PERFSONAR 10 Gbit/s => SWITCH => 80 Gbit/s => SWITCH => 10 Gbit/s => Router => LHCONE + Internet

 

With this setup, and with a relatively free network, I usually cannot reach more than 3 or 4 Gbit/s with a bwctl/iperf3 test, even using as many as 80 parallel streams. With a single stream, the bandwidth can be as low as 700 Mbit/s, and I’m seeing TCP retransmits in all cases.

Here is a summary of the iperf3 output:

 

[ 71]   0.00-30.00  sec   504 MBytes   141 Mbits/sec  334             sender

[ 71]   0.00-30.00  sec   504 MBytes   141 Mbits/sec                  receiver

[ 73]   0.00-30.00  sec   514 MBytes   144 Mbits/sec  373             sender

[ 73]   0.00-30.00  sec   513 MBytes   144 Mbits/sec                  receiver

[SUM]   0.00-30.00  sec  16.0 GBytes  4570 Mbits/sec  11105             sender

[SUM]   0.00-30.00  sec  15.9 GBytes  4566 Mbits/sec                  receiver

CPU Utilization: local/sender 85.2% (4.3%u/80.9%s), remote/receiver 71.1% (2.9%u/68.1%s)

 

As you can see, there were 11K+ retransmits during the 30 s transfer.

The command was:

bwctl -4 -v -r -s <source> -c <destination>  -t 30 -i 1 -T iperf3 -P 30

(in that case, the source was my host)
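
(To isolate single-stream behaviour, a stripped-down variant of the same command, with -P 1 forcing a single stream, might look like:

bwctl -4 -v -r -s <source> -c <destination> -t 30 -i 1 -T iperf3 -P 1

Comparing its per-second intervals with the 30-stream run should show whether the retransmits scale with the number of streams or hit every stream regardless.)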

 

I’m therefore wondering what I could possibly be doing wrong.

I tried to optimize the kernel parameters according to the ESnet tuning guides, but this did not change much.

 

The destination host seems quite close, thanks to LHCONE:

rtt min/avg/max/mdev = 6.361/6.384/6.409/0.015 ms
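
(As a rough sanity check: assuming this 6.4 ms RTT and a 10 Gbit/s target rate, the bandwidth-delay product is about 10e9 bit/s * 0.0064 s / 8 ≈ 8 MB, comfortably below the 64 MB tcp_rmem/tcp_wmem maximums listed below, so the configured window limits themselves should not be the bottleneck.)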

 

The sysctl params are:

net.core.rmem_max=134217728

net.core.wmem_max=134217728

net.ipv4.tcp_rmem=4096  87380   67108864

net.ipv4.tcp_wmem=4096  65536   67108864

net.core.netdev_max_backlog=250000

net.ipv4.tcp_no_metrics_save=1

net.ipv4.tcp_congestion_control=htcp

net.ipv4.conf.all.arp_ignore=1

net.ipv4.conf.all.arp_announce=2

net.ipv4.conf.default.arp_filter=1

net.ipv4.conf.all.arp_filter=1

net.ipv4.tcp_max_syn_backlog=30000

net.ipv4.conf.all.accept_redirects=0

net.ipv4.udp_rmem_min=8192

net.ipv4.tcp_tw_recycle=1

net.core.rmem_default=67108864

net.ipv4.tcp_tw_reuse=1

net.core.optmem_max=134217728

net.ipv4.tcp_slow_start_after_idle=0

net.core.wmem_default=67108864

net.ipv4.conf.all.send_redirects=0

net.ipv4.conf.all.accept_source_route=0

net.ipv4.tcp_mtu_probing=1

net.core.somaxconn=1024

net.ipv4.tcp_max_tw_buckets=2000000

vm.vfs_cache_pressure=1

net.ipv4.tcp_fin_timeout=10

net.ipv4.udp_wmem_min=8192
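
(For the record, one common way to make such settings persistent is to put them in a file under /etc/sysctl.d/, the file name below being only an example, and reload it:

sysctl -p /etc/sysctl.d/90-perfsonar-tuning.conf

rather than setting them ad hoc with sysctl -w.)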

 

Any idea why the iperf3 transfers do not reach high bandwidth, even with a single stream?

Of course, I don’t know where the destination perfSONAR hosts sit behind their routers (far or near, behind loaded networks or not…), and I know that the 20 Gbit/s link is a 2x10, but even with that in mind, shouldn’t a 2-stream transfer be able to use a lot of the bandwidth, rather than just 4 Gbit/s in the best case?

Also, why are there TCP retransmits when the links aren’t loaded (according to the network graphs; I don’t have access to the NOC interface counters ;))?

 

Ideas?

 

Regards

 




