
Re: [perfsonar-user] iperf not usable from perfsonar box, works from CentOS non-PS box


  • From: Matthew J Zekauskas <>
  • To: Trey Dockendorf <>
  • Cc: perfsonar-user <>
  • Subject: Re: [perfsonar-user] iperf not usable from perfsonar box, works from CentOS non-PS box
  • Date: Tue, 27 Jan 2015 14:58:43 -0500

[popping up head out of sand momentarily]

Without having read the whole thread in detail, this feels like an MTU black hole.  If both ends think the MTU is 9000, successfully negotiate a large MSS, and then actually try to use it, but some element in the middle drops large packets, then a session will start but hang: the large packet is sent, repeatedly retransmitted, and dropped every time.

The test Azher suggests (ping with large packets) is an easy way to see whether large packets make it through successfully.  Something large but not full-size (e.g. 8192 bytes) should be enough to see.  However, if there is a tunnel in the middle, the tunnel's encapsulation can push a full-size packet over the MTU, so even if the large ping succeeds it still pays to do the math (or just test with a few sizes, search-style) and confirm that a truly full-size packet makes it through.
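
For example, with the hostnames from this thread and a 9000-byte interface MTU, a do-not-fragment ping test might look like this (ping's -M do forbids fragmentation; 8972 is 9000 minus the 28 bytes of IP and ICMP headers):

$ ping -c 4 -M do -s 8192 psonar-bwctl.brazos.tamu.edu
$ ping -c 4 -M do -s 8972 psonar-bwctl.brazos.tamu.edu

If the 8192-byte ping gets replies but the 8972-byte one does not, something in the path is carrying less than the full 9000-byte MTU.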

I suppose using something like tracepath could also show this (although if there is a black hole then the trace would just fail in the middle somewhere instead of adjusting MTU lower).
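
For example (tracepath probes the path MTU hop by hop; with a true black hole it typically stalls with "no reply" at the problem hop rather than reporting a lower pmtu):

$ tracepath -n psonar-bwctl.brazos.tamu.edu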

--Matt
[back to sand]

On 1/27/15 2:35 PM, Trey Dockendorf wrote:
Both ends are 9000 MTU

[root@psonar-bwctl ~]# ip link show p1p1
6: p1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP qlen 1000
    link/ether 90:e2:ba:2e:eb:50 brd ff:ff:ff:ff:ff:ff

[root@psonar-owamp ~]# ip link show p1p1
6: p1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP qlen 1000
    link/ether 90:e2:ba:2e:ea:04 brd ff:ff:ff:ff:ff:ff

As a test case this afternoon I will be moving both of these systems off my local Force10 switch (which uplinks to the Science DMZ) and connecting them directly to the Science DMZ equipment.  This will at least let our local networking experts better assist in debugging the problem and rule out a misconfiguration on my local Force10 switch.

Thanks,
- Trey

=============================

Trey Dockendorf 
Systems Analyst I 
Texas A&M University 
Academy for Advanced Telecommunications and Learning Technologies 
Phone: (979)458-2396 
Email:  
Jabber:

On Tue, Jan 27, 2015 at 1:29 PM, Shawn McKee <> wrote:
Is there an MTU mismatch between the hosts?   

Sender at 9000 and receiver at 1500, and PMTU discovery fails?  The initial negotiation will use packets smaller than 1500 bytes, but the data would be larger than 1500, and if fragmentation is not allowed those packets are dropped.

Just a thought since you said 'The transfer started then scp reports "stalled".'

Shawn

On Tue, Jan 27, 2015 at 2:15 PM, Eli Dart <> wrote:
Hi Trey,

If you have root on the suspect box, run tcpdump during a test that fails and see what's going on.

Measurement tools are wonderful and helpful and valuable, but if things are busted enough that the tools can't run, sometimes you just have to watch the packets to figure out what's going on....
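
For instance, during a failing iperf3 run to port 5001 on the interface shown elsewhere in this thread (p1p1), something along these lines, run as root on the perfSONAR host, would show whether the large data segments are leaving and whether anything comes back:

# tcpdump -i p1p1 -n -s 128 port 5001 -w /tmp/iperf-fail.pcap

Seeing the same large segment retransmitted over and over with no ACKs advancing would point strongly at an MTU black hole.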

Eli



On Tue, Jan 27, 2015 at 11:01 AM, Trey Dockendorf <> wrote:
Transferring a 4.3GB file fails... very bizarre.  The transfer started, then scp reported "stalled".

The failure is between the 2 perfsonar boxes.  Transferring to a perfsonar box from a host on our campus LAN and not on our science DMZ works at the expected 1Gbps rate.

Transferring from another science DMZ host (stock CentOS) to the perfsonar box fails.

Transferring from science DMZ to science DMZ, both stock CentOS boxes, works.  So it's only the interactions with a perfsonar host that fail.

I suspect something local is broken and it just so happens to be affecting interactions to/from my perfsonar boxes.

If there are any suggestions on how to debug I'd be glad to hear them, but it seems like something is broken on our local network.

Thanks,
- Trey

=============================

Trey Dockendorf 
Systems Analyst I 
Texas A&M University 
Academy for Advanced Telecommunications and Learning Technologies 
Phone: (979)458-2396 
Email:  
Jabber:

On Tue, Jan 27, 2015 at 12:46 PM, Aaron Brown <> wrote:
Hey Trey,

That is bizarre. Could you try scp’ing a large file between the two hosts?
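
For example (file name and size here are placeholders, not from the original message):

# dd if=/dev/zero of=/tmp/bigfile bs=1M count=2048
# scp /tmp/bigfile psonar-owamp.brazos.tamu.edu:/tmp/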

Cheers,
Aaron

On Jan 27, 2015, at 1:00 PM, Trey Dockendorf <> wrote:

Yes, it occurs in both directions between psonar-bwctl.brazos.tamu.edu and psonar-owamp.brazos.tamu.edu.

If I run the server side on either of those boxes and try to run the client from a plain CentOS host, the tests also don't work.

These systems are both stock net installs of PS 3.4

- Trey

=============================

Trey Dockendorf 
Systems Analyst I 
Texas A&M University 
Academy for Advanced Telecommunications and Learning Technologies 
Phone: (979)458-2396 
Email:  
Jabber:

On Tue, Jan 27, 2015 at 8:45 AM, Aaron Brown <> wrote:
Hey Trey,

Does this happen in both directions?

Cheers,
Aaron

On Jan 26, 2015, at 7:46 PM, Trey Dockendorf <> wrote:

As the remote end was not set up by me I can't test this particular host, but instead I've tried iperf between my latency PS host and bandwidth PS host and got the same results.  Below are results using iperf3 and nuttcp.  With iperf the connection seemed to stall at first and then produce 0 bits/sec messages, but with iperf3 it rapidly printed the interval lines.

psonar-bwctl:
# iperf3 -p 5001 -s

psonar-owamp:
# iperf3 -c psonar-bwctl.brazos.tamu.edu -p 5001 -i 1
Connecting to host psonar-bwctl.brazos.tamu.edu, port 5001
[  4] local 165.91.55.4 port 60629 connected to 165.91.55.6 port 5001
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  87.4 KBytes   715 Kbits/sec    2   26.2 KBytes
[  4]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    1   26.2 KBytes
[  4]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    0   26.2 KBytes
[  4]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec    1   26.2 KBytes
[  4]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec    0   26.2 KBytes
[  4]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec    0   26.2 KBytes
[  4]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec    1   26.2 KBytes
[  4]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    0   26.2 KBytes
[  4]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec    0   26.2 KBytes
[  4]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec    0   26.2 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  87.4 KBytes  71.6 Kbits/sec    5             sender
[  4]   0.00-10.00  sec  0.00 Bytes  0.00 bits/sec                  receiver


Couldn't quite make nuttcp work:

psonar-bwctl:
# nuttcp -1

psonar-owamp:
nuttcp_mread: Bad file descriptor
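
For reference, a typical nuttcp pairing (the exact client command used above was not shown) would be roughly:

psonar-bwctl:
# nuttcp -1

psonar-owamp:
# nuttcp -i1 -T10 psonar-bwctl.brazos.tamu.edu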

=============================

Trey Dockendorf 
Systems Analyst I 
Texas A&M University 
Academy for Advanced Telecommunications and Learning Technologies 
Phone: (979)458-2396 
Email:  
Jabber:

On Mon, Jan 26, 2015 at 6:33 PM, Brian Tierney <> wrote:

I'm not aware of anything that might explain that. Can you try iperf3 and nuttcp to see if they behave the same?



On Mon, Jan 26, 2015 at 12:22 PM, Trey Dockendorf <> wrote:
Right now most of the endpoints I'm testing against seem not to work with bwctl, so to help some colleagues I've been trying to use iperf by itself.  From my perfsonar boxes the iperf tests seem to do nothing, while a non-perfsonar host with iperf installed from EPEL gives the expected output.  The results are below.  I've removed the remote information as these tests were not against a perfsonar box but against a remote site's cluster login node.

Is there something about iperf on a PS host that could cause this issue?  The 2 systems below are on the same network and the same core switch.

PERFSONAR:
# iperf -c <REMOTE HOST> -p 50100 -t 20 -i 1
------------------------------------------------------------
Client connecting to <REMOTE HOST>, TCP port 50100
TCP window size: 92.6 KByte (default)
------------------------------------------------------------
[  3] local 165.91.55.6 port 33135 connected with <REMOTE IP> port 50100
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   175 KBytes  1.43 Mbits/sec
[  3]  1.0- 2.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  2.0- 3.0 sec  0.00 Bytes  0.00 bits/sec
<Repeated 100s of times with 0.00 bits/sec>
[  3] 931.0-932.0 sec  0.00 Bytes  0.00 bits/sec

NON-PERFSONAR:

$ iperf -c <REMOTE HOST> -p 50100 -t 20 -i 1
------------------------------------------------------------
Client connecting to <REMOTE HOST>, TCP port 50100
TCP window size: 92.6 KByte (default)
------------------------------------------------------------
[  3] local 165.91.55.28 port 51252 connected with <REMOTE IP> port 50100
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  60.5 MBytes   508 Mbits/sec
[  3]  1.0- 2.0 sec  30.5 MBytes   256 Mbits/sec
[  3]  2.0- 3.0 sec  32.9 MBytes   276 Mbits/sec
[  3]  3.0- 4.0 sec  36.2 MBytes   304 Mbits/sec
[  3]  4.0- 5.0 sec  36.6 MBytes   307 Mbits/sec
[  3]  5.0- 6.0 sec  25.1 MBytes   211 Mbits/sec
[  3]  6.0- 7.0 sec  27.2 MBytes   229 Mbits/sec
[  3]  7.0- 8.0 sec  33.5 MBytes   281 Mbits/sec
[  3]  8.0- 9.0 sec  32.2 MBytes   271 Mbits/sec
[  3]  9.0-10.0 sec  31.9 MBytes   267 Mbits/sec
[  3] 10.0-11.0 sec  24.9 MBytes   209 Mbits/sec
[  3] 11.0-12.0 sec  29.9 MBytes   251 Mbits/sec
[  3] 12.0-13.0 sec  36.8 MBytes   308 Mbits/sec
[  3] 13.0-14.0 sec  33.0 MBytes   277 Mbits/sec
[  3] 14.0-15.0 sec  21.6 MBytes   181 Mbits/sec
[  3] 15.0-16.0 sec  16.6 MBytes   139 Mbits/sec
[  3] 16.0-17.0 sec  22.0 MBytes   185 Mbits/sec
[  3] 17.0-18.0 sec  23.6 MBytes   198 Mbits/sec
[  3] 18.0-19.0 sec  23.5 MBytes   197 Mbits/sec
[  3] 19.0-20.0 sec  19.2 MBytes   161 Mbits/sec
[  3]  0.0-20.0 sec   598 MBytes   250 Mbits/sec

Thanks,
- Trey

=============================

Trey Dockendorf 
Systems Analyst I 
Texas A&M University 
Academy for Advanced Telecommunications and Learning Technologies 
Email:  
Jabber:



--
Brian Tierney, http://www.es.net/tierney
Energy Sciences Network (ESnet), Berkeley National Lab
http://fasterdata.es.net









--
Eli Dart, Network Engineer                          NOC: (510) 486-7600
ESnet Office of the CTO (AS293)                          (800) 333-7638
Lawrence Berkeley National Laboratory 
PGP Key fingerprint = C970 F8D3 CFDD 8FFF 5486 343A 2D31 4478 5F82 B2B3





