Re: [perfsonar-user] iperf not usable from perfsonar box, works from CentOS non-PS box


  • From: Mark Foster <>
  • To: <>
  • Subject: Re: [perfsonar-user] iperf not usable from perfsonar box, works from CentOS non-PS box
  • Date: Tue, 27 Jan 2015 12:14:53 -0800

It's likely that the port(s) on the switch need to be set to handle jumbos.
The failed MTU test suggests that something in the middle is the problem.
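For reference, on a Dell/Force10 switch jumbo frames are enabled per port. The following is only a rough sketch (FTOS syntax varies by release, and the interface name is hypothetical), not this site's verified configuration:

    conf t
    interface TenGigabitEthernet 0/1
      mtu 9216
      exit
    end
    show interfaces TenGigabitEthernet 0/1

A port MTU of 9216 leaves headroom above the hosts' 9000-byte interface MTU for Ethernet framing.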

-- Mark

On 1/27/2015 12:12 PM, Trey Dockendorf wrote:
The test suggested by Azher fails.

[root@psonar-bwctl ~]# ping -Mdo -s 8972 165.91.55.4
PING 165.91.55.4 (165.91.55.4) 8972(9000) bytes of data.
<hang>

[root@psonar-bwctl ~]# ping -Mdo -s 1500 165.91.55.4
PING 165.91.55.4 (165.91.55.4) 1500(1528) bytes of data.
1508 bytes from 165.91.55.4: icmp_seq=1 ttl=64 time=0.189 ms

Both of these hosts are connected to the same Force10 switch. Is this MTU black
hole something that can result from a switch not correctly configured to
handle an MTU of 9000?

Thanks,
- Trey

=============================

Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396

On Tue, Jan 27, 2015 at 1:58 PM, Matthew J Zekauskas wrote:

[popping up head out of sand momentarily]

Without having read the whole thread in detail, this feels like an MTU
black hole. If both ends think the MTU is 9000, successfully negotiate a
large MSS, and then actually try to use it, but some element in the middle
drops large packets... then a session will start, but then hang as the
large packet is sent and repeatedly retransmitted but dropped.

The test Azher suggests (ping with large packets) is an easy way to see
if large packets make it through successfully. It should be sufficient to try
something large but not full-sized (e.g. 8192). However, if there is a tunnel
in the middle, the tunnel can push even a full-size packet over the MTU, so if
the large-but-not-full-size packet succeeds it probably still pays to do the
math (or just test using some sort of search strategy) to send a full-size
packet and see if it makes it through.
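As a concrete version of that search strategy, here is a minimal sketch for these hosts (assuming the iputils ping shipped with CentOS; 28 bytes of IP + ICMP headers means a payload of 8972 corresponds to a 9000-byte packet):

    lo=1472; hi=8972
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        # -M do sets DF (no fragmentation); -W 1 gives up after one second
        if ping -c 1 -W 1 -M do -s $mid 165.91.55.4 >/dev/null 2>&1; then
            lo=$mid    # this size made it through
        else
            hi=$mid    # this size was dropped
        fi
    done
    echo "largest surviving payload: $lo bytes (path MTU ~ $((lo + 28)))"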

I suppose using something like tracepath could also show this (although
if there is a black hole, the trace would just fail somewhere in the middle
instead of adjusting the MTU lower).
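For what it's worth, a minimal tracepath run from one of the hosts in this thread would just be:

    [root@psonar-owamp ~]# tracepath 165.91.55.4

When routers send "fragmentation needed" messages, tracepath reports a reduced pmtu per hop; through a silent black hole, later hops simply show "no reply" instead.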

--Matt
[back to sand]


On 1/27/15 2:35 PM, Trey Dockendorf wrote:
Both ends are 9000 MTU


[root@psonar-bwctl ~]# ip link show p1p1
6: p1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP qlen 1000
    link/ether 90:e2:ba:2e:eb:50 brd ff:ff:ff:ff:ff:ff


[root@psonar-owamp ~]# ip link show p1p1
6: p1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP qlen 1000
    link/ether 90:e2:ba:2e:ea:04 brd ff:ff:ff:ff:ff:ff

As a test case this afternoon I will be moving both of these systems off my
local Force10 switch and connecting them directly to our Science DMZ
equipment. This will at least allow our local networking experts to better
assist in debugging the problem, and rule out a misconfiguration on my local
Force10 switch.

Thanks,
- Trey

=============================

Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396

On Tue, Jan 27, 2015 at 1:29 PM, Shawn McKee wrote:

Is there an MTU mismatch between the hosts?

Sender at 9000 and receiver at 1500, and PMTU discovery fails? The initial
negotiation will use packets < 1500, but the data would be > 1500, and if
fragmentation is not allowed the packets are dropped.

Just a thought since you said 'The transfer started then scp reports
"stalled".'

Shawn

On Tue, Jan 27, 2015 at 2:15 PM, Eli Dart wrote:

Hi Trey,

If you have root on the suspect box, run tcpdump during a test
that fails and see what's going on.

Measurement tools are wonderful and helpful and valuable, but if
things are busted enough that the tools can't run, sometimes you just have to
watch the packets to figure out what's going on....
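For example, a minimal capture during a failing test might look like this (interface p1p1 and port 5001 are borrowed from elsewhere in the thread; adjust to the actual test):

    # tcpdump -i p1p1 -n -s 128 -w iperf-fail.pcap host 165.91.55.4 and port 5001

Large segments leaving repeatedly with no ACKs coming back is the classic signature of an MTU black hole.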

Eli



On Tue, Jan 27, 2015 at 11:01 AM, Trey Dockendorf wrote:

Transferring a 4.3GB file fails... very bizarre. The transfer started,
then scp reported "stalled".

The failure is between the 2 perfSONAR boxes. Transferring
to a perfSONAR box from a host on our campus LAN, not on our Science DMZ,
works at the expected 1Gbps rate.

Transferring from another Science DMZ host (stock CentOS) to
the perfSONAR box fails.

Transferring from Science DMZ to Science DMZ, both stock
CentOS boxes, works. So it's only the interactions with the perfSONAR hosts
that fail.

I suspect something local is broken, and it just so happens
to be affecting interactions to/from my perfSONAR boxes.

If there are any suggestions on how to debug I'd be glad to
hear them, but it seems like something is broken on our local network.

Thanks,
- Trey

=============================

Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396

On Tue, Jan 27, 2015 at 12:46 PM, Aaron Brown wrote:

Hey Trey,

That is bizarre. Could you try scp’ing a large file
between the two hosts?

Cheers,
Aaron

On Jan 27, 2015, at 1:00 PM, Trey Dockendorf wrote:

Yes, it occurs in both directions between
psonar-bwctl.brazos.tamu.edu and psonar-owamp.brazos.tamu.edu.

If I run the server side on either of those boxes and try
to run the client from a plain CentOS host, the tests also don't work.

These systems are both stock net installs of perfSONAR 3.4.

- Trey

=============================

Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396

On Tue, Jan 27, 2015 at 8:45 AM, Aaron Brown wrote:

Hey Trey,

Does this happen in both directions?

Cheers,
Aaron

On Jan 26, 2015, at 7:46 PM, Trey Dockendorf wrote:

As the remote end was not set up by me I can't test
this particular host, but instead I've tried iperf between my latency PS host
and bandwidth PS host and got the same results. Below are results using iperf3
and nuttcp. With iperf the connection seemed to stall at first and then produce
0 bits/sec messages, but with iperf3 it rapidly printed the interval lines.

psonar-bwctl:
# iperf3 -p 5001 -s

psonar-owamp:
# iperf3 -c psonar-bwctl.brazos.tamu.edu -p 5001 -i 1
Connecting to host psonar-bwctl.brazos.tamu.edu, port 5001
[  4] local 165.91.55.4 port 60629 connected to 165.91.55.6 port 5001
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  87.4 KBytes   715 Kbits/sec    2   26.2 KBytes
[  4]   1.00-2.00   sec  0.00 Bytes   0.00 bits/sec     1   26.2 KBytes
[  4]   2.00-3.00   sec  0.00 Bytes   0.00 bits/sec     0   26.2 KBytes
[  4]   3.00-4.00   sec  0.00 Bytes   0.00 bits/sec     1   26.2 KBytes
[  4]   4.00-5.00   sec  0.00 Bytes   0.00 bits/sec     0   26.2 KBytes
[  4]   5.00-6.00   sec  0.00 Bytes   0.00 bits/sec     0   26.2 KBytes
[  4]   6.00-7.00   sec  0.00 Bytes   0.00 bits/sec     1   26.2 KBytes
[  4]   7.00-8.00   sec  0.00 Bytes   0.00 bits/sec     0   26.2 KBytes
[  4]   8.00-9.00   sec  0.00 Bytes   0.00 bits/sec     0   26.2 KBytes
[  4]   9.00-10.00  sec  0.00 Bytes   0.00 bits/sec     0   26.2 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  87.4 KBytes  71.6 Kbits/sec    5             sender
[  4]   0.00-10.00  sec  0.00 Bytes   0.00 bits/sec                  receiver
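A congestion window pinned at 26.2 KBytes with repeated retransmits, as above, is consistent with full-size segments being silently dropped. One hedged follow-up test (assuming this iperf3 build supports -M/--set-mss) is to clamp the MSS below a 1500-byte MTU and re-run; if the transfer then runs at normal speed, large frames are the problem:

    # iperf3 -c psonar-bwctl.brazos.tamu.edu -p 5001 -i 1 -M 1400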


Couldn't quite make nuttcp work:

psonar-bwctl:
# nuttcp -1

psonar-owamp:
# nuttcp -b psonar-bwctl.brazos.tamu.edu
nuttcp_mread: Bad file descriptor
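It's unclear whether the nuttcp failure is related. As a hedged alternative invocation (both are standard nuttcp modes), a persistent server plus a plain client could be tried instead of the one-shot form:

    psonar-bwctl:
    # nuttcp -S

    psonar-owamp:
    # nuttcp psonar-bwctl.brazos.tamu.edu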

=============================

Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396

On Mon, Jan 26, 2015 at 6:33 PM, Brian Tierney wrote:


I'm not aware of anything that might explain that. Can you try iperf3 and nuttcp to see if they behave the same?



On Mon, Jan 26, 2015 at 12:22 PM, Trey Dockendorf wrote:

Right now most of the endpoints I'm testing
against seem not to work with bwctl, so to help some colleagues I've been
trying to just use iperf by itself. From my perfSONAR boxes the iperf tests
seem to do nothing, while a non-perfSONAR host with iperf installed from EPEL
gives the expected output. The results are below. I've removed the remote
information, as these tests were not against a perfSONAR box but a remote
site's cluster login node.

Is there something about iperf on a PS host
that could cause this issue? The two systems below are on the same network and
the same core switch.

PERFSONAR:
# iperf -c <REMOTE HOST> -p 50100 -t 20 -i 1
------------------------------------------------------------
Client connecting to <REMOTE HOST>, TCP port 50100
TCP window size: 92.6 KByte (default)
------------------------------------------------------------
[  3] local 165.91.55.6 port 33135 connected with <REMOTE IP> port 50100
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   175 KBytes  1.43 Mbits/sec
[  3]  1.0- 2.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  2.0- 3.0 sec  0.00 Bytes  0.00 bits/sec
<Repeated 100s of times with 0.00 bits/sec>
[  3] 931.0-932.0 sec  0.00 Bytes  0.00 bits/sec

NON-PERFSONAR:

$ iperf -c <REMOTE HOST> -p 50100 -t 20 -i 1
------------------------------------------------------------
Client connecting to <REMOTE HOST>, TCP port 50100
TCP window size: 92.6 KByte (default)
------------------------------------------------------------
[  3] local 165.91.55.28 port 51252 connected with <REMOTE IP> port 50100
[ ID] Interval       Transfer     Bandwidth
[ 3] 0.0- 1.0 sec 60.5 MBytes 508 Mbits/sec
[ 3] 1.0- 2.0 sec 30.5 MBytes 256 Mbits/sec
[ 3] 2.0- 3.0 sec 32.9 MBytes 276 Mbits/sec
[ 3] 3.0- 4.0 sec 36.2 MBytes 304 Mbits/sec
[ 3] 4.0- 5.0 sec 36.6 MBytes 307 Mbits/sec
[ 3] 5.0- 6.0 sec 25.1 MBytes 211 Mbits/sec
[ 3] 6.0- 7.0 sec 27.2 MBytes 229 Mbits/sec
[ 3] 7.0- 8.0 sec 33.5 MBytes 281 Mbits/sec
[ 3] 8.0- 9.0 sec 32.2 MBytes 271 Mbits/sec
[ 3] 9.0-10.0 sec 31.9 MBytes 267 Mbits/sec
[ 3] 10.0-11.0 sec 24.9 MBytes 209 Mbits/sec
[ 3] 11.0-12.0 sec 29.9 MBytes 251 Mbits/sec
[ 3] 12.0-13.0 sec 36.8 MBytes 308 Mbits/sec
[ 3] 13.0-14.0 sec 33.0 MBytes 277 Mbits/sec
[ 3] 14.0-15.0 sec 21.6 MBytes 181 Mbits/sec
[ 3] 15.0-16.0 sec 16.6 MBytes 139 Mbits/sec
[ 3] 16.0-17.0 sec 22.0 MBytes 185 Mbits/sec
[ 3] 17.0-18.0 sec 23.6 MBytes 198 Mbits/sec
[ 3] 18.0-19.0 sec 23.5 MBytes 197 Mbits/sec
[ 3] 19.0-20.0 sec 19.2 MBytes 161 Mbits/sec
[ 3] 0.0-20.0 sec 598 MBytes 250 Mbits/sec

Thanks,
- Trey

=============================

Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396




--
Brian Tierney, http://www.es.net/tierney
Energy Sciences Network (ESnet), Berkeley National Lab
http://fasterdata.es.net









--
Eli Dart, Network Engineer              NOC: (510) 486-7600
ESnet Office of the CTO (AS293)              (800) 333-7638
Lawrence Berkeley National Laboratory
PGP Key fingerprint = C970 F8D3 CFDD 8FFF 5486 343A 2D31 4478 5F82 B2B3







