
Re: [perfsonar-user] bwctl test error


  • From: Bogdan Ghita <>
  • To: Matthew J Zekauskas <>
  • Cc:
  • Subject: Re: [perfsonar-user] bwctl test error
  • Date: Thu, 26 May 2016 17:18:38 +0100

Hi Matt

Not sure about the MTU, but you were right about the buffer.

I reduced the packet length all the way down to 500 bytes, which seemed to improve the chances of success. What really made the difference was bringing down the window, which always allows the test to complete (-w 64k worked like a charm). I guess one of the routers/firewalls along the path has a rather small buffer, which fills up when there is cross-traffic and the window grows; there is a network upgrade looming, so I'm hoping to have better news once that happens.
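For reference, the invocation that now completes reliably looks roughly like this (same flags as my original test below, just with the smaller window; exact wording from memory):

bwctl -T iperf3 -t 10 -O 4 -v -x -w 64k -c "perfsonar-bandwidth.esc.qmul.ac.uk:4823"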

Best regards
Bogdan


On 25 May 2016 at 04:39, Matthew J Zekauskas <> wrote:

FWIW, to me this feels very much like an MTU black hole (or a very small buffer somewhere). I don't like all the 0 entries. You might try setting the following on the sender, if it's available and you haven't already, and see if you get more consistent performance. If there is a black hole, performance will probably still not be excellent, but the transfer should recover and continue to send traffic.

Stolen from <http://fasterdata.es.net/host-tuning/linux/>:

# recommended for hosts with jumbo frames enabled
net.ipv4.tcp_mtu_probing=1

This will turn on packet-level path MTU discovery.
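Something like the usual sysctl workflow should apply it; adjust paths for your distro (both steps need root):

# apply immediately
sysctl -w net.ipv4.tcp_mtu_probing=1

# make it persistent across reboots
echo "net.ipv4.tcp_mtu_probing = 1" >> /etc/sysctl.conf
sysctl -p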

Alternatively, you can continue to reduce the MTU and see if it starts working... (1400 may not be enough; you did say you tried other sizes, but I don't know how small you made it.)
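If you want to try an even smaller MTU, a quick non-persistent way is something along these lines, where "eth0" and 1280 are just placeholders for your test interface and the next size to try:

ip link set dev eth0 mtu 1280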

Artificially lowering the window (to something tiny in our world, say 64k to start: -w64k) may also reduce the pressure enough to allow packets through if there is in fact just a really small buffer. (You won't get good performance with small windows unless the endpoints are really close together.) If that works, you can raise the window and see at what point things completely stop.
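A rough way to do that stepping, reusing the target from your earlier test (the window sizes here are just an example progression):

# step the requested window up until throughput collapses again
for win in 64k 128k 256k 512k 1024k; do
    echo "=== window $win ==="
    bwctl -T iperf3 -t 10 -w $win -c "perfsonar-bandwidth.esc.qmul.ac.uk:4823"
done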

Brian's tracepaths could also indicate that somewhere along the path ICMP responses are being dropped; depending on what precisely is dropped/blocked, that could contribute to the black hole. (The paths may also be asymmetric, although I don't think we have enough data to draw that conclusion definitively.) (Also, 172.16.0.33 is an RFC 1918 private address, so there may be some intermediate hops that simply aren't directly addressable.)

--Matt

On 5/24/16 6:27 PM, Brian Tierney wrote:

Looks like you have a very asymmetric route, and an MTU issue in one direction.

>bwtraceroute -T tracepath -c perfsonar-bandwidth.esc.qmul.ac.uk -s 141.163.170.165

bwtraceroute: Using tool: tracepath

bwtraceroute: 17 seconds until test results available


SENDER START

 1?: [LOCALHOST]     pmtu 1400

 1:  141.163.170.253 (141.163.170.253)                      0.484ms 

 1:  141.163.170.253 (141.163.170.253)                      2.269ms 

 2:  172.16.0.33 (172.16.0.33)                              0.418ms 

 3:  

SENDER END

>bwtraceroute -T tracepath -s perfsonar-bandwidth.esc.qmul.ac.uk -c 141.163.170.165

bwtraceroute: Using tool: tracepath

bwtraceroute: 18 seconds until test results available


SENDER START

 1?: [LOCALHOST]     pmtu 9000

 1:  MEPP3CSW01.Vlan843.ACCB.sd02-vrf-internet.core-net.qmul.ac.uk (194.36.11.3)   0.745ms 

 1:  MEPP3CSW01.Vlan843.ACCB.sd02-vrf-internet.core-net.qmul.ac.uk (194.36.11.3)  10.980ms 

 2:  146.97.143.209 (146.97.143.209)                        1.352ms 

 3:  ae25.londtw-sbr1.ja.net (146.97.35.213)                1.404ms 

 4:  ae30.londpg-sbr2.ja.net (146.97.33.5)                  1.898ms 

 5:  ae0.briswe-rbr1.ja.net (146.97.37.202)                 3.935ms 

 6:  xe-0-0-0.plymup-rbr1.ja.net (146.97.67.74)             6.733ms 

 7:  xe-0-0-0.plymup-rbr1.ja.net (146.97.67.74)             6.721ms pmtu 1500

 7:  

SENDER END



On Tue, May 24, 2016 at 8:56 AM, <> wrote:
Hello

I installed a perfSONAR server and I've been facing similar problems with bandwidth measurement; see below. I also set the MTU to different sizes (the test below is with MTU=1400), but that did not seem to make a difference. Occasionally the tests do actually succeed, although the reported speed is rather low: around 200 Mbps for a Gb network and uplink.

Any thoughts about what could be the problem?

Best regards
Bogdan

----

bwctl -T iperf3 -t 10 -O 4 -v -x -c "perfsonar-bandwidth.esc.qmul.ac.uk:4823"
Messages being sent to syslog(user,err)
bwctl: Using 141.163.170.165 as the address for local sender
bwctl: Using perfsonar-bandwidth.esc.qmul.ac.uk:4823 as the address for remote receiver
bwctl: Available in-common: iperf nuttcp iperf3
bwctl: Using tool: iperf3
bwctl: Server 'perfsonar-bandwidth.esc.qmul.ac.uk:4823' accepted test request at time 1464087718.423526
bwctl: Client 'localhost' accepted test request at time 1464087735.082031
bwctl: Tests accepted at different times re-requesting test with new time
bwctl: Server 'perfsonar-bandwidth.esc.qmul.ac.uk:4823' accepted test request at time 1464087740.078125
bwctl: Client 'localhost' accepted test request at time 1464087740.078125

RECEIVER START
bwctl: start_endpoint: 3673076511.268639
bwctl: run_endpoint: receiver: 194.36.11.37
bwctl: run_endpoint: sender: 141.163.170.165
bwctl: exec_line: iperf3 -s -1 -B 194.36.11.37 -f m -p 5656 -V
bwctl: run_tool: tester: iperf3
bwctl: run_tool: receiver: 194.36.11.37
bwctl: run_tool: sender: 141.163.170.165
bwctl: start_tool: 3673076537.000126
iperf 3.1.2
Linux perfsonar-bandwidth.esc.qmul.ac.uk 2.6.32-573.26.1.el6.web100.x86_64 #1 SMP Fri May 6 11:17:16 PDT 2016 x86_64
-----------------------------------------------------------
Server listening on 5656
-----------------------------------------------------------
Time: Tue, 24 May 2016 11:02:20 GMT
Accepted connection from 141.163.170.165, port 47577
     Cookie: localhost.localdomain.1464087740.079
     TCP MSS: 1368 (default)
[ 16] local 194.36.11.37 port 5656 connected to 141.163.170.165 port 47616
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 4 seconds, 10 second test
[ ID] Interval           Transfer     Bandwidth
[ 16]   0.00-1.00   sec  8.23 MBytes  69.0 Mbits/sec                  (omitted)
[ 16]   1.00-2.00   sec  0.00 Bytes  0.00 Mbits/sec                  (omitted)
[ 16]   2.00-3.00   sec  0.00 Bytes  0.00 Mbits/sec                  (omitted)
[ 16]   3.00-4.00   sec  0.00 Bytes  0.00 Mbits/sec                  (omitted)
[ 16]   0.00-1.00   sec  0.00 Bytes  0.00 Mbits/sec
[ 16]   1.00-2.00   sec  0.00 Bytes  0.00 Mbits/sec
[ 16]   2.00-3.00   sec  0.00 Bytes  0.00 Mbits/sec
[ 16]   3.00-4.00   sec  0.00 Bytes  0.00 Mbits/sec
[ 16]   4.00-5.00   sec  0.00 Bytes  0.00 Mbits/sec
[ 16]   5.00-6.00   sec  0.00 Bytes  0.00 Mbits/sec
[ 16]   6.00-7.00   sec  0.00 Bytes  0.00 Mbits/sec
[ 16]   7.00-8.00   sec  0.00 Bytes  0.00 Mbits/sec
[ 16]   8.00-9.00   sec  0.00 Bytes  0.00 Mbits/sec
[ 16]   9.00-10.00  sec  0.00 Bytes  0.00 Mbits/sec
[ 16]  10.00-10.04  sec  0.00 Bytes  0.00 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bandwidth
[ 16]   0.00-10.04  sec  0.00 Bytes  0.00 Mbits/sec                  sender
[ 16]   0.00-10.04  sec  0.00 Bytes  0.00 Mbits/sec                  receiver
CPU Utilization: local/receiver 0.1% (0.0%u/0.2%s), remote/sender 0.0% (0.0%u/0.0%s)
bwctl: stop_tool: 3673076554.303890
bwctl: stop_endpoint: 3673076554.303890

RECEIVER END

SENDER START
bwctl: start_endpoint: 3673076511.272606
bwctl: run_endpoint: receiver: 194.36.11.37
bwctl: run_endpoint: sender: 141.163.170.165
bwctl: exec_line: iperf3 -c 194.36.11.37 -B 141.163.170.165 -f m -p 5656 -V -Z --omit 4 -t 10
bwctl: run_tool: tester: iperf3
bwctl: run_tool: receiver: 194.36.11.37
bwctl: run_tool: sender: 141.163.170.165
bwctl: start_tool: 3673076540.078318
iperf 3.1.2
Linux localhost.localdomain 2.6.32-573.26.1.el6.x86_64 #1 SMP Wed May 4 00:57:44 UTC 2016 x86_64
Time: Tue, 24 May 2016 11:02:20 GMT
Connecting to host 194.36.11.37, port 5656
     Cookie: localhost.localdomain.1464087740.079
     TCP MSS: 1368 (default)
[ 15] local 141.163.170.165 port 47616 connected to 194.36.11.37 port 5656
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 4 seconds, 10 second test
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[ 15]   0.00-1.00   sec  11.0 MBytes  92.4 Mbits/sec    2   1.62 MBytes  (omitted)
[ 15]   1.00-2.00   sec  0.00 Bytes  0.00 Mbits/sec    1   1.62 MBytes  (omitted)
[ 15]   2.00-3.00   sec  0.00 Bytes  0.00 Mbits/sec    0   1.62 MBytes  (omitted)
[ 15]   3.00-4.00   sec  0.00 Bytes  0.00 Mbits/sec    1   1.62 MBytes  (omitted)
[ 15]   0.00-1.00   sec  0.00 Bytes  0.00 Mbits/sec    0   1.62 MBytes
[ 15]   1.00-2.00   sec  0.00 Bytes  0.00 Mbits/sec    0   1.62 MBytes
[ 15]   2.00-3.00   sec  0.00 Bytes  0.00 Mbits/sec    0   1.62 MBytes
[ 15]   3.00-4.00   sec  0.00 Bytes  0.00 Mbits/sec    1   1.62 MBytes
[ 15]   4.00-5.00   sec  0.00 Bytes  0.00 Mbits/sec    0   1.62 MBytes
[ 15]   5.00-6.00   sec  0.00 Bytes  0.00 Mbits/sec    0   1.62 MBytes
[ 15]   6.00-7.00   sec  0.00 Bytes  0.00 Mbits/sec    0   1.62 MBytes
[ 15]   7.00-8.00   sec  0.00 Bytes  0.00 Mbits/sec    0   1.62 MBytes
[ 15]   8.00-9.00   sec  0.00 Bytes  0.00 Mbits/sec    0   1.62 MBytes
[ 15]   9.00-10.00  sec  0.00 Bytes  0.00 Mbits/sec    0   1.62 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bandwidth       Retr
[ 15]   0.00-10.00  sec  0.00 Bytes  0.00 Mbits/sec    1             sender
[ 15]   0.00-10.00  sec  0.00 Bytes  0.00 Mbits/sec                  receiver
CPU Utilization: local/sender 0.2% (0.1%u/0.2%s), remote/receiver 0.1% (0.0%u/0.2%s)

iperf Done.
bwctl: stop_tool: 3673076554.301276
bwctl: stop_endpoint: 3673076554.301276

SENDER END

---



--
Brian Tierney, http://www.es.net/tierney
Energy Sciences Network (ESnet), Berkeley National Lab
http://fasterdata.es.net





