perfsonar-user - Re: [perfsonar-user] Throughput suddenly unidirectional
Subject: perfSONAR User Q&A and Other Discussion
- From: Jason Zurawski <>
- To: Daniel Schmidt <>
- Cc:
- Subject: Re: [perfsonar-user] Throughput suddenly unidirectional
- Date: Wed, 3 Dec 2014 13:07:52 -0500
Hi Dan;
Thanks for sending the 2nd set of logs and the other information. We are
still not able to determine what is going on, however. Could you send us one
more log file:
/var/log/perfsonar/regular_testing.log
This will help us see if the issue is related to the testing itself, or the
storage/graphing.
Thanks;
-jason
On Dec 2, 2014, at 3:05 PM, Daniel Schmidt <> wrote:
> If I forget to include something, please remind me.
>
> * /etc/init.d/iptables stop; however, I would think that would have shown
> up on my bench test, no?
> * Included logs from the other side.
> * OWAMP? Could you give me a bit? I need to look that up; I don't see it.
> * NTP looks fine to me; I'll post the output later in this message.
> * Yes, I did reverse the -c & -s. But...
>
> I just did it again. Look at this run from 2.2.2.2 (remote side):
>
> [root@localhost admin]# bwctl -f m -x -T iperf3 -t 30 -i 1 -c 1.1.1.1 -s 2.2.2.2
> bwctl: Using tool: iperf3
> bwctl: 37 seconds until test results available
>
> RECEIVER START
> -----------------------------------------------------------
> Server listening on 5601
> -----------------------------------------------------------
> Accepted connection from 2.2.2.2, port 45812
> [ 17] local 1.1.1.1 port 5601 connected to 2.2.2.2 port 52941
> [ ID] Interval Transfer Bandwidth
> [ 17] 0.00-1.00 sec 45.6 MBytes 382 Mbits/sec
> [ 17] 1.00-2.00 sec 47.3 MBytes 397 Mbits/sec
> [ 17] 2.00-3.00 sec 47.5 MBytes 398 Mbits/sec
> [ 17] 3.00-4.00 sec 36.8 MBytes 309 Mbits/sec
> [ 17] 4.00-5.00 sec 40.0 MBytes 336 Mbits/sec
> [ 17] 5.00-6.00 sec 31.5 MBytes 265 Mbits/sec
> [ 17] 6.00-7.00 sec 21.2 MBytes 178 Mbits/sec
> [ 17] 7.00-8.00 sec 30.2 MBytes 254 Mbits/sec
> [ 17] 8.00-9.00 sec 32.7 MBytes 274 Mbits/sec
> [ 17] 9.00-10.00 sec 36.6 MBytes 307 Mbits/sec
> [ 17] 10.00-11.00 sec 22.5 MBytes 189 Mbits/sec
> [ 17] 11.00-12.00 sec 31.1 MBytes 261 Mbits/sec
> [ 17] 12.00-13.00 sec 3.36 MBytes 28.2 Mbits/sec
> [ 17] 13.00-14.00 sec 28.1 MBytes 236 Mbits/sec
> [ 17] 14.00-15.00 sec 43.1 MBytes 361 Mbits/sec
> [ 17] 15.00-16.00 sec 35.5 MBytes 297 Mbits/sec
> [ 17] 16.00-17.00 sec 42.2 MBytes 354 Mbits/sec
> [ 17] 17.00-18.00 sec 39.9 MBytes 335 Mbits/sec
> [ 17] 18.00-19.00 sec 8.22 MBytes 69.0 Mbits/sec
> [ 17] 19.00-20.00 sec 36.8 MBytes 309 Mbits/sec
> [ 17] 20.00-21.00 sec 39.9 MBytes 335 Mbits/sec
> [ 17] 21.00-22.00 sec 38.7 MBytes 325 Mbits/sec
> [ 17] 22.00-23.00 sec 13.5 MBytes 113 Mbits/sec
> [ 17] 23.00-24.00 sec 539 KBytes 4.41 Mbits/sec
> [ 17] 24.00-25.00 sec 617 KBytes 5.05 Mbits/sec
> [ 17] 25.00-26.00 sec 29.0 MBytes 243 Mbits/sec
> [ 17] 26.00-27.00 sec 16.5 MBytes 138 Mbits/sec
> [ 17] 27.00-28.00 sec 642 KBytes 5.26 Mbits/sec
> [ 17] 28.00-29.00 sec 10.7 MBytes 89.5 Mbits/sec
> [ 17] 29.00-30.00 sec 533 KBytes 4.37 Mbits/sec
> [ 17] 30.00-30.04 sec 21.2 KBytes 4.55 Mbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bandwidth Retr
> [ 17] 0.00-30.04 sec 811 MBytes 226 Mbits/sec 220 sender
> [ 17] 0.00-30.04 sec 811 MBytes 226 Mbits/sec receiver
>
> RECEIVER END
>
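A note on units when reading these reports: the numbers above are self-consistent if the Transfer column uses binary prefixes (1 MByte = 2^20 bytes, 1 KByte = 2^10 bytes) while the Bandwidth column uses decimal megabits (10^6 bits). The minimal Python sketch below, which is not part of bwctl or iperf3, recomputes a few reported rates from the Transfer column under that assumption; the helper name `rate_mbps` is hypothetical.

```python
# Assumed iperf3 units: binary prefixes for transfer sizes,
# decimal megabits (10**6 bits) for rates.
BYTES_PER_UNIT = {"KBytes": 2**10, "MBytes": 2**20, "GBytes": 2**30}


def rate_mbps(amount: float, unit: str, seconds: float) -> float:
    """Convert an iperf3 'Transfer' figure over an interval to Mbits/sec."""
    bits = amount * BYTES_PER_UNIT[unit] * 8
    return bits / seconds / 1e6


if __name__ == "__main__":
    # Receiver summary: 811 MBytes over 30.04 sec, reported as 226 Mbits/sec.
    print(round(rate_mbps(811, "MBytes", 30.04)))  # 226
    # A collapsed interval: 617 KBytes in 1 sec, reported as 5.05 Mbits/sec.
    print(round(rate_mbps(617, "KBytes", 1.0), 2))  # 5.05
```

Because Transfer is printed to three significant figures, a recomputed rate can differ from the reported one in the last digit.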
> SENDER START
> Connecting to host 1.1.1.1, port 5601
> [ 16] local 2.2.2.2 port 52941 connected to 1.1.1.1 port 5601
> [ ID] Interval Transfer Bandwidth Retr Cwnd
> [ 16] 0.00-1.00 sec 47.8 MBytes 401 Mbits/sec 62 55.1 KBytes
> [ 16] 1.00-2.00 sec 47.4 MBytes 397 Mbits/sec 3 53.7 KBytes
> [ 16] 2.00-3.00 sec 46.9 MBytes 394 Mbits/sec 4 45.2 KBytes
> [ 16] 3.00-4.00 sec 36.5 MBytes 306 Mbits/sec 6 45.2 KBytes
> [ 16] 4.00-5.00 sec 39.8 MBytes 334 Mbits/sec 9 43.8 KBytes
> [ 16] 5.00-6.00 sec 32.3 MBytes 271 Mbits/sec 1 70.7 KBytes
> [ 16] 6.00-7.00 sec 19.7 MBytes 166 Mbits/sec 3 49.5 KBytes
> [ 16] 7.00-8.00 sec 30.1 MBytes 252 Mbits/sec 17 36.8 KBytes
> [ 16] 8.00-9.00 sec 34.5 MBytes 289 Mbits/sec 10 66.5 KBytes
> [ 16] 9.00-10.00 sec 36.1 MBytes 302 Mbits/sec 10 48.1 KBytes
> [ 16] 10.00-11.00 sec 22.7 MBytes 190 Mbits/sec 9 50.9 KBytes
> [ 16] 11.00-12.00 sec 31.0 MBytes 260 Mbits/sec 14 50.9 KBytes
> [ 16] 12.00-13.00 sec 2.72 MBytes 22.8 Mbits/sec 3 29.7 KBytes
> [ 16] 13.00-14.00 sec 28.9 MBytes 243 Mbits/sec 0 74.9 KBytes
> [ 16] 14.00-15.00 sec 42.4 MBytes 356 Mbits/sec 4 36.8 KBytes
> [ 16] 15.00-16.00 sec 35.7 MBytes 300 Mbits/sec 3 43.8 KBytes
> [ 16] 16.00-17.00 sec 42.0 MBytes 352 Mbits/sec 7 35.4 KBytes
> [ 16] 17.00-18.00 sec 40.8 MBytes 342 Mbits/sec 4 59.4 KBytes
> [ 16] 18.00-19.00 sec 7.26 MBytes 60.9 Mbits/sec 8 36.8 KBytes
> [ 16] 19.00-20.00 sec 37.0 MBytes 310 Mbits/sec 5 36.8 KBytes
> [ 16] 20.00-21.00 sec 40.8 MBytes 343 Mbits/sec 2 67.9 KBytes
> [ 16] 21.00-22.00 sec 38.5 MBytes 323 Mbits/sec 17 25.5 KBytes
> [ 16] 22.00-23.00 sec 11.7 MBytes 98.1 Mbits/sec 2 19.8 KBytes
> [ 16] 23.00-24.00 sec 488 KBytes 4.00 Mbits/sec 0 22.6 KBytes
> [ 16] 24.00-25.00 sec 650 KBytes 5.33 Mbits/sec 0 24.0 KBytes
> [ 16] 25.00-26.00 sec 30.3 MBytes 254 Mbits/sec 3 48.1 KBytes
> [ 16] 26.00-27.00 sec 15.2 MBytes 128 Mbits/sec 12 21.2 KBytes
> [ 16] 27.00-28.00 sec 682 KBytes 5.58 Mbits/sec 0 24.0 KBytes
> [ 16] 28.00-29.00 sec 10.7 MBytes 89.6 Mbits/sec 2 18.4 KBytes
> [ 16] 29.00-30.00 sec 488 KBytes 4.00 Mbits/sec 0 21.2 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bandwidth Retr
> [ 16] 0.00-30.00 sec 811 MBytes 227 Mbits/sec 220 sender
> [ 16] 0.00-30.00 sec 811 MBytes 227 Mbits/sec receiver
>
> This seems to indicate a problem with my circuit, as I see this at no other
> location. But my root question remains: why would this cause my graph to
> freak out and stop graphing when it gets this data?
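To put numbers on the collapses in the run above, here is a hypothetical Python helper. `parse_intervals` and `find_dips` are illustrative names, not perfSONAR tools. It pulls the one-second rows out of iperf3 text output and lists the intervals that fall below a chosen fraction of the median rate. The 25% cutoff is an arbitrary assumption. The sample rows are copied from the receiver output above.

```python
import re
import statistics

# One iperf3 interval row, e.g. "[ 17] 12.00-13.00 sec 3.36 MBytes 28.2 Mbits/sec"
INTERVAL_RE = re.compile(
    r"\[\s*\d+\]\s+([\d.]+)-([\d.]+)\s+sec\s+[\d.]+\s+\w+\s+([\d.]+)\s+Mbits/sec"
)


def parse_intervals(text: str) -> list[tuple[float, float, float]]:
    """Return (start, end, Mbits/sec) for each roughly one-second interval."""
    rows = []
    for line in text.splitlines():
        match = INTERVAL_RE.search(line)
        if match is None:
            continue
        start, end, mbps = (float(g) for g in match.groups())
        # Skip whole-test summary rows and the short trailing fragment.
        if not 0.5 <= end - start <= 1.5:
            continue
        rows.append((start, end, mbps))
    return rows


def find_dips(rows, fraction=0.25):
    """Rows whose rate is below `fraction` of the median interval rate."""
    median = statistics.median([mbps for _, _, mbps in rows])
    return [row for row in rows if row[2] < fraction * median]


SAMPLE = """\
[ 17] 11.00-12.00 sec 31.1 MBytes 261 Mbits/sec
[ 17] 12.00-13.00 sec 3.36 MBytes 28.2 Mbits/sec
[ 17] 13.00-14.00 sec 28.1 MBytes 236 Mbits/sec
[ 17] 14.00-15.00 sec 43.1 MBytes 361 Mbits/sec
[ 17] 0.00-30.04 sec 811 MBytes 226 Mbits/sec
"""

if __name__ == "__main__":
    rows = parse_intervals(SAMPLE)
    print(find_dips(rows))  # [(12.0, 13.0, 28.2)]
```

Using the median rather than the mean keeps a few near-zero seconds from dragging the baseline down. Fed the full 30-second run, the same helper would list every collapsed second.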
>
> I can't remember the iperf and nuttcp command-line options offhand; if you
> think they would be helpful, I'll go read up and run them. Sorry, I'm not
> purposefully being lazy; I just have to put this on the back burner for a
> few hours. Where is thrulay? I can remember how to use thrulay.
>
> Thanks for your time.
>
>
> NTP results:
> [root@localhost admin]# ntpq -p -c rv
> remote refid st t when poll reach delay offset jitter
> ==============================================================================
> -nms-rlat.chic.n 141.142.143.138 2 u 55 1024 377 27.404 -0.670 0.619
> +eth-1.nms-rlat. .IRIG. 1 u 590 1024 377 53.188 -0.156 0.236
> -nms-rlat.losa.n .CDMA. 1 u 671 1024 377 56.644 14.561 0.207
> +nms-rlat.newy32 .CDMA. 1 u 1016 1024 377 54.850 0.031 0.360
> -chronos.es.net .CDMA. 1 u 715 1024 377 52.939 0.814 0.220
> *saturn.es.net .CDMA. 1 u 457 1024 377 30.607 -0.196 5.333
> associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
> version="ntpd
>
> Sat Nov 23 18:21:48 UTC 2013 (1)",
> processor="x86_64", system="Linux/2.6.32-504.1.3.el6.aufs.web100.x86_64",
> leap=00, stratum=2, precision=-23, rootdelay=30.607, rootdisp=54.621,
> refid=198.129.252.38,
> reftime=d8288af8.d66b698e Tue, Dec 2 2014 12:01:12.837,
> clock=d82890dd.8f707797 Tue, Dec 2 2014 12:26:21.560, peer=60532,
> tc=10, mintc=3, offset=-0.109, frequency=1.125, sys_jitter=0.123,
> clk_jitter=0.164, clk_wander=0.007
>
> [root@localhost admin]# ntpq -p -c rv
> remote refid st t when poll reach delay offset jitter
> ==============================================================================
> -nms-rlat.chic.n 141.142.143.138 2 u 928 1024 377 27.017 -0.995 0.932
> +nms-rlat.hous.n .IRIG. 1 u 25 1024 377 52.872 -0.338 0.189
> -nms-rlat.losa.n .CDMA. 1 u 517 1024 377 56.317 14.298 0.320
> -nms-rlat.newy32 .CDMA. 1 u 907 1024 377 54.451 -0.227 0.271
> +chronos.es.net .CDMA. 1 u 156 1024 377 54.627 -0.438 0.321
> *saturn.es.net .CDMA. 1 u 19 1024 377 30.346 -0.456 0.304
> associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
> version="ntpd
>
> Sat Nov 23 18:21:48 UTC 2013 (1)",
> processor="x86_64", system="Linux/2.6.32-504.1.3.el6.aufs.web100.x86_64",
> leap=00, stratum=2, precision=-23, rootdelay=30.346, rootdisp=36.512,
> refid=198.129.252.38,
> reftime=d8289155.99671fb2 Tue, Dec 2 2014 12:28:21.599,
> clock=d8289168.31de148f Tue, Dec 2 2014 12:28:40.194, peer=13268,
> tc=10, mintc=3, offset=-0.413, frequency=-0.412, sys_jitter=0.067,
> clk_jitter=0.133, clk_wander=0.020
>
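The `ntpq -p` tables above can also be checked mechanically. This sketch makes a few assumptions. It expects the standard ten-column peer layout (remote, refid, st, t, when, poll, reach, delay, offset, jitter), with delay, offset and jitter in milliseconds. It only handles rows tagged `*`, `+`, `-` or `#`. The 5 ms limit and the function names are hypothetical. Run against the first host's table, it picks out saturn.es.net as the selected peer and nms-rlat.losa.n as the only source more than 5 ms off.

```python
from dataclasses import dataclass
from typing import Optional

TALLY_CODES = "*+-#"  # sys.peer, candidate, outlier, backup


@dataclass
class Peer:
    tally: str
    remote: str
    offset_ms: float
    jitter_ms: float


def parse_peers(text: str) -> list[Peer]:
    """Parse peer rows of `ntpq -p` output (10 whitespace-separated columns)."""
    peers = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 10 or fields[0][0] not in TALLY_CODES:
            continue  # header, separator, or a row this sketch does not handle
        peers.append(Peer(fields[0][0], fields[0][1:],
                          float(fields[8]), float(fields[9])))
    return peers


def selected_peer(peers: list[Peer]) -> Optional[str]:
    """Name of the peer ntpd is currently synchronized to, if any."""
    return next((p.remote for p in peers if p.tally == "*"), None)


def offset_outliers(peers: list[Peer], limit_ms: float = 5.0) -> list[str]:
    """Peers whose offset magnitude exceeds the (hypothetical) limit."""
    return [p.remote for p in peers if abs(p.offset_ms) > limit_ms]


FIRST_HOST = """\
-nms-rlat.chic.n 141.142.143.138 2 u 55 1024 377 27.404 -0.670 0.619
+eth-1.nms-rlat. .IRIG. 1 u 590 1024 377 53.188 -0.156 0.236
-nms-rlat.losa.n .CDMA. 1 u 671 1024 377 56.644 14.561 0.207
+nms-rlat.newy32 .CDMA. 1 u 1016 1024 377 54.850 0.031 0.360
-chronos.es.net .CDMA. 1 u 715 1024 377 52.939 0.814 0.220
*saturn.es.net .CDMA. 1 u 457 1024 377 30.607 -0.196 5.333
"""

if __name__ == "__main__":
    peers = parse_peers(FIRST_HOST)
    print(selected_peer(peers))    # saturn.es.net
    print(offset_outliers(peers))  # ['nms-rlat.losa.n']
```

ntpd already marks the losa peer as an outlier (`-`), so its large offset does not by itself mean the host clock is off. The system offsets in the `rv` output, -0.109 and -0.413 ms, are what reflect the local clock.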
>
>
> On Tue, Dec 2, 2014 at 10:36 AM, Jason Zurawski <> wrote:
> Hey Dan;
>
> Looking through the logs, the only suspicious things I see are lines of this
> nature:
>
> > Dec 2 10:05:01 localhost bwctld[12565]: FILE=endpoint.c, LINE=1314,
> > PeerAgent: Peer cancelled test before expected
>
> Unfortunately that tells us the ‘what’ but not the ‘how’. Could you also
> send the logs from the other host you are using? That host may have more
> details about what is going on. A couple of other things came to mind:
>
> >> * No firewall between A & B
>
> iptables may be enabled on both sides; as a quick and dirty test, you could
> just disable it to see if that helps.
>
> >> * I'm not familiar with "slots." There are few throughput tests running
> >> though. (Tests running 33% of time)
>
> Ok, this won’t be the issue I was thinking of.
>
> >> * I assumed packet loss was an issue. So, I setup smokeping on both
> >> sides, 5 every 30 seconds, 1472 MTU. However, I'm not getting loss.
>
> Do you have OWAMP running between the two hosts? If you don’t, I would
> suggest setting up that test too. OWAMP uses UDP packets, which may give a
> different clue than the ICMP that smokeping uses.
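For reference, 1472 bytes is the largest ICMP echo payload that fits a 1500-byte IPv4 MTU without fragmentation. That is the "1472 MTU" figure in the smokeping setup quoted above. The calculation assumes a standard 1500-byte Ethernet MTU and a 20-byte IPv4 header with no options. A UDP header is also 8 bytes, so a UDP probe such as OWAMP's would hit the same 1472-byte payload ceiling. A trivial Python check follows. The function name is hypothetical.

```python
IPV4_HEADER = 20       # bytes, assuming no IP options
ICMP_ECHO_HEADER = 8   # bytes
UDP_HEADER = 8         # bytes


def max_payload(mtu: int, transport_header: int) -> int:
    """Largest payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - transport_header


if __name__ == "__main__":
    print(max_payload(1500, ICMP_ECHO_HEADER))  # 1472
    print(max_payload(1500, UDP_HEADER))        # 1472
```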
>
> >> * PsPerformance comes with ntp on - appears to be running, they have the
> >> same time & these machines are not behind any firewalls.
>
> Could you send the output of ‘ntpq -p -c rv’ for both?
>
> >> * I am not seeing the issue on command line bwctl. Strange.
>
> Could you try the reverse direction as well, i.e. swap the hosts for the
> -c and -s flags? Also try using ‘iperf’ and ‘nuttcp’ as the tool instead
> of ‘iperf3’.
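The test matrix suggested here covers both directions and three tools, so it can be generated rather than typed. This Python sketch only builds the command lines, reusing the flags from the by-hand bwctl example in this thread. As the RECEIVER/SENDER output above shows, `-c` names the receiving host and `-s` the sending host. The function name and the 1.1.1.1/2.2.2.2 placeholders are illustrative.

```python
import shlex
from itertools import product

# Tools named in this thread; bwctl selects one with -T.
TOOLS = ["iperf3", "iperf", "nuttcp"]


def bwctl_commands(host_a: str, host_b: str, seconds: int = 30) -> list[list[str]]:
    """Build one bwctl command per (tool, direction) pair.

    Flags mirror the by-hand example in this thread:
    -c is the receiving host and -s the sending host.
    """
    directions = [(host_a, host_b), (host_b, host_a)]
    commands = []
    for tool, (receiver, sender) in product(TOOLS, directions):
        commands.append([
            "bwctl", "-f", "m", "-x", "-T", tool,
            "-t", str(seconds), "-i", "1",
            "-c", receiver, "-s", sender,
        ])
    return commands


if __name__ == "__main__":
    for cmd in bwctl_commands("1.1.1.1", "2.2.2.2"):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run` to execute it, or printed as shown and pasted into a shell.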
>
> Thanks;
>
> -jason
>
> On Dec 2, 2014, at 12:16 PM, Daniel Schmidt <> wrote:
>
> > Thank you kindly for your reply. Some short responses:
> >
> > * No firewall between A & B
> > * I'm not familiar with "slots." There are few throughput tests running
> > though. (Tests running 33% of time)
> > * I assumed packet loss was an issue. So, I setup smokeping on both
> > sides, 5 every 30 seconds, 1472 MTU. However, I'm not getting loss.
> > * PsPerformance comes with ntp on - appears to be running, they have the
> > same time & these machines are not behind any firewalls.
> > * I am not seeing the issue on command line bwctl. Strange.
> > * Cacti minute graphs don't show any strange usage on the ICX switch.
> >
> > I would suspect hardware, but the boxes ran a solid line for hours on
> > my bench test. Please forgive me, but I'm reluctant to give out the IPs, as
> > I haven't really figured out how I would prevent hackers from using these
> > machines to DoS me. (Does anybody else have to mitigate this issue? Sorry,
> > off-topic question.) However, I'd be happy to privately give you root on
> > the box.
> >
> > I have attached a PNG of what I see. You can see the lines vary
> > greatly, and around 9:30 the throughput suddenly started working
> > again. I have also attached the log, substituting 1.1.1.1 for the local
> > host and 2.2.2.2 for the remote host.
> >
> > Many thanks,
> > -Dan
> >
> > On Mon, Dec 1, 2014 at 4:24 PM, Jason Zurawski <> wrote:
> > Hey Daniel;
> >
> > Would you be able to provide a link to your node, or send along a
> > screenshot, to give us a better idea of what you are seeing?
> >
> > Off the top of my head, here are a couple of typical reasons that tests
> > could fail:
> >
> > - Firewalls in the path denying access to ports, or not enough
> > ports available for the number of tests that are running
> >
> > - Lack of testing ‘slots’ available on one side or the other
> >
> > - NTP synchronization issues
> >
> > - Packet loss that prevents the test from starting or finishing.
> >
> > If you send along your /var/log/perfsonar/owamp_bwctl.log file, we can
> > have a look to see what may be menacing your node. The other thing you
> > can try is running some by-hand tests, something like:
> >
> > bwctl -f m -x -T iperf3 -t 30 -i 1 -c HOST1 -s HOST2
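Once that log is in hand, you can get a rough idea of whether failures cluster in time. Count the `Peer cancelled test before expected` messages (the bwctld line quoted above) per hour. This Python sketch assumes the syslog-style prefix seen in that line (month, day, HH:MM:SS). The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable

MESSAGE = "Peer cancelled test before expected"


def cancellations_per_hour(lines: Iterable[str]) -> Counter:
    """Count bwctld 'Peer cancelled' messages keyed by (month, day, hour)."""
    counts: Counter = Counter()
    for line in lines:
        if MESSAGE not in line:
            continue
        # Syslog-style prefix, e.g. "Dec 2 10:05:01 localhost bwctld[...]"
        month, day, clock = line.split()[:3]
        counts[(month, int(day), clock[:2])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Dec 2 10:05:01 localhost bwctld[12565]: FILE=endpoint.c, LINE=1314, "
        "PeerAgent: Peer cancelled test before expected",
    ]
    print(cancellations_per_hour(sample))
    # On a real host, something like:
    # with open("/var/log/perfsonar/owamp_bwctl.log") as log:
    #     for key, n in sorted(cancellations_per_hour(log).items()):
    #         print(key, n)
```

If the counts line up with the gaps in the graph, that points toward the tests themselves failing rather than the storage or graphing side.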
> >
> > Thanks;
> >
> > -jason
> >
> > On Dec 1, 2014, at 5:43 PM, Daniel Schmidt <> wrote:
> >
> > > I've noticed strange behavior on our throughput tests at one site.
> > > Sometimes the graph turns unidirectional, i.e., one direction stops
> > > working. Sometimes both directions stop working. The times are random.
> > > The site is verified up by ping and passes traffic; however, the
> > > throughput graphs vary greatly. (We believe this is due to issues with
> > > this circuit.)
> > >
> > > I've only seen it do this in this one case. It's almost like it gets
> > > angry that the speed varies vastly and gives up.
> > >
> > > Has anybody else encountered this? Any ideas greatly appreciated.