perfsonar-user - Re: [perf-node-users] Re: [perfsonar-user] Help with inconsistent bwctl measurements

Subject: perfSONAR User Q&A and Other Discussion

  • From: Jason Zurawski <>
  • To: Roderick Mooi <>
  • Cc: , perf-node-users <>
  • Subject: Re: [perf-node-users] Re: [perfsonar-user] Help with inconsistent bwctl measurements
  • Date: Thu, 17 Oct 2013 07:29:28 -0400

Hey Roderick;

Besides Shawn's suggestion (which is good, and something to look into), I
would add the classic suggestions of being sure the cables/fibers are clean
and un-crimped, and that the local machines are happy with their drivers.

Digging a little more, I was comparing these two graphs (for the second, be
sure to check 'show reverse direction data', and maybe zoom in on a 1-2
hr chunk):

http://192.96.2.247/serviceTest/bandwidthGraph.cgi?url=http://localhost:8085/perfSONAR_PS/services/pSB&key=d9013ce7df20b8bbe45defeaeae785d6&keyR=0a0ed6c928edf28976414a2cc7e87d6f&dstIP=192.96.2.247&srcIP=196.21.48.249&dst=192.96.2.247&src=perfsonara.sanren.ac.za&type=TCP&length=7776000

https://192.96.2.247/serviceTest/delayGraph.cgi?url=http://localhost:8085/perfSONAR_PS/services/pSB&key=de3625b0c481ef8338aab14be049313d&keyR=2d31fa42a62b9188f88046b2f24b7510&dstIP=192.96.2.247&srcIP=196.21.48.249&dst=192.96.2.247&src=196.21.48.249&type=TCP&length=604800&bucket_width=0.0001

Zooming in on the OWAMP graph shows near-constant loss in the
192.96.2.247 -> 196.21.48.249 direction, which matches what BWCTL reports. The
only appreciable difference I can see is when doing traceroutes originating
from 192.96.2.247 and going to 196.21.48.249 and 155.232.40.58:

http://192.96.2.247/toolkit/gui/reverse_traceroute.cgi?target=196.21.48.249&function=traceroute
http://192.96.2.247/toolkit/gui/reverse_traceroute.cgi?target=155.232.40.58&function=traceroute

While basically the same, hops 2 and 4 report slightly different answers
(which lends credence to the ECMP idea - or a bad interface).
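
The hop-by-hop comparison described above can be sketched in a few lines of
Python. This is only an illustration: the hop lists below are hypothetical
placeholders (documentation addresses), not the output of the two traceroute
links, and `diff_hops` is a name made up for this sketch.

```python
from itertools import zip_longest


def diff_hops(path_a, path_b):
    """Return (hop_number, hop_a, hop_b) for every position where the paths differ."""
    diffs = []
    for n, (a, b) in enumerate(zip_longest(path_a, path_b), start=1):
        if a != b:
            diffs.append((n, a, b))
    return diffs


# Hypothetical hop lists, NOT the real traceroute results from this thread.
to_first = ["10.0.0.1", "198.51.100.1", "10.0.2.1", "198.51.100.9", "203.0.113.5"]
to_second = ["10.0.0.1", "198.51.100.2", "10.0.2.1", "198.51.100.10", "203.0.113.77"]

for hop, a, b in diff_hops(to_first, to_second):
    print(f"hop {hop}: {a} vs {b}")
```

The final hop differs simply because the two destinations differ; the
intermediate differences are the ones worth chasing.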

Thanks;

-jason

On Oct 17, 2013, at 5:37 AM, Shawn McKee <> wrote:

> Could there be some kind of ECMP (Equal Cost Multi-Pathing) between this
> source and destination and one of the alternate links is not good?
>
> Shawn
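
For readers unfamiliar with ECMP, here is a minimal sketch of per-flow link
selection, assuming a router that hashes the flow 5-tuple and takes the result
modulo the number of equal-cost links. Real routers use vendor-specific hash
functions and inputs; the CRC32 hash, the two-link setup, and the `pick_link`
name are all assumptions made for illustration. The ports 5149 and 5152 are
the ones that appear in the "Bad" and "Good" receiver logs quoted further
down; nothing in the thread establishes that the port actually decided the
path.

```python
import zlib


def pick_link(src_ip, dst_ip, src_port, dst_port, proto, n_links):
    """Map a flow to one of n_links by hashing its 5-tuple (illustrative hash only)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) % n_links


# Assumed topology: two equal-cost links between the hosts.
for port in (5149, 5152):
    link = pick_link("192.96.2.247", "196.21.48.249", port, port, "tcp", 2)
    print(f"port {port} -> link {link}")
```

Because the hash is deterministic, every packet of one flow stays on one
link. If one link were faulty, some tests would be consistently bad and others
consistently good, depending only on the tuple each test happened to use.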
>
>
> On Thu, Oct 17, 2013 at 5:22 AM, Roderick Mooi <> wrote:
> Hi Alan, Eli
>
> I'm not seeing fluctuations in "good" or "bad" measurements.
>
> Good:
>
> RECEIVER START
> bwctl: exec_line: iperf -B 196.21.48.249 -s -f m -m -p 5152 -t 20 -i 1
> bwctl: start_tool: 3590989992.044877
> ------------------------------------------------------------
> Server listening on TCP port 5152
> Binding to local address 196.21.48.249
> TCP window size: 0.08 MByte (default)
> ------------------------------------------------------------
> [ 14] local 196.21.48.249 port 5152 connected with 192.96.2.247 port 5152
> [ ID] Interval Transfer Bandwidth
> [ 14] 0.0- 1.0 sec 111 MBytes 929 Mbits/sec
> [ 14] 1.0- 2.0 sec 112 MBytes 941 Mbits/sec
> [ 14] 2.0- 3.0 sec 112 MBytes 942 Mbits/sec
> [ 14] 3.0- 4.0 sec 112 MBytes 941 Mbits/sec
> [ 14] 4.0- 5.0 sec 112 MBytes 941 Mbits/sec
> [ 14] 5.0- 6.0 sec 112 MBytes 941 Mbits/sec
> [ 14] 6.0- 7.0 sec 112 MBytes 941 Mbits/sec
> [ 14] 7.0- 8.0 sec 112 MBytes 941 Mbits/sec
> [ 14] 8.0- 9.0 sec 112 MBytes 941 Mbits/sec
> [ 14] 9.0-10.0 sec 112 MBytes 941 Mbits/sec
> [ 14] 10.0-11.0 sec 112 MBytes 942 Mbits/sec
> [ 14] 11.0-12.0 sec 112 MBytes 941 Mbits/sec
> [ 14] 12.0-13.0 sec 112 MBytes 941 Mbits/sec
> [ 14] 13.0-14.0 sec 112 MBytes 941 Mbits/sec
> [ 14] 14.0-15.0 sec 112 MBytes 942 Mbits/sec
> [ 14] 15.0-16.0 sec 112 MBytes 941 Mbits/sec
> [ 14] 16.0-17.0 sec 112 MBytes 941 Mbits/sec
> [ 14] 17.0-18.0 sec 112 MBytes 941 Mbits/sec
> [ 14] 18.0-19.0 sec 112 MBytes 941 Mbits/sec
> [ 14] 19.0-20.0 sec 112 MBytes 941 Mbits/sec
> [ 14] 0.0-20.5 sec 2298 MBytes 941 Mbits/sec
> [ 14] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
> bwctl: stop_exec: 3590990016.831918
>
> RECEIVER END
>
>
> Bad:
>
> RECEIVER START
> bwctl: exec_line: iperf -B 196.21.48.249 -s -f m -m -p 5149 -t 20 -i 1
> bwctl: start_tool: 3590989696.322229
> ------------------------------------------------------------
> Server listening on TCP port 5149
> Binding to local address 196.21.48.249
> TCP window size: 0.08 MByte (default)
> ------------------------------------------------------------
> [ 14] local 196.21.48.249 port 5149 connected with 192.96.2.247 port 5149
> [ ID] Interval Transfer Bandwidth
> [ 14] 0.0- 1.0 sec 12.8 MBytes 107 Mbits/sec
> [ 14] 1.0- 2.0 sec 11.1 MBytes 93.3 Mbits/sec
> [ 14] 2.0- 3.0 sec 13.9 MBytes 116 Mbits/sec
> [ 14] 3.0- 4.0 sec 18.1 MBytes 152 Mbits/sec
> [ 14] 4.0- 5.0 sec 14.7 MBytes 124 Mbits/sec
> [ 14] 5.0- 6.0 sec 16.1 MBytes 135 Mbits/sec
> [ 14] 6.0- 7.0 sec 14.9 MBytes 125 Mbits/sec
> [ 14] 7.0- 8.0 sec 10.3 MBytes 86.3 Mbits/sec
> [ 14] 8.0- 9.0 sec 16.6 MBytes 139 Mbits/sec
> [ 14] 9.0-10.0 sec 19.7 MBytes 165 Mbits/sec
> [ 14] 10.0-11.0 sec 15.0 MBytes 126 Mbits/sec
> [ 14] 11.0-12.0 sec 21.2 MBytes 178 Mbits/sec
> [ 14] 12.0-13.0 sec 13.3 MBytes 112 Mbits/sec
> [ 14] 13.0-14.0 sec 12.2 MBytes 102 Mbits/sec
> [ 14] 14.0-15.0 sec 12.7 MBytes 107 Mbits/sec
> [ 14] 15.0-16.0 sec 10.9 MBytes 91.2 Mbits/sec
> [ 14] 16.0-17.0 sec 10.9 MBytes 91.6 Mbits/sec
> [ 14] 17.0-18.0 sec 13.5 MBytes 114 Mbits/sec
> [ 14] 18.0-19.0 sec 11.7 MBytes 97.8 Mbits/sec
> [ 14] 19.0-20.0 sec 12.0 MBytes 100 Mbits/sec
> [ 14] 0.0-20.1 sec 282 MBytes 118 Mbits/sec
> [ 14] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
> bwctl: stop_exec: 3590989721.229266
>
> RECEIVER END
>
> Referring to my complementary email to Brian and Ivan, do you have further
> suggestions?
>
> " I'm still puzzled by the fact that all tests between these 2 nodes and
> other nodes on the network path are fine (i.e. I don't see this up-down
> behaviour).
> [see:
> http://perfsonara.sanren.ac.za/serviceTest/index.cgi?eventType=bwctl
> and
> https://192.96.2.247/serviceTest/index.cgi?eventType=bwctl
> ]
> "
>
> Thanks!
>
> Roderick
>
> >>> On 2013-10-16 at 19:07, Eli Dart <> wrote:
>
> >
> > On 10/16/13 9:46 AM, Alan Whinery wrote:
> >> You might also reveal something useful by using periodic reports in your
> >> bwctl invocations (like " -i 1 "). You may find that the per-second
> >> reports show burstiness, or the lack of it.
> >
> > I find this to be very very helpful.
> >
> > There is a big difference between a clean ramp to a stable speed, and
> > wild fluctuation that is averaged.
> >
> > A clean ramp to a stable speed argues against the presence of packet
> > loss. If performance is poor but stable, I would check the hosts and
> > the application, and then check for a clean bottleneck link. Wild
> > fluctuation points toward loss - check your router and switch buffers.
> > (And if "poor but stable" means fluctuating between 10Kbps and 30Kbps,
> > there is probably loss too :)
> >
> > None of this is set in stone of course. However, I find that telling
> > bwctl to give periodic reports is very helpful indeed.
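
One way to put a number on the "stable versus wildly fluctuating" distinction
is to pull the per-second rates out of an iperf log and compute their mean and
coefficient of variation (standard deviation divided by mean). The sketch
below assumes iperf 2 style interval lines like those in the receiver logs
quoted in this thread; the sample strings are the first five interval lines
plus the summary line of each log, and the function names are made up for
illustration. No particular threshold is implied: a higher coefficient of
variation just means more fluctuation.

```python
import re
import statistics

# Matches interval lines such as "[ 14] 3.0- 4.0 sec 18.1 MBytes 152 Mbits/sec".
INTERVAL = re.compile(
    r"\[\s*\d+\]\s+(\d+\.\d+)-\s*(\d+\.\d+)\s+sec\s+[\d.]+\s+MBytes\s+([\d.]+)\s+Mbits/sec"
)


def per_second_rates(log_text):
    """Return the Mbits/sec value of each short interval, skipping the summary line."""
    rates = []
    for m in INTERVAL.finditer(log_text):
        start, end, rate = float(m.group(1)), float(m.group(2)), float(m.group(3))
        if end - start <= 1.5:  # the summary line spans the whole test
            rates.append(rate)
    return rates


def summarize(rates):
    """Return (mean, coefficient of variation) for a list of rates."""
    mean = statistics.mean(rates)
    return mean, statistics.pstdev(rates) / mean


GOOD_LOG = """
[ 14] 0.0- 1.0 sec 111 MBytes 929 Mbits/sec
[ 14] 1.0- 2.0 sec 112 MBytes 941 Mbits/sec
[ 14] 2.0- 3.0 sec 112 MBytes 942 Mbits/sec
[ 14] 3.0- 4.0 sec 112 MBytes 941 Mbits/sec
[ 14] 4.0- 5.0 sec 112 MBytes 941 Mbits/sec
[ 14] 0.0-20.5 sec 2298 MBytes 941 Mbits/sec
"""

BAD_LOG = """
[ 14] 0.0- 1.0 sec 12.8 MBytes 107 Mbits/sec
[ 14] 1.0- 2.0 sec 11.1 MBytes 93.3 Mbits/sec
[ 14] 2.0- 3.0 sec 13.9 MBytes 116 Mbits/sec
[ 14] 3.0- 4.0 sec 18.1 MBytes 152 Mbits/sec
[ 14] 4.0- 5.0 sec 14.7 MBytes 124 Mbits/sec
[ 14] 0.0-20.1 sec 282 MBytes 118 Mbits/sec
"""

for name, log in (("good", GOOD_LOG), ("bad", BAD_LOG)):
    mean, cv = summarize(per_second_rates(log))
    print(f"{name}: mean {mean:.1f} Mbits/sec, coefficient of variation {cv:.3f}")
```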
> >
> > --eli
> >
> >
> >>
> >> On 10/16/2013 6:26 AM, Wefel, Paul wrote:
> >>> Couple ideas
> >>>
> >>> Run owamp between these two hosts looking for packet loss in only one
> >>> direction.
> >>> Check the switch interface that Dst is connected to looking for queue
> >>> drops and pause frames being sent.
> >>>
> >>> I have also seen strange issues with some NICs when offloading is
> >>> enabled on them.
> >>>
> >>> good luck, let us know what you find.
> >>>
> >>> -paul
> >>> NCSA @ UIUC
> >>>
> >>> -----Original Message-----
> >>> From: Roderick Mooi <>
> >>> Date: Wednesday, October 16, 2013 5:07 AM
> >>> To: "" <>, "" <>
> >>> Subject: [perfsonar-user] Help with inconsistent bwctl measurements
> >>>
> >>>> Hi
> >>>>
> >>>> I have been trying to locate the cause of inconsistent measurements
> >>>> between two nodes for a few weeks now without success. The pattern I'm
> >>>> seeing is available at:
> >>>>
> >>>> https://192.96.2.247/serviceTest/bandwidthGraph.cgi?url=http://localhost:8085/perfSONAR_PS/services/pSB&key=d9013ce7df20b8bbe45defeaeae785d6&keyR=0a0ed6c928edf28976414a2cc7e87d6f&dstIP=192.96.2.247&srcIP=196.21.48.249&dst=192.96.2.247&src=perfsonara.sanren.ac.za&type=TCP&length=2592000
> >>>>
> >>>> Src-Dst is consistent but Dst-Src is not.
> >>>>
> >>>> Manual tests (attached) show the same behaviour without any indication
> >>>> of cause - measures 941 Mbps then drops to 189 Mbps (end) and back to
> >>>> 941 (nothing different in the logs between "good" measurements and
> >>>> "bad" ones). The only time I've seen something similar is when I was
> >>>> testing from a 10 G interface to a 1 G interface which was subsequently
> >>>> being flooded. In this case both interfaces are 1 G. I'm also not seeing
> >>>> any problems with measurements along the path or between these nodes
> >>>> and any other nodes. Additionally, there is very little (< 50 Mbps)
> >>>> real traffic between these 2 nodes.
> >>>>
> >>>> Any ideas?
> >>>>
> >>>> Thanks!
> >>>>
> >>>> Roderick
> >>>>
> >>>> --
> >>>> Roderick Mooi | SANREN Engineer
> >>>> --
> >>>>
> >>>> | +27 12 841 4111 | www.sanren.ac.za
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> This message is subject to the CSIR's copyright terms and conditions,
> >>>> e-mail legal notice, and implemented Open Document Format (ODF)
> >>>> standard.
> >>>> The full disclaimer details can be found at
> >>>> http://www.csir.co.za/disclaimer.html.
> >>>>
> >>>> This message has been scanned for viruses and dangerous content by
> >>>> MailScanner,
> >>>> and is believed to be clean.
> >>>>
> >>>> Please consider the environment before printing this email.
> >>>>
> >>>
> >>
> >
> > --
> > Eli Dart, Network Engineer NOC: (510) 486-7600
> > ESnet Office of the CTO (AS293) (800) 333-7638
> > Lawrence Berkeley National Laboratory
> > PGP Key fingerprint = C970 F8D3 CFDD 8FFF 5486 343A 2D31 4478 5F82 B2B3
> >
>


