
RE: [perfsonar-user] Bandwidth system failing to run some tests


  • From: "Garnizov, Ivan (RRZE)" <>
  • To: Trey Dockendorf <>
  • Cc: Sowmya Balasubramanian <>, perfsonar-user <>
  • Subject: RE: [perfsonar-user] Bandwidth system failing to run some tests
  • Date: Tue, 4 Aug 2015 16:52:31 +0000

Hi Trey,

 

Yes, the congestion window has a huge impact on TCP communications. I am not sure if you have other toolkits to verify with, and I am too lazy to calculate the maximum traffic your config allows (which would also require the RTT to FNAL), but here is an example from a test between two 1G hosts (I believe yours were 10G):

 

bwctl: Using tool: iperf3

bwctl: 17 seconds until test results available

 

SENDER START

Connecting to host 2001:798:fc00:2c::6, port 5372

[ 15] local 2001:798:fc00:23::6 port 46317 connected to 2001:798:fc00:2c::6 port 5372

[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd

[ 15]   0.00-1.00   sec  21.5 MBytes   181 Mbits/sec    0   4.32 MBytes

[ 15]   1.00-2.00   sec   116 MBytes   975 Mbits/sec    0   6.84 MBytes

[ 15]   2.00-3.00   sec   118 MBytes   986 Mbits/sec    0   6.84 MBytes

[ 15]   3.00-4.00   sec   119 MBytes   996 Mbits/sec    0   6.84 MBytes

[ 15]   4.00-5.00   sec   118 MBytes   986 Mbits/sec    0   6.84 MBytes

[ 15]   5.00-6.00   sec   118 MBytes   986 Mbits/sec    0   6.84 MBytes

[ 15]   6.00-7.00   sec   118 MBytes   986 Mbits/sec    0   6.84 MBytes

[ 15]   7.00-8.00   sec   118 MBytes   986 Mbits/sec    0   6.84 MBytes

[ 15]   8.00-9.00   sec   119 MBytes   996 Mbits/sec    0   6.84 MBytes

[ 15]   9.00-10.00  sec   118 MBytes   986 Mbits/sec    0   6.84 MBytes

- - - - - - - - - - - - - - - - - - - - - - - - -

[ ID] Interval           Transfer     Bandwidth       Retr

[ 15]   0.00-10.00  sec  1.05 GBytes   906 Mbits/sec    0             sender

[ 15]   0.00-10.00  sec  1.05 GBytes   903 Mbits/sec                  receiver

 

iperf Done.

 

SENDER END

 

The Cwnd starts from your system's configuration and is then adjusted by the TCP protocol within certain limits.
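If you want to see what your host is configured to allow, these are the usual knobs to check on a Linux host (just a quick look; perfSONAR hosts are normally tuned already, so your values will differ):

# sysctl net.ipv4.tcp_congestion_control net.core.rmem_max net.core.wmem_max net.ipv4.tcp_rmem net.ipv4.tcp_wmem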

 

BUT in fact, even with your Cwnd size, it seems that you should at least be getting the 0.79 Mbits/sec from your first interval, which you are not:

“[ 14]   0.00-1.00   sec  96.1 KBytes  0.79 Mbits/sec    2   26.2 KBytes”
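As a rough sanity check, a single TCP stream is bounded by about cwnd / RTT. Taking your 26.2 KByte window and an illustrative (not measured) 30 ms RTT to FNAL:

# awk 'BEGIN { cwnd = 26.2*1024*8; rtt = 0.030; printf "%.2f Mbit/s\n", cwnd/rtt/1e6 }'
7.15 Mbit/s

So even this tiny window should sustain a few Mbit/s; the all-zero intervals after the first second are not explained by the window size alone.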

 

 

 

[2] About the missing data… well, I hope Andy will be able to guide you here.

 

Best regards,

Ivan

 

 

From: Trey Dockendorf [mailto:]
Sent: Dienstag, 4. August 2015 17:19
To: Garnizov, Ivan (RRZE)
Cc: Sowmya Balasubramanian; perfsonar-user
Subject: Re: [perfsonar-user] Bandwidth system failing to run some tests

 

Ivan,

 

Thanks for the response.  Does the low congestion window size indicate anything?  I'd like to rule out the issue being specific to my perfSONAR host before going to my campus' networking group to see whether networking issues are causing problems.  We are seeing data transfer issues between my site and FNAL, and I usually reference the perfSONAR data to rule out networking issues.

 

The graphs I'm viewing are in the toolkit measurement archive [1] on psonar-bwctl.brazos.tamu.edu.  I do not yet have MADDASH set up for my systems.  The graph for tests with FNAL [2] shows things break around July 18th.  The graphs for the Houston LEARN host show the "No data to plot" message [3].  A couple of weeks ago I used those graphs to identify a network issue occurring in early July, so I know there was data at one point.

 

The IPs of my perfSONAR hosts have not changed.  The only change I've applied to these systems was on July 23rd, when I updated to the latest web100 kernel and applied other pending updates (including esmond).
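If the exact package set from that day matters, it is still in yum's transaction history (CentOS 6; <ID> stands for whatever the July 23rd entry turns out to be):

# yum history list all | head

# yum history info <ID>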

 

Thanks,

- Trey

 


=============================

 

Trey Dockendorf 

Systems Analyst I 

Texas A&M University 

Academy for Advanced Telecommunications and Learning Technologies 

Phone: (979)458-2396 

Email:  

Jabber:

 

On Tue, Aug 4, 2015 at 8:44 AM, Garnizov, Ivan (RRZE) <> wrote:

Hi Trey,

 

[1] It is obvious here that the communication between the two endpoints was successful. In your results, the disturbing value is in fact the congestion window size, which is extremely low.

 

 

[2] When you are reviewing the graphs, please select the 1-month view and then use the “Previous 1m” link. If there is no plot, then there is no data to plot.

In fact, you are not telling us which interface you are using: MADDASH or the toolkit measurement archive.

Have you changed IPs?

Please also check for the most ridiculous case, where the bandwidth line is hidden behind the loss line, meaning there is 0 loss and almost 0 traffic. (MADDASH)
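If the graphs stay empty, you can also bypass the UI and ask the measurement archive (esmond) behind them directly. A sketch against your host (time-range is in seconds; the tool-name filter is optional):

# curl -s "http://psonar-bwctl.brazos.tamu.edu/esmond/perfsonar/archive/?tool-name=bwctl/iperf3&time-range=604800&format=json"

An empty JSON list there means nothing was stored for that window, i.e. the problem is in testing or storage rather than in the graphs.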

 

Best regards,

Ivan

 

 

 

From: [mailto:] On Behalf Of Trey Dockendorf
Sent: Montag, 3. August 2015 23:26
To: Sowmya Balasubramanian
Cc: perfsonar-user
Subject: Re: [perfsonar-user] Bandwidth system failing to run some tests

 

All tests were working at one point.  What's odd is that an endpoint like ps1-hardy-hstn.tx-learn.net shows "ERROR: No data to plot for the hosts and time range selected." when I try to view the graph, and clicking "1m" for a month of data shows nothing.

 

Command line tests [1] show what appears to be no bandwidth.  This is for a system with seemingly no data on graphs.  Maybe the graphs are showing the correct data, which is 0 Mbps.  Another host with "No data to plot" on its graphs also runs from the command line [2], but with the same 0 Mbps.

 

If I start iperf3 on my latency host and run tests from my bandwidth system, I get back nearly 10 Gbps as expected.
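That direct test was essentially the following, run outside of bwctl entirely (<latency-host> stands in for my latency node).  On the latency host:

# iperf3 -s

and from the bandwidth host:

# iperf3 -c <latency-host> -t 10 -i 1

So the bandwidth host's own stack and NIC look healthy on their own.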

 

Thanks,

- Trey

 

[1]: 

 

bwctl: Using tool: iperf3

bwctl: 16 seconds until test results available

 

SENDER START

Connecting to host 131.225.205.23, port 5726

[ 14] local 165.91.55.6 port 58249 connected to 131.225.205.23 port 5726

[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd

[ 14]   0.00-1.00   sec  96.1 KBytes  0.79 Mbits/sec    2   26.2 KBytes

[ 14]   1.00-2.00   sec  0.00 Bytes  0.00 Mbits/sec    1   26.2 KBytes

[ 14]   2.00-3.00   sec  0.00 Bytes  0.00 Mbits/sec    0   26.2 KBytes

[ 14]   3.00-4.00   sec  0.00 Bytes  0.00 Mbits/sec    1   26.2 KBytes

[ 14]   4.00-5.00   sec  0.00 Bytes  0.00 Mbits/sec    0   26.2 KBytes

[ 14]   5.00-6.00   sec  0.00 Bytes  0.00 Mbits/sec    0   26.2 KBytes

[ 14]   6.00-7.00   sec  0.00 Bytes  0.00 Mbits/sec    0   26.2 KBytes

[ 14]   7.00-8.00   sec  0.00 Bytes  0.00 Mbits/sec    1   26.2 KBytes

[ 14]   8.00-9.00   sec  0.00 Bytes  0.00 Mbits/sec    0   26.2 KBytes

[ 14]   9.00-10.00  sec  0.00 Bytes  0.00 Mbits/sec    0   26.2 KBytes

- - - - - - - - - - - - - - - - - - - - - - - - -

[ ID] Interval           Transfer     Bandwidth       Retr

[ 14]   0.00-10.00  sec  96.1 KBytes  0.08 Mbits/sec    5             sender

[ 14]   0.00-10.00  sec  0.00 Bytes  0.00 Mbits/sec                  receiver

 

iperf Done.

 

SENDER END

 

[2]:

# bwctl -T iperf3 -f m -t 10 -i 1 -c ps1-hardy-hstn.tx-learn.net

bwctl: Using tool: iperf3

bwctl: 37 seconds until test results available

 

SENDER START

Connecting to host 74.200.187.98, port 5579

[ 15] local 165.91.55.6 port 50987 connected to 74.200.187.98 port 5579

[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd

[ 15]   0.00-1.00   sec  87.4 KBytes  0.72 Mbits/sec    2   26.2 KBytes

[ 15]   1.00-2.00   sec  0.00 Bytes  0.00 Mbits/sec    1   26.2 KBytes

[ 15]   2.00-3.00   sec  0.00 Bytes  0.00 Mbits/sec    0   26.2 KBytes

[ 15]   3.00-4.00   sec  0.00 Bytes  0.00 Mbits/sec    1   26.2 KBytes

[ 15]   4.00-5.00   sec  0.00 Bytes  0.00 Mbits/sec    0   26.2 KBytes

[ 15]   5.00-6.00   sec  0.00 Bytes  0.00 Mbits/sec    0   26.2 KBytes

[ 15]   6.00-7.00   sec  0.00 Bytes  0.00 Mbits/sec    1   26.2 KBytes

[ 15]   7.00-8.00   sec  0.00 Bytes  0.00 Mbits/sec    0   26.2 KBytes

[ 15]   8.00-9.00   sec  0.00 Bytes  0.00 Mbits/sec    0   26.2 KBytes

[ 15]   9.00-10.00  sec  0.00 Bytes  0.00 Mbits/sec    0   26.2 KBytes

- - - - - - - - - - - - - - - - - - - - - - - - -

[ ID] Interval           Transfer     Bandwidth       Retr

[ 15]   0.00-10.00  sec  87.4 KBytes  0.07 Mbits/sec    5             sender

[ 15]   0.00-10.00  sec  0.00 Bytes  0.00 Mbits/sec                  receiver

 

iperf Done.

 

SENDER END

 

 

 


=============================

 

Trey Dockendorf 

Systems Analyst I 

Texas A&M University 

Academy for Advanced Telecommunications and Learning Technologies 

Phone: (979)458-2396 

Email:  

Jabber:

 

On Mon, Aug 3, 2015 at 3:32 PM, Sowmya Balasubramanian <> wrote:

Hi Trey,

 

Was the test working at some point? 

 

Can you try running a test from the command line from your host to the other host and send the results?

 

There is a possibility that firewall rules or BWCTL limits on the other side are preventing your host from running the test.
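You can at least rule out the local half quickly (a sketch for a toolkit host; the remote admin would need to check the same on their side):

# grep -v '^#' /etc/bwctld/bwctld.limits

# iptables -L -n

The first shows the bandwidth and duration limits bwctld enforces per class; the second shows the firewall rules that could be blocking the test ports.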

 

Thanks,

Sowmya

 

On Mon, Aug 3, 2015 at 10:17 AM, Trey Dockendorf <> wrote:

I just discovered my bandwidth testing host is failing to run some of the configured tests.  I'm seeing errors like this in /var/log/perfsonar/regular_testing.log

 

2015/08/03 11:57:35 (21746) ERROR> MeasurementArchiveChild.pm:125 perfSONAR_PS::RegularTesting::Master::MeasurementArchiveChild::__ANON__ - Problem handling test results: Problem storing results: Error writing metadata: Error running test to psonar3.fnal.gov  with output bwctl: start_endpoint: 3634164453.980392
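To see which endpoints are affected, the errors can be tallied straight from the log:

# grep -o 'Error running test to [^ ]*' /var/log/perfsonar/regular_testing.log | sort | uniq -c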

 

Attached is my regular_testing.log

 

 

The tests I was hoping to look at were against psonar3.fnal.gov.  I've noticed other tests are missing data too, like the tests to tx-learn.net hosts.  The tests to psnr-bw01.slac.stanford.edu do show data, but with lots of "red dots" at the top of the graphs.

 

I saw some mention of NTP problems in the logs, so I forced NTP server updates via the web interface, hoping to get closer NTP servers selected.

 

# ntpq -p -c rv

     remote           refid      st t when poll reach   delay   offset  jitter

==============================================================================

-nms-rlat.chic.n 141.142.143.138  2 u   11   64  377   24.055   -3.209   0.407

*nms-rlat.hous.n .IRIG.           1 u    6   64  377   18.186    1.227   0.230

-nms-rlat.salt.n 128.138.140.44   2 u   65   64  377   33.024   -4.304   0.435

+time2.chpc.utah 198.60.22.240    2 u    8   64  377   34.591   -2.486   0.335

+time3.chpc.utah 198.60.22.240    2 u    2   64  377   34.534   -2.771   0.219

associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,

version="ntpd Mon Mar 16 14:53:03 UTC 2015 (1)",

processor="x86_64", system="Linux/2.6.32-504.30.3.el6.web100.x86_64",

leap=00, stratum=2, precision=-24, rootdelay=18.186, rootdisp=18.007,

refid=64.57.16.162,

reftime=d96a2001.60695f02  Mon, Aug  3 2015 12:14:41.376,

clock=d96a2090.760490a5  Mon, Aug  3 2015 12:17:04.461, peer=21380, tc=6,

mintc=3, offset=-0.105, frequency=41.314, sys_jitter=2.330,

clk_jitter=1.141, clk_wander=0.151

 

# ntpstat

synchronised to NTP server (64.57.16.162) at stratum 2

   time correct to within 27 ms

   polling server every 64 s

 

Let me know what other information would be useful to debug this.

 

Thanks,

- Trey


=============================

 

Trey Dockendorf 

Systems Analyst I 

Texas A&M University 

Academy for Advanced Telecommunications and Learning Technologies 

Phone: (979)458-2396 

Email:  

Jabber:

 

 

 



