
perfsonar-user - Re: [perfsonar-user] throughput host missing test graphs

Subject: perfSONAR User Q&A and Other Discussion

  • From: Trey Dockendorf <>
  • To: perfsonar-user <>
  • Subject: Re: [perfsonar-user] throughput host missing test graphs
  • Date: Tue, 27 Jan 2015 19:50:45 -0600

Thought I'd mention that this issue is now resolved.  Once I correctly set the MTU on the switch for these hosts, all the test data started showing up.

Thanks,
- Trey

=============================

Trey Dockendorf 
Systems Analyst I 
Texas A&M University 
Academy for Advanced Telecommunications and Learning Technologies 
Phone: (979)458-2396 
Email:  
Jabber:
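
For anyone chasing the same symptom: the thread does not say which MTU was involved, so the values below (1500 and 9000) are assumptions. An MTU mismatch is one plausible reason a test's control connection succeeds while the data stream stalls. A minimal sketch that builds a don't-fragment ping probe for a given MTU, assuming Linux iputils ping and IPv4:

```python
import shlex

IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    overhead = IPV4_HEADER_BYTES + ICMP_HEADER_BYTES
    if mtu <= overhead:
        raise ValueError("MTU too small to carry an ICMP echo payload")
    return mtu - overhead


def ping_probe_command(host: str, mtu: int, count: int = 3) -> str:
    """Build a ping command that sets DF (-M do) and fills the packet to the MTU."""
    args = ["ping", "-M", "do", "-c", str(count),
            "-s", str(icmp_payload_for_mtu(mtu)), host]
    return shlex.join(args)


if __name__ == "__main__":
    # Hypothetical MTU values; the thread does not state the real ones.
    for mtu in (1500, 9000):
        print(ping_probe_command("ps.tacc.utexas.edu", mtu))
```

If the probe at the larger size fails while the smaller one succeeds, some hop on the path is not passing packets of that size.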

On Tue, Jan 20, 2015 at 4:28 PM, Trey Dockendorf <> wrote:
Jason,

Thanks. So far, the majority of hosts give "Invalid protocol message received" with iperf3.

With iperf as the tool, I get "local tool did not complete in allocated time frame and was killed" [1].  I tried increasing the test interval to 30, but that doesn't seem to be the right knob for changing the allocated time frame.

nuttcp seems to throw a "Bad file descriptor" error and produces no measurements [2].

So far most of the hosts I'm testing against are throwing these errors.  I'm unsure whether this issue is on my end or the remote end.

The exceptions to the errors above were hosts that are clearly down, and ps1-hardy-hstn.tx-learn.net, which failed in the same ways except when using nuttcp, and then only with the "-s" flag.

[1]:

$ bwctl -T iperf -t 20 -i 1 -f m -x -vv -c ps.tacc.utexas.edu
Messages being sent to syslog(user,err)
bwctl[555]: FILE=bwctl.c, LINE=2961, Using 165.91.55.6 as the address for local sender
bwctl[555]: FILE=bwctl.c, LINE=2961, Using ps.tacc.utexas.edu as the address for remote receiver
bwctl[555]: FILE=bwctl.c, LINE=3008, Available in-common: iperf nuttcp iperf3
bwctl[555]: FILE=bwctl.c, LINE=3055, Using tool: iperf
bwctl[555]: FILE=bwctl.c, LINE=3150, Requested Time: 1421780927.581861
bwctl[555]: FILE=bwctl.c, LINE=3152, Latest Acceptable Time: 1421781527.581861
bwctl[555]: FILE=bwctl.c, LINE=3332, Reservation(ps.tacc.utexas.edu): 1421780933.303169
bwctl[555]: FILE=bwctl.c, LINE=3189, Server 'ps.tacc.utexas.edu' accepted test request at time 1421780933.303169
bwctl[555]: FILE=bwctl.c, LINE=3332, Reservation(localhost): 1421780933.303169
bwctl[555]: FILE=bwctl.c, LINE=3218, Client 'localhost' accepted test request at time 1421780933.303169
bwctl[555]: FILE=bwctl.c, LINE=3450, 27 seconds until test results available

RECEIVER START
bwctl: start_endpoint: 3630769727.276532
bwctl: run_endpoint: receiver: 129.114.0.189
bwctl: run_endpoint: sender: 165.91.55.6
bwctl: exec_line: iperf -B 129.114.0.189 -s -f m -m -p 5085 -t 20 -i 1.000000
bwctl: run_tool: tester: iperf
bwctl: run_tool: receiver: 129.114.0.189
bwctl: run_tool: sender: 165.91.55.6
bwctl: start_tool: 3630769730.794569
------------------------------------------------------------
Server listening on TCP port 5085
Binding to local address 129.114.0.189
TCP window size: 0.08 MByte (default)
------------------------------------------------------------
[ 15] local 129.114.0.189 port 5085 connected with 165.91.55.6 port 53366
Waiting for server threads to complete. Interrupt again to force quit.

bwctl: local tool did not complete in allocated time frame and was killed
bwctl: stop_tool: 3630769755.812094
bwctl: stop_endpoint: 3630769756.833291

RECEIVER END

SENDER START
bwctl: start_endpoint: 3630769727.281346
bwctl: run_endpoint: receiver: 129.114.0.189
bwctl: run_endpoint: sender: 165.91.55.6
bwctl: exec_line: iperf -c 129.114.0.189 -B 165.91.55.6 -f m -m -p 5085 -t 20 -i 1.000000
bwctl: run_tool: tester: iperf
bwctl: run_tool: receiver: 129.114.0.189
bwctl: run_tool: sender: 165.91.55.6
bwctl: start_tool: 3630769733.303298
------------------------------------------------------------
Client connecting to 129.114.0.189, TCP port 5085
Binding to local address 165.91.55.6
TCP window size: 0.09 MByte (default)
------------------------------------------------------------
[  8] local 165.91.55.6 port 53366 connected with 129.114.0.189 port 5085

bwctl: local tool did not complete in allocated time frame and was killed
bwctl: stop_tool: 3630769755.825698
bwctl: stop_endpoint: 3630769756.827477

SENDER END
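
The start_tool and stop_tool values in the output above appear to be seconds since the NTP epoch (1900-01-01) rather than Unix time. That is an inference from the numbers, not something stated in the thread: subtracting 2208988800 from the sender's start_tool gives a value that lines up with the Reservation time in the client log. A small sketch of the conversion, useful for working out how long the tool ran before it was killed:

```python
# Seconds between 1900-01-01 (NTP epoch) and 1970-01-01 (Unix epoch).
NTP_UNIX_OFFSET = 2_208_988_800


def ntp_to_unix(ntp_seconds: float) -> float:
    """Convert an NTP-epoch timestamp to Unix time (assumed format, see above)."""
    return ntp_seconds - NTP_UNIX_OFFSET


def elapsed_seconds(start_tool: float, stop_tool: float) -> float:
    """Wall-clock time between start_tool and stop_tool."""
    return stop_tool - start_tool


if __name__ == "__main__":
    # Values copied from the receiver side of the iperf run in [1].
    start, stop = 3630769730.794569, 3630769755.812094
    print("receiver start (unix):", ntp_to_unix(start))
    print("receiver ran for (s):", round(elapsed_seconds(start, stop), 3))
```

With the receiver values from [1], the tool ran for about 25 seconds against a requested 20-second test before bwctl killed it.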

[2]: 

$ bwctl -T nuttcp -t 20 -i 1 -f m -x -vv -c ps.tacc.utexas.edu
Messages being sent to syslog(user,err)
bwctl[5045]: FILE=bwctl.c, LINE=2961, Using 165.91.55.6 as the address for local sender
bwctl[5045]: FILE=bwctl.c, LINE=2961, Using ps.tacc.utexas.edu as the address for remote receiver
bwctl[5045]: FILE=bwctl.c, LINE=3008, Available in-common: iperf nuttcp iperf3
bwctl[5045]: FILE=bwctl.c, LINE=3055, Using tool: nuttcp
bwctl[5045]: FILE=bwctl.c, LINE=3150, Requested Time: 1421781241.364543
bwctl[5045]: FILE=bwctl.c, LINE=3152, Latest Acceptable Time: 1421781841.364543
bwctl[5045]: FILE=bwctl.c, LINE=3332, Reservation(ps.tacc.utexas.edu): 1421781247.724141
bwctl[5045]: FILE=bwctl.c, LINE=3189, Server 'ps.tacc.utexas.edu' accepted test request at time 1421781247.724141
bwctl[5045]: FILE=bwctl.c, LINE=3332, Reservation(localhost): 1421781247.724141
bwctl[5045]: FILE=bwctl.c, LINE=3218, Client 'localhost' accepted test request at time 1421781247.724141
bwctl[5045]: FILE=bwctl.c, LINE=3450, 27 seconds until test results available

RECEIVER START
bwctl: start_endpoint: 3630770041.058892
bwctl: run_endpoint: receiver: 129.114.0.189
bwctl: run_endpoint: sender: 165.91.55.6
bwctl: exec_line: nuttcp -vv -p 5530 -P 5000 -i 1.000000 -T 20 --nofork -1
bwctl: run_tool: tester: nuttcp
bwctl: run_tool: receiver: 129.114.0.189
bwctl: run_tool: sender: 165.91.55.6
bwctl: start_tool: 3630770044.896131
nuttcp_mread: Bad file descriptor
bwctl: stop_tool: 3630770062.822761
bwctl: stop_endpoint: 3630770065.833646

RECEIVER END

SENDER START
bwctl: start_endpoint: 3630770041.063726
bwctl: run_endpoint: receiver: 129.114.0.189
bwctl: run_endpoint: sender: 165.91.55.6
bwctl: exec_line: nuttcp -vv -p 5530 -P 5000 -i 1.000000 -T 20 -t 129.114.0.189
bwctl: run_tool: tester: nuttcp
bwctl: run_tool: receiver: 129.114.0.189
bwctl: run_tool: sender: 165.91.55.6
bwctl: start_tool: 3630770047.724292
bwctl: stop_tool: 3630770065.827634
bwctl: stop_endpoint: 3630770065.827928

SENDER END
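
When running the same test by hand against many hosts, it can help to sort the captured output by failure type. The sketch below matches only the error strings quoted in this thread; the labels are made up for illustration, and the list is not a complete catalogue of bwctl errors.

```python
import re
from typing import List

# (label, pattern) pairs; labels are hypothetical, patterns are strings
# that appear verbatim in the bwctl output quoted in this thread.
SIGNATURES = [
    ("tool_timeout", re.compile(r"did not complete in allocated time frame")),
    ("bad_file_descriptor", re.compile(r"Bad file descriptor")),
    ("protocol_error", re.compile(r"Invalid protocol message received")),
    ("results_timeout", re.compile(r"Timed out waiting for results")),
    ("bad_tos_value", re.compile(r"Invalid value for TOS")),
]


def classify(output: str) -> List[str]:
    """Return the labels of every known failure signature found in the output."""
    return [label for label, pattern in SIGNATURES if pattern.search(output)]


if __name__ == "__main__":
    sample = "bwctl: local tool did not complete in allocated time frame and was killed"
    print(classify(sample))
```

An empty list means none of the known strings appeared; it does not by itself mean the test succeeded.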



On Tue, Jan 20, 2015 at 1:05 PM, Jason Zurawski <> wrote:
Hey Trey;

Apologies, I fat fingered the instructions - try a lowercase ’s’ for that command to TACC (or just view the -help output from BWCTL to see the command line options).

With regard to the error, perhaps try using ‘-T iperf’ or ‘-T nuttcp’ to see if changing the tool makes any difference.  The default testing tool is now iperf3, but seeing what the others say would be good.  As I noted in the previous mail, there are still some funny behaviors the developers are sorting out with iperf3 and BWCTL.

Thanks;

-jason

> On Jan 20, 2015, at 12:57 PM, Trey Dockendorf <> wrote:
>
> Jason,
>
> Thanks for the response.
>
> For #1, the second command doesn't work:
>
> bwctl -T iperf3 -t 20 -i 1 -f m -x -vv -S ps.tacc.utexas.edu
>
> Gives me:
>
> bwctl: Invalid value for TOS. (-S)
>
> I'm unsure what a valid value would be.
>
> When I tried the first command against a remote host I got the following:
>
> # bwctl -T iperf3 -t 20 -i 1 -f m -x -vv -c ps.tacc.utexas.edu
> Messages being sent to syslog(user,err)
> bwctl[30659]: FILE=bwctl.c, LINE=2961, Using 165.91.55.6 as the address for local sender
> bwctl[30659]: FILE=bwctl.c, LINE=2961, Using ps.tacc.utexas.edu as the address for remote receiver
> bwctl[30659]: FILE=bwctl.c, LINE=3008, Available in-common: iperf nuttcp iperf3
> bwctl[30659]: FILE=bwctl.c, LINE=3055, Using tool: iperf3
> bwctl[30659]: FILE=bwctl.c, LINE=3150, Requested Time: 1421776247.770821
> bwctl[30659]: FILE=bwctl.c, LINE=3152, Latest Acceptable Time: 1421776847.770821
> bwctl[30659]: FILE=bwctl.c, LINE=3332, Reservation(ps.tacc.utexas.edu): 1421776254.700447
> bwctl[30659]: FILE=bwctl.c, LINE=3189, Server 'ps.tacc.utexas.edu' accepted test request at time 1421776254.700447
> bwctl[30659]: FILE=bwctl.c, LINE=3332, Reservation(localhost): 1421776254.700447
> bwctl[30659]: FILE=bwctl.c, LINE=3218, Client 'localhost' accepted test request at time 1421776254.700447
> bwctl[30659]: FILE=bwctl.c, LINE=3450, 28 seconds until test results available
> bwctl[30659]: FILE=protocol.c, LINE=506, BWLReadRequestType: Read interrupted by signal.
> bwctl[30659]: FILE=capi.c, LINE=1166, BWLEndSession: Invalid protocol message received...
> bwctl[30659]: FILE=bwctl.c, LINE=3546, Timed out waiting for results
>
> This particular endpoint is of interest right now because another group on our campus has asked me to test against the remote site to investigate recent performance issues between their cluster and that site.
>
> I'll continue testing each host whose results are missing and see whether the errors are non-obvious.
>
> Thanks,
> - Trey
>
> On Tue, Jan 20, 2015 at 10:14 AM, Jason Zurawski <> wrote:
> Hey Trey;
>
> Your mail is timely; we are trying to rewrite some documentation on this.  This will be sort of scattered, but may be enough to get you started:
>
> 1) Take a look at the hosts you have configured vs. the ones that are succeeding.  See if you can do a test by hand to one of the failures (e.g. on the cmd line: bwctl -T iperf3 -t 20 -i 1 -f m -x -vv -c HOST and bwctl -T iperf3 -t 20 -i 1 -f m -x -vv -S HOST).  If the host is dead, doesn’t support the requested test type, or denies your test in some way it is most likely a problem on the other end.  Maybe send an email to that admin to see what is up.
>
> 2) If the test from the first part succeeded, it's time to look in the database.  In your case, open your favorite browser (preferably one with a JSON prettifier in place) to your esmond database:
>
> http://psonar-bwctl.brazos.tamu.edu/esmond/perfsonar/archive/?format=json
>
> Each test should have a corresponding record, both the ones that succeed and the ones that fail.  Picking one that is failing for you (to ps1-akard-dlls.tx-learn.net - the LEARN Dallas node), we can pull up the failure event type:
>
> http://psonar-bwctl.brazos.tamu.edu/esmond/perfsonar/archive/7f8d5be4115b4927b7ce6a5dfbfa8e44/failures/base?format=json
>
> In this case we get a (useful?) error message:
>
> > Problem parsing output: malformed JSON string, neither array, object, number, string or atom, at character offset 0 (before "(end of string)") at /opt/perfsonar_ps/regular_testing/bin/../lib/perfSONAR_PS/RegularTesting/Parsers/Iperf3.pm line 54.
>
>
> This is a known problem right now between certain versions of BWCTL and iperf3 that the developers are working on.  Another common error:
>
> > interrupt - the client has terminated
>
>
> This typically means there was either a firewall or an NTP issue.
>
> 3) Check your server side logs (/var/log/perfsonar) to see if anything is showing up in the bwctl_owamp log.  Common errors could be permission denied due to limits violations, the lack of being able to get a testing slot (common for busy perfSONAR nodes) or the aforementioned firewall issues.
>
> In general if your node is working for ‘some’ things, it may be on the healthy side.  Things that you are trying to test against may just not be reachable, or could be in need of an upgrade (another common issue we are seeing - until we can get a majority of the instances to 3.4.x).
>
> Hope this helps;
>
> -jason
>
> On Jan 20, 2015, at 10:54 AM, Trey Dockendorf <> wrote:
>
> > I have about 14 test members added for throughput tests and only 3 of them show up in the Throughput/Latency Graphs page.  I am unsure how to begin debugging this to find the cause of the missing data.
> >
> > The host is http://psonar-bwctl.brazos.tamu.edu/.  I have most of these sites also added on a separate perfsonar instance that does latency tests and that host is not missing data.
> >
> > Thanks,
> > - Trey
> >
> > =============================
> >
> > Trey Dockendorf
> > Systems Analyst I
> > Texas A&M University
> > Academy for Advanced Telecommunications and Learning Technologies
> > Phone: (979)458-2396
> > Email:
> > Jabber:
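
The esmond check described in step 2 of Jason's first reply can also be scripted. This is a rough sketch, not the perfSONAR tooling itself: the URL pattern is taken from the example link in that reply, while the shape of the failure records (a list of objects with a "val" object holding an "error" string) is an assumption and should be checked against the actual JSON your archive returns.

```python
import json
from typing import List


def failures_url(archive_url: str, metadata_key: str) -> str:
    """Build the failures URL, following the pattern of the link in the thread."""
    return f"{archive_url.rstrip('/')}/{metadata_key}/failures/base?format=json"


def error_messages(failures_json: str) -> List[str]:
    """Pull error strings out of a failures response.

    Assumes each record looks like {"ts": ..., "val": {"error": "..."}};
    records that do not match that shape are skipped.
    """
    messages = []
    for record in json.loads(failures_json):
        val = record.get("val") if isinstance(record, dict) else None
        if isinstance(val, dict) and "error" in val:
            messages.append(val["error"])
    return messages


if __name__ == "__main__":
    archive = "http://psonar-bwctl.brazos.tamu.edu/esmond/perfsonar/archive/"
    print(failures_url(archive, "7f8d5be4115b4927b7ce6a5dfbfa8e44"))
    # Hand-written sample in the assumed shape, not real archive output.
    sample = '[{"ts": 1421776254, "val": {"error": "interrupt - the client has terminated"}}]'
    print(error_messages(sample))
```

Fetching the URL itself (for example with urllib.request) is left out so the parsing can be tried on saved JSON first.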




