perfsonar-user - Re: [perfsonar-user] pscheduler throughput test fails to complete after pS software upgrade
Subject: perfSONAR User Q&A and Other Discussion
- From: "Uhl, George D. (GSFC-423.0)[SGT INC]" <>
- To: Mark Feit <>, "" <>
- Subject: Re: [perfsonar-user] pscheduler throughput test fails to complete after pS software upgrade
- Date: Tue, 30 Apr 2019 14:31:23 +0000
Hi Mark,
I finally got back to looking at this and figured out what was going on. The remote non-agent host has an iperf3 daemon running on TCP/5201 as a separate process. When pscheduler starts the test, the throughput test runs against that existing iperf3 server instead of a newly spawned daemon. Once the test completes, pscheduler is unable to terminate the independently running iperf3 server, and pscheduler exits with an iperf3 error.
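As a rough illustration of the condition described above, here is a minimal Python sketch that checks whether something is already accepting connections on the iperf3 port of a host before a test is scheduled. The function name `port_is_listening` is hypothetical and is not part of pscheduler or iperf3; the default port 5201 is taken from the message. The probe opens and immediately closes a TCP connection, which a listening server may log.

```python
import socket


def port_is_listening(host: str, port: int = 5201, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper for illustration only. A True result means some
    process (for example a standalone iperf3 server) already holds the port.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `port_is_listening("edclxw41.cr.usgs.gov")` returning True outside of a scheduled run would suggest that an independent server is bound to TCP/5201.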
Would you consider this a bug or a feature?
-George
From: "George.D.Uhl" <>
Hi Mark,
I’m resurrecting this issue for additional insight. This problem has been cropping up on other iperf3 throughput tests in my mesh when iperf3 streams are sourced from my esdis-ps2-10g.eos.nasa.gov node. I did contact the administrator at USGS responsible for the edclxw41.cr.usgs.gov node. I’m running iperf 3.6 on my node and the USGS node is running iperf 3.5.
[uhl@enpl-pt2-10g ~]$ iperf3 -v
iperf 3.6 (cJSON 1.5.2)
Linux enpl-pt2-10g.eos.nasa.gov 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 29 14:49:43 UTC 2018 x86_64
Optional features available: CPU affinity setting, IPv6 flow label, TCP congestion algorithm setting, sendfile / zerocopy, socket pacing, authentication
[rech@edclxw41 ~]$ iperf3 -v
iperf 3.5 (cJSON 1.5.2)
Linux edclxw41 2.6.32-754.6.3.el6.x86_64 #1 SMP Tue Oct 9 17:27:49 UTC 2018 x86_64
Optional features available: CPU affinity setting, IPv6 flow label, TCP congestion algorithm setting, sendfile / zerocopy, authentication
We’re still getting iperf3 test completion errors when running tests via pscheduler. However, when I run standalone iperf3 tests from my node, the tests complete without reporting an issue.
[uhl@enpl-pt2-10g ~]$ iperf3 -c edclxw41.cr.usgs.gov
Connecting to host edclxw41.cr.usgs.gov, port 5201
[  5] local 169.154.197.28 port 48914 connected to 152.61.6.5 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  87.2 MBytes   731 Mbits/sec  679   3.30 MBytes
[  5]   1.00-2.00   sec   105 MBytes   881 Mbits/sec    0   3.35 MBytes
[  5]   2.00-3.00   sec   104 MBytes   870 Mbits/sec    0   3.44 MBytes
[  5]   3.00-4.00   sec   109 MBytes   912 Mbits/sec    0   3.68 MBytes
[  5]   4.00-5.00   sec   119 MBytes   996 Mbits/sec    0   4.06 MBytes
[  5]   5.00-6.00   sec   135 MBytes  1.13 Gbits/sec    0   4.60 MBytes
[  5]   6.00-7.00   sec   154 MBytes  1.29 Gbits/sec    0   5.28 MBytes
[  5]   7.00-8.00   sec   175 MBytes  1.47 Gbits/sec    0   6.13 MBytes
[  5]   8.00-9.00   sec   204 MBytes  1.71 Gbits/sec    0   7.14 MBytes
[  5]   9.00-10.00  sec   239 MBytes  2.00 Gbits/sec    0   8.33 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.40 GBytes  1.20 Gbits/sec  679             sender
[  5]   0.00-10.03  sec  1.38 GBytes  1.18 Gbits/sec                  receiver

iperf Done.
[uhl@enpl-pt2-10g ~]$
The USGS test node will be replaced with a more current release of perfSONAR/CentOS, but in the meantime I’m seeing these same failures with iperf3 tests to other (I assume older) nodes. I also gave pscheduler/iperf2 a try and that also failed to complete. I’m unable to test standalone iperf2 since there is no daemon running at USGS.
[uhl@enpl-pt2-10g ~]$ pscheduler task --tool iperf2 throughput --source enpl-pt2-10g.eos.nasa.gov --dest edclxw41.cr.usgs.gov
Submitting task...
Task URL:
https://enpl-pt2-10g.eos.nasa.gov/pscheduler/tasks/d0886c96-9328-434d-af6e-6a669d78b261
Running with tool 'iperf2'
Fetching first run...

Next scheduled run:
https://enpl-pt2-10g.eos.nasa.gov/pscheduler/tasks/d0886c96-9328-434d-af6e-6a669d78b261/runs/a8f06a52-be3d-4033-83d9-c330e50defc8
Starts 2019-04-05T10:21:39-04:00 (~50 seconds)
Ends   2019-04-05T10:21:54-04:00 (~14 seconds)
Waiting for result...

Run did not complete: Failed

Diagnostics from enpl-pt2-10g.eos.nasa.gov:
/usr/bin/iperf -p 5001 -c edclxw41.cr.usgs.gov -t 10 -m

------------------------------------------------------------
Client connecting to edclxw41.cr.usgs.gov, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 3] local 169.154.197.28 port 50002 connected with 152.61.6.5 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 2.66 GBytes 2.28 Gbits/sec
[ 3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)

Error from enpl-pt2-10g.eos.nasa.gov:
No error.

Diagnostics from edclxw41.cr.usgs.gov:
No result was produced

Error from edclxw41.cr.usgs.gov:
No result was produced

No further runs scheduled.
[uhl@enpl-pt2-10g ~]$
Any insight/suggestions appreciated.
Thanks, George
From: Mark Feit <>
"Uhl, George D. (GSFC-423.0)[SGT INC]" writes:
One of my managed test nodes underwent a perfsonar software upgrade on Saturday morning. Ever since the upgrade, outbound iperf3 throughput tests fail to complete. The destination is a no-agent test node which I think might be running a perfsonar 4.0.x release. …
Diagnostics from edclxw41.cr.usgs.gov: No diagnostics.
Error from edclxw41.cr.usgs.gov: iperf3 returned an error: exiting
It looks like iperf3 failed at the far end. Earlier versions of the iperf3 plugin don’t collect sufficient diagnostic information when the tool fails, so getting to the bottom of this will require some cooperation from USGS. (Looking at the current sources, we may need to re-think some of how that’s done.) The failure should have left traces in the logs and, if not, turning debugging on for a few minutes will get good information.
Your end appears to have produced a usable result, which would seem to indicate that USGS worked well enough to produce it and died on the way out. Iperf3 can sometimes be cagey about why it failed. Bruce can correct me if I’m wrong, but I seem to recall that parts of the error end up on different output streams.
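To illustrate the point about output streams, the following is a minimal Python sketch of running a tool while keeping its standard output and standard error separate, so that neither half of an error message is lost. It does not show how the pScheduler iperf3 plugin actually collects diagnostics; the helper name `run_and_capture` is hypothetical.

```python
import subprocess


def run_and_capture(argv, timeout=60):
    """Run a command and return (exit_code, stdout_text, stderr_text).

    Hypothetical helper for illustration only. Both streams are captured
    separately so a caller can log or report each of them.
    """
    proc = subprocess.run(
        argv,
        capture_output=True,  # collect stdout and stderr in separate buffers
        text=True,            # decode bytes to str
        timeout=timeout,      # raises subprocess.TimeoutExpired if exceeded
    )
    return proc.returncode, proc.stdout, proc.stderr


# Example call (host.example.org is a placeholder, not a real test node):
#   code, out, err = run_and_capture(["iperf3", "-c", "host.example.org"])
```

Logging all three returned values on a nonzero exit code would preserve whatever the tool wrote to either stream.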
I can’t reach pScheduler on the USGS machine (but I can reach the BWCTL and iperf3 servers) and was able to detect that the system is running iperf3 3.6. That is the current version, released last June, so they’re not that far behind.
--Mark