Hi,
I’m one of the users testing against the NOAA perfSONAR test node. I’ve been specifying the “--ip-version 4” switch on the throughput tasks, and I’ve just included it with a “troubleshoot” task. In both cases the tests fail even though TCP/443 communication works fine. The NOAA test node is a no-agent host in one of my test meshes, which includes a number of no-agent test hosts, and it’s the only one experiencing this problem.
Thanks,
George Uhl
$ pscheduler troubleshoot --ip-version 4 nettest.boulder.noaa.gov
Performing basic troubleshooting of localhost and nettest.boulder.noaa.gov.
localhost:
Checking path MTU... 65535 (Local)
Checking for pScheduler... OK.
Checking clock... OK.
Idle test.... 13 seconds.... Checking archiving... OK.
nettest.boulder.noaa.gov:
Checking path MTU... 1500+
Checking for pScheduler... OK.
Checking clock... OK.
Idle test.... 13 seconds.... Checking archiving... OK.
localhost and nettest.boulder.noaa.gov:
Checking path MTU... 1500+
Checking timekeeping... OK.
Simple stream test.... 13 seconds.... Failed.
Task failed to run properly.
2019-05-30T10:09:01-04:00 on localhost and nettest.boulder.noaa.gov with simplestreamer:
simplestream --dest nettest.boulder.noaa.gov --ip-version 4
Run did not complete: Failed
Diagnostics from localhost:
Try 1 failed: Failed to connect: [Errno 111] Connection refused
Try 2 failed: Failed to connect: [Errno 111] Connection refused
Try 3 failed: Failed to connect: [Errno 111] Connection refused
Try 4 failed: Failed to connect: [Errno 111] Connection refused
Try 5 failed: Failed to connect: [Errno 111] Connection refused
Try 6 failed: Failed to connect: [Errno 111] Connection refused
Try 7 failed: Failed to connect: [Errno 111] Connection refused
Try 8 failed: Failed to connect: [Errno 111] Connection refused
Try 9 failed: Failed to connect: [Errno 111] Connection refused
Try 10 failed: Failed to connect: [Errno 111] Connection refused
Error from localhost:
Failed to connect: [Errno 111] Connection refused
Diagnostics from nettest.boulder.noaa.gov:
Nothing to see at the receiving end.
Error from nettest.boulder.noaa.gov:
Timed out
[uhl@enpl-pt2-10g ~]$
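The output above pairs "Connection refused" on the sending side with a timeout on the receiving side, while TCP/443 is reported as working. A quick way to compare a known-good port with a test port is a single IPv4 connection attempt that reports which of those outcomes occurred. The following is only a rough sketch: `probe_tcp_ipv4` is a hypothetical helper, not part of pScheduler, and the port to probe is an assumption because the troubleshooter output does not show which port simplestreamer used.

```python
import errno
import socket


def probe_tcp_ipv4(host, port, timeout=3.0):
    """Try one IPv4 TCP connection and classify the outcome.

    Hypothetical helper for illustration; not part of pScheduler.
    """
    try:
        # Ask for IPv4 results only, similar in spirit to --ip-version 4.
        infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return "no-ipv4-address"

    address = infos[0][4]
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(address)
        return "open"
    except socket.timeout:
        # No answer at all: often a filter silently dropping packets.
        return "timed-out"
    except ConnectionRefusedError:
        # The far end answered with a reset: reachable, nothing listening
        # on that port (or a firewall rule that rejects).
        return "refused"
    except OSError as exc:
        return "error: " + errno.errorcode.get(exc.errno, str(exc))
    finally:
        sock.close()
```

Calling `probe_tcp_ipv4("nettest.boulder.noaa.gov", 443)` and then the same call with a test port (5201 is iperf3's default, though the port pScheduler negotiated for these runs is not shown here) would indicate whether the far host answers at all on that port. A "refused" result only shows that nothing accepted the connection at that moment; it does not by itself explain why the listener was absent.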
$ pscheduler task --debug throughput --source enpl-pt2-10g.eos.nasa.gov --dest nettest.boulder.noaa.gov --ip-version 4
2019-05-30T14:15:18 Debug started
2019-05-30T14:15:18 Assistance is from localhost
2019-05-30T14:15:18 Forcing default slip of PT5M
2019-05-30T14:15:18 Converting to spec via https://localhost/pscheduler/tests/throughput/spec
Submitting task...
2019-05-30T14:15:18 Fetching participant list
2019-05-30T14:15:18 Spec is: {"dest": "nettest.boulder.noaa.gov", "source": "enpl-pt2-10g.eos.nasa.gov", "ip-version": 4, "schema": 1}
2019-05-30T14:15:18 Params are: {'spec': '{"dest": "nettest.boulder.noaa.gov", "source": "enpl-pt2-10g.eos.nasa.gov", "ip-version": 4, "schema": 1}'}
2019-05-30T14:15:18 Got participants: {u'participants': [u'enpl-pt2-10g.eos.nasa.gov', u'nettest.boulder.noaa.gov']}
2019-05-30T14:15:18 Lead is enpl-pt2-10g.eos.nasa.gov
2019-05-30T14:15:18 Pinging https://enpl-pt2-10g.eos.nasa.gov/pscheduler/
2019-05-30T14:15:18 enpl-pt2-10g.eos.nasa.gov is up
2019-05-30T14:15:18 Posting task to https://enpl-pt2-10g.eos.nasa.gov/pscheduler/tasks
2019-05-30T14:15:18 Data is {"test": {"type": "throughput", "spec": {"dest": "nettest.boulder.noaa.gov", "source": "enpl-pt2-10g.eos.nasa.gov", "ip-version": 4, "schema": 1}}, "schema": 1, "schedule": {"slip": "PT5M"}}
Task URL:
https://enpl-pt2-10g.eos.nasa.gov/pscheduler/tasks/2207091b-b376-4c16-8e29-6d5c9a3e36c7
2019-05-30T14:15:35 Posted https://enpl-pt2-10g.eos.nasa.gov/pscheduler/tasks/2207091b-b376-4c16-8e29-6d5c9a3e36c7
2019-05-30T14:15:35 Submission diagnostics:
2019-05-30T14:15:35   Hints:
2019-05-30T14:15:35     requester: 169.154.197.28
2019-05-30T14:15:35     server: 169.154.197.28
2019-05-30T14:15:35   Identified as everybody, local-interfaces
2019-05-30T14:15:35   Classified as default, friendlies
2019-05-30T14:15:35   Application: Hosts we trust to do everything
2019-05-30T14:15:35     Group 1: Limit 'always' passed
2019-05-30T14:15:35     Group 1: Want all, 1/1 passed, 0/1 failed: PASS
2019-05-30T14:15:35     Application PASSES
2019-05-30T14:15:35   Application: Defaults applied to non-friendly hosts
2019-05-30T14:15:35     Group 1: Limit 'innocuous-tests' failed: Passed but inverted
2019-05-30T14:15:35     Group 1: Limit 'throughput-default-time' passed
2019-05-30T14:15:35     Group 1: Limit 'idleex-default' failed: Test is not 'idleex'
2019-05-30T14:15:35     Group 1: Want any, 1/3 passed, 2/3 failed: PASS
2019-05-30T14:15:35     Application PASSES
2019-05-30T14:15:35   Proposal meets limits
Running with tool 'iperf3'
Fetching first run...
2019-05-30T14:15:35 Fetching https://enpl-pt2-10g.eos.nasa.gov/pscheduler/tasks/2207091b-b376-4c16-8e29-6d5c9a3e36c7/runs/first
2019-05-30T14:15:36 Handing off: pscheduler watch --first --format text/plain --debug https://enpl-pt2-10g.eos.nasa.gov/pscheduler/tasks/2207091b-b376-4c16-8e29-6d5c9a3e36c7
2019-05-30T14:15:36 Debug started
2019-05-30T14:15:36 Fetching https://enpl-pt2-10g.eos.nasa.gov/pscheduler/tasks/2207091b-b376-4c16-8e29-6d5c9a3e36c7
2019-05-30T14:15:36 Fetching next run from https://enpl-pt2-10g.eos.nasa.gov/pscheduler/tasks/2207091b-b376-4c16-8e29-6d5c9a3e36c7/runs/first
Next scheduled run:
https://enpl-pt2-10g.eos.nasa.gov/pscheduler/tasks/2207091b-b376-4c16-8e29-6d5c9a3e36c7/runs/745d9145-65f3-4b4d-804f-2bf1fe1b12b3
Starts 2019-05-30T10:16:05-04:00 (~28 seconds)
Ends   2019-05-30T10:16:24-04:00 (~18 seconds)
Waiting for result...
Run did not complete: Failed
Diagnostics from enpl-pt2-10g.eos.nasa.gov:
No diagnostics.
Error from enpl-pt2-10g.eos.nasa.gov:
iperf3 returned an error: error - unable to connect to server: Connection refused
Diagnostics from nettest.boulder.noaa.gov:
No diagnostics.
Error from nettest.boulder.noaa.gov:
iperf3 returned an error:
Process took too long to run.
2019-05-30T14:16:24 Fetching next run from https://enpl-pt2-10g.eos.nasa.gov/pscheduler/tasks/2207091b-b376-4c16-8e29-6d5c9a3e36c7/runs/next
No further runs scheduled.
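For anyone wanting to reproduce the submission outside the CLI, the "Data is" line in the debug output shows the JSON task document that was posted. The sketch below only rebuilds that document from its parts so the structure is easier to read; `build_throughput_task` is a hypothetical helper name, and nothing here contacts the pScheduler API.

```python
import json


def build_throughput_task(source, dest, ip_version=4, slip="PT5M"):
    """Return a task document shaped like the 'Data is' line in the debug log.

    Hypothetical helper for illustration; field names are copied from the log.
    """
    if ip_version not in (4, 6):
        raise ValueError("ip_version must be 4 or 6")
    return {
        "schema": 1,
        "test": {
            "type": "throughput",
            "spec": {
                "schema": 1,
                "source": source,
                "dest": dest,
                # Forces both participants onto the same IP version.
                "ip-version": ip_version,
            },
        },
        # PT5M is the default slip the CLI reported forcing.
        "schedule": {"slip": slip},
    }


task = build_throughput_task("enpl-pt2-10g.eos.nasa.gov", "nettest.boulder.noaa.gov")
print(json.dumps(task, indent=2))
```

The point of laying it out this way is to confirm that "ip-version": 4 did reach the task specification, so the failure in this run is not a case of the switch being dropped on submission.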
From: Mark Feit
Reply-To: Mark Feit
Date: Wednesday, May 29, 2019 at 10:59 AM
To: Alex Hsia
Subject: Re: [perfsonar-user] Problems with Debian pscheduler
Alex Hsia writes:
There shouldn't be any firewall blocking access to simplestream, and the host is using the default perfSONAR host-level firewall rules, i.e.:
I did some additional poking around and found the cause: the hosts involved have mixed IP stacks.
Nettest is single-stack (IPv4 only), so its FQDN has an A record only; sdmz-perfsonar-40g is dual-stack and has both A and AAAA records. Because most flavors of Linux prefer IPv6 when it is available, sdmz’s FQDN resolves to its AAAA record first and the listening socket it opens will be IPv6. Nettest won’t be able to reach that socket. You can force your way around this by adding the “--ip-version 4” switch when running the troubleshooter or by specifying IPv4 addresses explicitly.
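The mixed-stack situation described above can be checked ahead of time by comparing which address families each hostname resolves to. This is a minimal sketch under stated assumptions: `address_families` and `suggest_ip_version` are hypothetical helper names, not pScheduler functions, and the result reflects only what DNS returns to the machine running the check, not whether either host actually has working IPv6 connectivity.

```python
import socket


def address_families(host, resolver=socket.getaddrinfo):
    """Return the set of IP versions (4 and/or 6) that host resolves to."""
    try:
        infos = resolver(host, None)
    except socket.gaierror:
        return set()
    versions = set()
    for family, *_ in infos:
        if family == socket.AF_INET:
            versions.add(4)
        elif family == socket.AF_INET6:
            versions.add(6)
    return versions


def suggest_ip_version(host_a, host_b, resolver=socket.getaddrinfo):
    """Return (version, mixed) for a pair of hosts.

    version is an IP version both hosts share (IPv6 preferred when both
    have it), or None if they share none. mixed is True when the two
    hosts do not resolve to the same set of versions.
    """
    a = address_families(host_a, resolver)
    b = address_families(host_b, resolver)
    shared = a & b
    mixed = a != b
    version = 6 if 6 in shared else 4 if 4 in shared else None
    return version, mixed
```

For a single-stack host paired with a dual-stack one, `suggest_ip_version` would report IPv4 as the shared version and flag the pair as mixed, which is the case where passing “--ip-version 4” explicitly is worth trying. The `resolver` parameter exists only so the lookup can be replaced with canned data when trying this out offline.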
The simplestream test and simplestreamer tool were originally written for use during development, but it’s become clear that the diagnostics they produce aren’t as helpful to end users as they could be. While I’m at it, I’ll probably make some adjustments to the troubleshooter to spot this situation and either warn about it or take steps to avoid it. I’ve opened a couple of tickets on these issues and should have enhancements out with 4.2.0:
https://github.com/perfsonar/pscheduler/issues/850 and
https://github.com/perfsonar/pscheduler/issues/851.
--Mark