
perfsonar-user - Re: [perfsonar-user] Problems with Debian pscheduler

Subject: perfSONAR User Q&A and Other Discussion

  • From: Alex Hsia <>
  • To: "Uhl, George D. (GSFC-423.0)[SGT INC]" <>
  • Cc: Mark Feit <>, "" <>
  • Subject: Re: [perfsonar-user] Problems with Debian pscheduler
  • Date: Thu, 30 May 2019 08:32:30 -0600

George-

In my testing I had to specify the IPv4 address of my server explicitly, since it has both an A and an AAAA record in DNS.  I'll likely change my DNS so that the default hostname returns only an IPv4 address, and I'll create a nettest-v6 AAAA record to help with this problem.
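
For anyone who wants to check what a hostname resolves to before deciding whether to pass an explicit address, here is a minimal Python sketch. It is not part of pScheduler; the function names are hypothetical, and it only reports what the local resolver returns for each address family.

```python
import socket


def addresses_by_family(host):
    """Return the IPv4 and IPv6 addresses the local resolver gives for host.

    Hypothetical helper for illustration; not a pScheduler API.
    """
    found = {}
    for label, family in (("IPv4", socket.AF_INET), ("IPv6", socket.AF_INET6)):
        try:
            infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
        except socket.gaierror:
            # No usable address of this family for the host.
            infos = []
        # Each entry ends with a sockaddr tuple whose first field is the address.
        found[label] = sorted({info[4][0] for info in infos})
    return found


def is_dual_stack(host):
    """True when the host resolves to at least one IPv4 and one IPv6 address."""
    found = addresses_by_family(host)
    return bool(found["IPv4"]) and bool(found["IPv6"])
```

Calling addresses_by_family() on the test node's hostname shows whether it has both record types; if is_dual_stack() returns True, forcing the IP version or using a literal address is the workaround discussed in this thread.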

Alex Hsia ==============================================================
NOAA/OAR                                            Phone: (303)497-6351
Mailstop R/ESRL                                    GVoice: (303)536-5430
325 Broadway                                  e-mail:
Boulder, CO  80305                                   PGP keyid: 8A482A90
========================================================================



On Thu, May 30, 2019 at 8:18 AM Uhl, George D. (GSFC-423.0)[SGT INC] <> wrote:

Hi,

 

I’m one of the users testing to the NOAA perfSONAR test node.  I’ve been specifying the “--ip-version 4” switch on the throughput tasks and I’ve just included it with a “troubleshoot” task.  In both cases the tests fail even though TCP/443 communication works fine.  The NOAA test node is a no-agent host in one of my test meshes, which includes a number of no-agent test hosts, and it’s the only one experiencing this problem.

 

Thanks,

George Uhl

 

$ pscheduler troubleshoot --ip-version 4 nettest.boulder.noaa.gov
Performing basic troubleshooting of localhost and nettest.boulder.noaa.gov.

localhost:

  Checking path MTU... 65535 (Local)
  Checking for pScheduler... OK.
  Checking clock... OK.
  Idle test.... 13 seconds.... Checking archiving... OK.

nettest.boulder.noaa.gov:

  Checking path MTU... 1500+
  Checking for pScheduler... OK.
  Checking clock... OK.
  Idle test.... 13 seconds.... Checking archiving... OK.

localhost and nettest.boulder.noaa.gov:

  Checking path MTU... 1500+
  Checking timekeeping... OK.
  Simple stream test.... 13 seconds.... Failed.

Task failed to run properly.

2019-05-30T10:09:01-04:00 on localhost and nettest.boulder.noaa.gov with simplestreamer:

simplestream --dest nettest.boulder.noaa.gov --ip-version 4

Run did not complete: Failed

Diagnostics from localhost:
  Try 1 failed: Failed to connect: [Errno 111] Connection refused
  Try 2 failed: Failed to connect: [Errno 111] Connection refused
  Try 3 failed: Failed to connect: [Errno 111] Connection refused
  Try 4 failed: Failed to connect: [Errno 111] Connection refused
  Try 5 failed: Failed to connect: [Errno 111] Connection refused
  Try 6 failed: Failed to connect: [Errno 111] Connection refused
  Try 7 failed: Failed to connect: [Errno 111] Connection refused
  Try 8 failed: Failed to connect: [Errno 111] Connection refused
  Try 9 failed: Failed to connect: [Errno 111] Connection refused
  Try 10 failed: Failed to connect: [Errno 111] Connection refused

Error from localhost:
  Failed to connect: [Errno 111] Connection refused

Diagnostics from nettest.boulder.noaa.gov:
  Nothing to see at the receiving end.

Error from nettest.boulder.noaa.gov:
  Timed out

[uhl@enpl-pt2-10g ~]$ 

$ pscheduler task --debug throughput --source enpl-pt2-10g.eos.nasa.gov --dest nettest.boulder.noaa.gov --ip-version 4
2019-05-30T14:15:18 Debug started
2019-05-30T14:15:18 Assistance is from localhost
2019-05-30T14:15:18 Forcing default slip of PT5M
2019-05-30T14:15:18 Converting to spec via https://localhost/pscheduler/tests/throughput/spec
Submitting task...
2019-05-30T14:15:18 Fetching participant list
2019-05-30T14:15:18 Spec is: {"dest": "nettest.boulder.noaa.gov", "source": "enpl-pt2-10g.eos.nasa.gov", "ip-version": 4, "schema": 1}
2019-05-30T14:15:18 Params are: {'spec': '{"dest": "nettest.boulder.noaa.gov", "source": "enpl-pt2-10g.eos.nasa.gov", "ip-version": 4, "schema": 1}'}
2019-05-30T14:15:18 Got participants: {u'participants': [u'enpl-pt2-10g.eos.nasa.gov', u'nettest.boulder.noaa.gov']}
2019-05-30T14:15:18 Lead is enpl-pt2-10g.eos.nasa.gov
2019-05-30T14:15:18 Pinging https://enpl-pt2-10g.eos.nasa.gov/pscheduler/
2019-05-30T14:15:18 enpl-pt2-10g.eos.nasa.gov is up
2019-05-30T14:15:18 Posting task to https://enpl-pt2-10g.eos.nasa.gov/pscheduler/tasks
2019-05-30T14:15:18 Data is {"test": {"type": "throughput", "spec": {"dest": "nettest.boulder.noaa.gov", "source": "enpl-pt2-10g.eos.nasa.gov", "ip-version": 4, "schema": 1}}, "schema": 1, "schedule": {"slip": "PT5M"}}
Task URL:
https://enpl-pt2-10g.eos.nasa.gov/pscheduler/tasks/2207091b-b376-4c16-8e29-6d5c9a3e36c7
2019-05-30T14:15:35 Posted https://enpl-pt2-10g.eos.nasa.gov/pscheduler/tasks/2207091b-b376-4c16-8e29-6d5c9a3e36c7
2019-05-30T14:15:35 Submission diagnostics:
2019-05-30T14:15:35   Hints:
2019-05-30T14:15:35     requester: 169.154.197.28
2019-05-30T14:15:35     server: 169.154.197.28
2019-05-30T14:15:35   Identified as everybody, local-interfaces
2019-05-30T14:15:35   Classified as default, friendlies
2019-05-30T14:15:35   Application: Hosts we trust to do everything
2019-05-30T14:15:35     Group 1: Limit 'always' passed
2019-05-30T14:15:35     Group 1: Want all, 1/1 passed, 0/1 failed: PASS
2019-05-30T14:15:35     Application PASSES
2019-05-30T14:15:35   Application: Defaults applied to non-friendly hosts
2019-05-30T14:15:35     Group 1: Limit 'innocuous-tests' failed: Passed but inverted
2019-05-30T14:15:35     Group 1: Limit 'throughput-default-time' passed
2019-05-30T14:15:35     Group 1: Limit 'idleex-default' failed: Test is not 'idleex'
2019-05-30T14:15:35     Group 1: Want any, 1/3 passed, 2/3 failed: PASS
2019-05-30T14:15:35     Application PASSES
2019-05-30T14:15:35   Proposal meets limits
Running with tool 'iperf3'
Fetching first run...
2019-05-30T14:15:35 Fetching https://enpl-pt2-10g.eos.nasa.gov/pscheduler/tasks/2207091b-b376-4c16-8e29-6d5c9a3e36c7/runs/first
2019-05-30T14:15:36 Handing off: pscheduler watch --first --format text/plain --debug https://enpl-pt2-10g.eos.nasa.gov/pscheduler/tasks/2207091b-b376-4c16-8e29-6d5c9a3e36c7
2019-05-30T14:15:36 Debug started
2019-05-30T14:15:36 Fetching https://enpl-pt2-10g.eos.nasa.gov/pscheduler/tasks/2207091b-b376-4c16-8e29-6d5c9a3e36c7
2019-05-30T14:15:36 Fetching next run from https://enpl-pt2-10g.eos.nasa.gov/pscheduler/tasks/2207091b-b376-4c16-8e29-6d5c9a3e36c7/runs/first

Next scheduled run:
https://enpl-pt2-10g.eos.nasa.gov/pscheduler/tasks/2207091b-b376-4c16-8e29-6d5c9a3e36c7/runs/745d9145-65f3-4b4d-804f-2bf1fe1b12b3
Starts 2019-05-30T10:16:05-04:00 (~28 seconds)
Ends   2019-05-30T10:16:24-04:00 (~18 seconds)
Waiting for result...

Run did not complete: Failed

Diagnostics from enpl-pt2-10g.eos.nasa.gov:
  No diagnostics.

Error from enpl-pt2-10g.eos.nasa.gov:
  iperf3 returned an error: error - unable to connect to server: Connection refused

Diagnostics from nettest.boulder.noaa.gov:
  No diagnostics.

Error from nettest.boulder.noaa.gov:
  iperf3 returned an error: 
  Process took too long to run.

2019-05-30T14:16:24 Fetching next run from https://enpl-pt2-10g.eos.nasa.gov/pscheduler/tasks/2207091b-b376-4c16-8e29-6d5c9a3e36c7/runs/next

No further runs scheduled.

From: <> on behalf of Mark Feit <>
Reply-To: Mark Feit <>
Date: Wednesday, May 29, 2019 at 10:59 AM
To: Alex Hsia <>
Cc: "" <>
Subject: Re: [perfsonar-user] Problems with Debian pscheduler

 

Alex Hsia writes:

 

There shouldn't be any firewall blocking access to simplestream, and the host is using the default perfSONAR host-level firewall rules, i.e.:

 

I did some additional poking around and found the cause:  the hosts involved have mixed IP stacks.

 

Nettest is single-stack, so its FQDN has an A record only; sdmz-perfsonar-40g is dual-stack and has both.  Because most flavors of Linux will prefer IPv6 if available, sdmz’s FQDN will get an AAAA record first and the listening socket opened will be IPv6.  Nettest won’t be able to see that.  You can force your way around this by adding the “--ip-version 4” switch when running the troubleshooter or specifying IPv4 addresses explicitly.
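
To make the mismatch concrete, here is a small Python sketch of the selection logic described above. It is an illustration only, not pScheduler's actual code: pick_address() is a hypothetical helper, and the addresses are documentation placeholders (2001:db8::/32 and 192.0.2.0/24), not the real hosts' addresses. The list is ordered the way a resolver that prefers IPv6 would typically return it.

```python
import socket

# Map the value given to an "--ip-version"-style switch to a socket family.
FAMILY_FOR_VERSION = {4: socket.AF_INET, 6: socket.AF_INET6}


def pick_address(addrinfos, ip_version=None):
    """Return (family, address) for the first usable entry, or None.

    addrinfos is a list shaped like the result of socket.getaddrinfo().
    With ip_version unset, the first entry wins, whatever its family.
    """
    for family, _type, _proto, _canonname, sockaddr in addrinfos:
        if ip_version is None or family == FAMILY_FOR_VERSION[ip_version]:
            return family, sockaddr[0]
    return None


# Hypothetical dual-stack host: the AAAA result is listed first.
dual_stack = [
    (socket.AF_INET6, socket.SOCK_STREAM, 6, "", ("2001:db8::1", 0, 0, 0)),
    (socket.AF_INET, socket.SOCK_STREAM, 6, "", ("192.0.2.10", 0)),
]

default_choice = pick_address(dual_stack)               # IPv6 entry
forced_v4_choice = pick_address(dual_stack, ip_version=4)  # IPv4 entry
```

With no version forced, the dual-stack side ends up on the IPv6 address, which an IPv4-only peer cannot reach; forcing version 4 selects the address both sides share.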

 

The simplestream test and simplestreamer tool were originally written for use during development, but it’s become clear that the diagnostics they produce aren’t as helpful as they could be to end users.  While I’m at it, I’ll probably make some adjustments to the troubleshooter to spot this situation and either warn about it or take steps to avoid it.  I’ve opened a couple of tickets on these issues and should have enhancements out with 4.2.0:  https://github.com/perfsonar/pscheduler/issues/850 and https://github.com/perfsonar/pscheduler/issues/851.

 

--Mark

 



