
perfsonar-user - Re: [perfsonar-user] perfSONAR 4.2.0 now available

Subject: perfSONAR User Q&A and Other Discussion




  • From: Andrew Lake <>
  • To: "" <>, Phil Reese <>
  • Subject: Re: [perfsonar-user] perfSONAR 4.2.0 now available
  • Date: Thu, 22 Aug 2019 09:59:24 -0400

Hi Phil,


I see a lot of inactivity timeouts, and I suspect the issues you list are related to each other. When we see lots of timeouts, the first thing that comes to mind is that the server is too busy, but given the connection refused errors and such, I suspect there are other things at play. I don’t know your full setup, but a few things come to mind:

1. Any firewalls here? I see you are using IPv6, so do the v6 firewalls match the v4 ones? It's possible firewalls are causing things to hang, leading to the timeouts.

2. Is DNS behaving? I see a lot of IPs being thrown around, so there might not be many lookups, but if you do a reverse lookup of the addresses, does it return right away (even if it doesn’t find anything)? I’ve definitely seen cases where DNS just hangs forever instead of responding, which could also cause this type of behavior. Like I said, the result of the DNS query shouldn't really matter, just that it gets some type of response, error or otherwise, back in a timely manner.

3. Are the clocks on the boxes roughly in sync? pScheduler is not that picky, but if you start getting close to a minute or more off, you might run into issues. I'm not sure how that would lead to timeouts; I only mention it because it can cause iperf connection refused errors like the one near the bottom of your output.
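Point 2 can be checked by timing reverse lookups and caring only about whether an answer (success or error) comes back promptly. The sketch below assumes Python 3 on the test host; `time_reverse_lookup` and its 2-second default are illustrative choices, not part of perfSONAR. Because `socket.gethostbyaddr` takes no timeout argument, the lookup runs in a daemon thread and the caller simply stops waiting after `timeout` seconds.

```python
import socket
import threading
import time


def time_reverse_lookup(addr, timeout=2.0, resolver=socket.gethostbyaddr):
    """Return (status, elapsed_seconds) for a reverse (PTR) lookup of addr.

    status is "resolved" (a name came back), "no-record" (the resolver
    answered with an error), or "timeout" (no answer within `timeout`).
    """
    result = {}

    def worker():
        try:
            result["name"] = resolver(addr)[0]
        except OSError as exc:  # socket.herror and socket.gaierror are OSErrors
            result["error"] = exc

    start = time.monotonic()
    # Daemon thread: a lookup that hangs forever will not keep Python alive.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    elapsed = time.monotonic() - start

    if thread.is_alive():
        return "timeout", elapsed  # the "DNS just hangs" case
    if "name" in result:
        return "resolved", elapsed
    return "no-record", elapsed  # an error answer still counts as a response
```

For example, `time_reverse_lookup("192.168.1.42")` should come back well under the timeout whether or not a PTR record exists; a `"timeout"` status is the behavior worth chasing. The `resolver` parameter exists only so the function can be exercised without a real DNS server.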
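Point 3 can be quantified with the standard NTP on-wire offset estimate (RFC 5905): from the four timestamps of one request/response exchange, the remote clock's offset is ((t2 - t1) + (t3 - t4)) / 2 and the round-trip delay is (t4 - t1) - (t3 - t2). This sketch assumes you already have those timestamps in seconds (in practice `ntpd` or `chronyd` compute the offset for you); the 60-second limit is only the "close to a minute" rule of thumb from this message, not a documented pScheduler threshold.

```python
def clock_offset(t1, t2, t3, t4):
    """Estimated offset of the remote clock relative to the local one (seconds).

    t1: local time the request was sent
    t2: remote time the request was received
    t3: remote time the reply was sent
    t4: local time the reply was received
    """
    return ((t2 - t1) + (t3 - t4)) / 2


def round_trip_delay(t1, t2, t3, t4):
    """Network round-trip time, excluding the remote host's processing time."""
    return (t4 - t1) - (t3 - t2)


# Assumption: "close to a minute" from the thread, used as a hard cutoff.
MAX_SKEW_SECONDS = 60.0


def skew_is_acceptable(t1, t2, t3, t4, limit=MAX_SKEW_SECONDS):
    """True if the estimated offset is below the (assumed) tolerance."""
    return abs(clock_offset(t1, t2, t3, t4)) < limit
```

For a remote host running 90 seconds ahead, `clock_offset(0, 90, 91, 1)` gives 90.0 with a round-trip delay of 0, so `skew_is_acceptable` returns False.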

Thanks,
Andy


On August 20, 2019 at 7:26:42 PM, Phil Reese () wrote:

Andy,

Great to have GA on PS 4.2!

-----
I got right to testing and built a mixed CentOS and Ubuntu agent mesh
and central management server from the ground up using only the new RPMs.

Below are two issues I'm seeing. From my limited experience, I can't
say for sure whether it's an agent or esmond issue. ('pscheduler
troubleshoot' runs to 'normal' completion on all agents.)

Phil

----
I found two issues that I couldn't resolve:

1. None of my throughput tests were successful. The log messages run as
follows, one from an IPv4 host and the second from an IPv6 host:
CentOS:
IPv4:
2019/08/19 17:11:20 WARN pid=32499 prog=perfSONAR_PS::PSConfig::PScheduler::Agent::_run_end line=227 guid=F2CE562A-C2DD-11E9-9ECD-83BDC162EEA1 msg=Problem adding test throughput(asus1.linux.hom->el1.linux.hom), continuing with rest of config: Inactivity timeout
IPv6:
2019/08/19 17:11:20 WARN pid=32499 prog=perfSONAR_PS::PSConfig::PScheduler::Agent::_run_end line=227 guid=F2CE562A-C2DD-11E9-9ECD-83BDC162EEA1 msg=Problem adding test throughput(2601:646:8a00:5bfb:9e5c:8eff:fe21:b6c->2601:646:8a00:c2c3:9e5c:8eff:fe20:d5e6), continuing with rest of config: INTERNAL SERVER ERROR

Ubuntu:
IPv4:
2019/08/19 19:08:26 WARN pid=1654 prog=perfSONAR_PS::PSConfig::PScheduler::Agent::_run_end line=227 guid=eb028b6b-507d-4d82-a76c-edbb4ea923d9 msg=Problem adding test throughput(tinker.linux.hom->tinkers.linux.hom), continuing with rest of config: Inactivity timeout

IPv6:
2019/08/19 19:08:26 WARN pid=1654 prog=perfSONAR_PS::PSConfig::PScheduler::Agent::_run_end line=227 guid=eb028b6b-507d-4d82-a76c-edbb4ea923d9 msg=Problem adding test throughput(2601:646:8a00:5bfb:3c48:b89:a447:a51e->2601:646:8a00:c2c3:9e5c:8eff:fe20:d5e6), continuing with rest of config: INTERNAL SERVER ERROR

---
Some Ubuntu WARNs looked like this instead:
2019/08/20 15:13:15 WARN pid=1654 prog=perfSONAR_PS::PSConfig::PScheduler::Agent::_run_end line=227 guid=ab85b7b7-db3d-455f-a1d1-5abcb84f2576 msg=Problem deleting test throughput/iperf3(2601:646:8a00:5bfb:3c48:b89:a447:a51e->2601:646:8a00:5bfb:e083:48c6:e230:b84d), continuing with rest of config: No route to host  <<---- Looks suspicious; see the full manual run included at the very end of this email.

(I noted this problem before the GA, thinking it would be resolved in
the final upload.)

2. My Ubuntu hosts were unable to complete any RTT tests, either to other
Ubuntu hosts or to CentOS hosts. Running the command manually doesn't
work either, though a direct ping does:

2019/08/19 18:10:33 WARN pid=1654 prog=perfSONAR_PS::PSConfig::PScheduler::Agent::_run_end line=227 guid=58b5b2be-136a-4b93-85ee-beb47d3ccefa msg=Problem adding test rtt(tinker.linux.hom->tinkers.linux.hom), continuing with rest of config: Inactivity timeout

2019/08/19 18:10:33 WARN pid=1654 prog=perfSONAR_PS::PSConfig::PScheduler::Agent::_run_end line=227 guid=58b5b2be-136a-4b93-85ee-beb47d3ccefa msg=Problem adding test rtt(tinker.linux.hom->el0.linux.hom), continuing with rest of config: Inactivity timeout
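To see which failure dominates across a whole mesh, WARN lines like the ones quoted above can be tallied by action, test type, and error. This is a hypothetical helper, not a perfSONAR tool: the regular expression is written against the message format shown in this email and assumes each entry occupies a single line, as it presumably does in the agent's own log file (the wrapping here comes from the mail client).

```python
import re
from collections import Counter

# Matches the msg= part of the PSConfig agent WARN lines quoted above.
PROBLEM_RE = re.compile(
    r"msg=Problem (?P<action>adding|deleting) test "
    r"(?P<test>[^(\s]+)\((?P<src>.+?)->(?P<dst>.+?)\), "
    r"continuing with rest of config: (?P<error>.+?)\s*$"
)


def tally_problems(lines):
    """Count (action, test type, error) triples; non-matching lines are ignored."""
    counts = Counter()
    for line in lines:
        match = PROBLEM_RE.search(line)
        if match:
            counts[(match["action"], match["test"], match["error"])] += 1
    return counts
```

Usage, with a hypothetical path you would adjust to your install: `with open(log_path) as f: print(tally_problems(f).most_common())`. A list dominated by "Inactivity timeout" across both test types would point toward a shared cause (firewall, DNS) rather than a tool-specific one.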

root@tinker:/home/preeser# pscheduler task rtt --dest el0
Submitting task...
Task URL:
https://localhost/pscheduler/tasks/56e43826-fa6a-4a63-bf94-78edf8b77431
Running with tool 'ping'
Fetching first run...

Next scheduled run:
https://localhost/pscheduler/tasks/56e43826-fa6a-4a63-bf94-78edf8b77431/runs/582a326b-1c08-4c6a-92c2-a6e5df755677
Starts 2019-08-20T23:03:47Z (~0 seconds)
Ends   2019-08-20T23:03:58Z (~10 seconds)
Waiting for result...

Run did not complete: Failed

Limit system diagnostics for this run:

  Hints:
    requester: ::1
    server: ::1
  Identified as everybody, local-interfaces
  Classified as default, friendlies
  Application: Hosts we trust to do everything
    Group 1: Limit 'always' passed
    Group 1: Want all, 1/1 passed, 0/1 failed: PASS
    Application PASSES
  Application: Defaults applied to non-friendly hosts
    Group 1: Limit 'innocuous-tests' passed
    Group 1: Limit 'throughput-default-time' failed: Test is not 'throughput'
    Group 1: Limit 'idleex-default' failed: Test is not 'idleex'
    Group 1: Want any, 1/3 passed, 2/3 failed: PASS
    Application PASSES
  Proposal meets limits
  Priority set at 5:
    Initial priority  (Set to 0)
    Friendly requester  (+5)


Diagnostics from localhost:
  ping -n -c 5 -i 1.0 -W 1.0 192.168.1.42

Error from localhost:
  ping: socket: Operation not permitted


No further runs scheduled.
root@tinker:/home/preese# ping -n -c 5 -i 1.0 -W 1.0 192.168.1.42
PING 192.168.1.42 (192.168.1.42) 56(84) bytes of data.
64 bytes from 192.168.1.42: icmp_seq=1 ttl=64 time=0.548 ms
64 bytes from 192.168.1.42: icmp_seq=2 ttl=64 time=0.499 ms
64 bytes from 192.168.1.42: icmp_seq=3 ttl=64 time=0.477 ms
^C
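The error `ping: socket: Operation not permitted` typically means `ping` could open neither a raw ICMP socket (which needs root or `CAP_NET_RAW`) nor an unprivileged ICMP datagram socket. One possible explanation, not confirmed in this thread, is that pScheduler's `ping` runs under a non-root account while the manual run above was done as root. On Linux the unprivileged path is gated by the `net.ipv4.ping_group_range` sysctl, whose kernel default `1 0` admits no group. The helper below only parses that value and checks a group ID against it; which account and group pScheduler actually uses, and whether the `ping` binary carries file capabilities (see `getcap`), are assumptions you would need to confirm on the host.

```python
from pathlib import Path

PING_GROUP_RANGE = Path("/proc/sys/net/ipv4/ping_group_range")


def parse_ping_group_range(text):
    """Return (low, high) from the sysctl's "low<TAB>high" text."""
    low, high = (int(field) for field in text.split())
    return low, high


def gid_allowed_unprivileged_ping(gid, text=None):
    """True if gid may open unprivileged ICMP echo sockets.

    The kernel default "1 0" (low > high) allows no group at all.
    """
    if text is None:
        text = PING_GROUP_RANGE.read_text()
    low, high = parse_ping_group_range(text)
    return low <= gid <= high
```

Run as the account in question, `gid_allowed_unprivileged_ping(os.getgid())` (after `import os`) reports whether that account's primary group falls inside the configured range.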

-------
Manual run of the last throughput test, with the added iperf3 reference:

root@tinker:/home/preese# pscheduler task throughput --dest 2601:646:8a00:5bfb:e083:48c6:e230:b84d
Submitting task...
Task URL:
https://localhost/pscheduler/tasks/011f4d37-2ad6-45d1-87b0-a918904f3431
Running with tool 'iperf3'
Fetching first run...

Next scheduled run:
https://localhost/pscheduler/tasks/011f4d37-2ad6-45d1-87b0-a918904f3431/runs/b151405d-4f87-430e-8603-d6cd81b0ff18
Starts 2019-08-20T23:13:47Z (~3 seconds)
Ends   2019-08-20T23:14:06Z (~18 seconds)
Waiting for result...

Run did not complete: Failed

Limit system diagnostics for this run:

  Hints:
    requester: ::1
    server: ::1
  Identified as everybody, local-interfaces
  Classified as default, friendlies
  Application: Hosts we trust to do everything
    Group 1: Limit 'always' passed
    Group 1: Want all, 1/1 passed, 0/1 failed: PASS
    Application PASSES
  Application: Defaults applied to non-friendly hosts
    Group 1: Limit 'innocuous-tests' failed: Passed but inverted
    Group 1: Limit 'throughput-default-time' passed
    Group 1: Limit 'idleex-default' failed: Test is not 'idleex'
    Group 1: Want any, 1/3 passed, 2/3 failed: PASS
    Application PASSES
  Proposal meets limits
  Priority set at 5:
    Initial priority  (Set to 0)
    Friendly requester  (+5)


Diagnostics from localhost:
  /usr/bin/iperf3 -p 5201 -c 2601:646:8a00:5bfb:e083:48c6:e230:b84d -t 10 --json

Error from localhost:
  iperf3 returned an error: error - unable to connect to server: Connection refused

Diagnostics from 2601:646:8a00:5bfb:e083:48c6:e230:b84d:
  Run was preempted.

Error from 2601:646:8a00:5bfb:e083:48c6:e230:b84d:
  No error.

No further runs scheduled.

root@tinker:/home/preese# /usr/bin/iperf3 -p 5201 -c 2601:646:8a00:5bfb:e083:48c6:e230:b84d -t 10
iperf3: error - unable to connect to server: Connection refused

root@tinker:/home/preese# ping6 2601:646:8a00:5bfb:e083:48c6:e230:b84d
PING 2601:646:8a00:5bfb:e083:48c6:e230:b84d(2601:646:8a00:5bfb:e083:48c6:e230:b84d) 56 data bytes
64 bytes from 2601:646:8a00:5bfb:e083:48c6:e230:b84d: icmp_seq=1 ttl=64 time=0.933 ms
64 bytes from 2601:646:8a00:5bfb:e083:48c6:e230:b84d: icmp_seq=2 ttl=64 time=0.819 ms
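The manual run above shows ICMPv6 getting through while TCP to iperf3's port 5201 is refused. A refusal means something answered (no listener on that port, or a firewall REJECT rule), whereas a firewall that silently drops packets makes the connection hang until it times out, which is the distinction behind the firewall question in the reply. The sketch below classifies a single TCP connect attempt along those lines; `probe_tcp` is a hypothetical helper, and a `refused` result outside a scheduled run may simply mean no iperf3 server is listening at that moment.

```python
import socket


def probe_tcp(host, port=5201, timeout=3.0):
    """Classify one TCP connect attempt to (host, port).

    Returns "open", "refused" (the host answered with a reset),
    "timeout" (no answer at all, e.g. packets silently dropped),
    or "error:<reason>" such as "error:No route to host".
    """
    try:
        # create_connection resolves host and tries each returned address
        # (IPv4 or IPv6), so it accepts names and literal addresses alike.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return "error:" + (exc.strerror or str(exc))
```

Running `probe_tcp("2601:646:8a00:5bfb:e083:48c6:e230:b84d")` and the same call against the host's IPv4 address during a scheduled run would show whether the v4 and v6 paths behave differently.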





