
perfsonar-user - Re: [perfsonar-user] Debugging pscheduler

Subject: perfSONAR User Q&A and Other Discussion

  • From: Charley Kneifel
  • To: Zhi-Wei Lu
  • Subject: Re: [perfsonar-user] Debugging pscheduler
  • Date: Wed, 26 Apr 2017 21:20:04 +0000

I have had a similar problem: pscheduler runs properly after a reinstall but stops working after a reboot.


This is a vanilla ISO install of the 4/21/17 release onto a VM.


After logging in, I see the following:


ABRT has detected 1 problem(s). For more info run: abrt-cli list --since 1493040255
[root@ps-node-e charley]# abrt-cli list --since 1493040255
id 42223fad7206adaf563a3c79556b12b8bb0b0b44
reason:         pool.py:152:__init__:ValueError: Number of processes must be at least 1
time:           Mon 24 Apr 2017 06:51:42 AM EDT
cmdline:        /usr/bin/python /usr/libexec/pscheduler/commands/../classes/tool/traceroute/run
package:        pscheduler-tool-traceroute-1.0-1.el7.centos
uid:            1000 (pscheduler)
count:          2
Directory:      /var/spool/abrt/Python-2017-04-24-06:51:42-9460
Reported:       https://retrace.fedoraproject.org/faf/reports/bthash/cf21d9b6825a907dfc058e2537978065e77bf78e

id 0423ef98e262f14f154da722888811992eb36fee
reason:         __init__.py:164:connect:OperationalError: could not connect to server: Connection refused
time:           Sun 23 Apr 2017 01:17:14 AM EDT
cmdline:        /usr/bin/python /usr/libexec/pscheduler/daemons/archiver --dsn @/etc/pscheduler/database/database-dsn
package:        pscheduler-server-1.0.0.1-1.el7.centos
uid:            1000 (pscheduler)
count:          3
Directory:      /var/spool/abrt/Python-2017-04-23-01:17:14-729
[root@ps-node-e charley]#

I also see the following types of errors in pscheduler.log after a reboot:


Apr 26 17:18:17 ps-node-e journal: archiver INFO     Started
Apr 26 17:18:17 ps-node-e journal: scheduler INFO     Started
Apr 26 17:18:17 ps-node-e journal: runner INFO     Started
Apr 26 17:18:17 ps-node-e journal: safe_run/runner ERROR    Program threw an exception after 0:00:00.000110
Apr 26 17:18:17 ps-node-e journal: safe_run/scheduler ERROR    Program threw an exception after 0:00:00.001355
Apr 26 17:18:17 ps-node-e journal: safe_run/ticker ERROR    Program threw an exception after 0:00:00.003222
Apr 26 17:18:17 ps-node-e journal: safe_run/ticker ERROR    Exception: OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (most recent call last):#012  File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run#012    function()#012  File "/usr/libexec/pscheduler/daemons/ticker", line 156, in <lambda>#012    pscheduler.safe_run(lambda: main_program())#012  File "/usr/libexec/pscheduler/daemons/ticker", line 123, in main_program#012    db = pscheduler.pg_connection(dsn)#012  File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection#012    pg = psycopg2.connect(dsn)#012  File "/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012    conn = _connect(dsn, connection_factory=connection_factory, async=async)#012OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?
Apr 26 17:18:17 ps-node-e journal: safe_run/ticker ERROR    Waiting 0.25 seconds before restarting
Apr 26 17:18:17 ps-node-e journal: safe_run/scheduler ERROR    Exception: OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (most recent call last):#012  File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run#012    function()#012  File "/usr/libexec/pscheduler/daemons/scheduler", line 660, in <lambda>#012    pscheduler.safe_run(lambda: main_program())#012  File "/usr/libexec/pscheduler/daemons/scheduler", line 527, in main_program#012    pg = pscheduler.pg_connection(dsn)#012  File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection#012    pg = psycopg2.connect(dsn)#012  File "/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012    conn = _connect(dsn, connection_factory=connection_factory, async=async)#012OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?
Apr 26 17:18:17 ps-node-e journal: safe_run/scheduler ERROR    Waiting 0.25 seconds before restarting
Apr 26 17:18:17 ps-node-e journal: safe_run/runner ERROR    Exception: OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (most recent call last):#012  File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run#012    function()#012  File "/usr/libexec/pscheduler/daemons/runner", line 909, in <lambda>#012    pscheduler.safe_run(lambda: main_program())#012  File "/usr/libexec/pscheduler/daemons/runner", line 719, in main_program#012    db = pscheduler.pg_connection(dsn)#012  File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection#012    pg = psycopg2.connect(dsn)#012  File "/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012    conn = _connect(dsn, connection_factory=connection_factory, async=async)#012OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?
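Every traceback above ends in the same refusal on 127.0.0.1 port 5432, so one quick check after a reboot is whether anything is accepting TCP connections there at all. The following is only a minimal sketch in Python 3, not part of pscheduler: the function names `port_is_open` and `wait_for_port` and the retry values are made up for illustration, and a successful connect only shows that something is listening on the port, not that the database itself is healthy.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unreachable hosts.
        return False


def wait_for_port(host, port, attempts=10, delay=0.25):
    """Poll the port up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if port_is_open(host, port):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    # 127.0.0.1:5432 is the address and port named in the log excerpt above.
    if wait_for_port("127.0.0.1", 5432):
        print("port 5432 is accepting connections")
    else:
        print("nothing is accepting connections on port 5432")
```

If the check fails right after boot but passes later, that would suggest the daemons start before the database is ready; if it never passes, the database is probably not starting at all.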

From: Zhi-Wei Lu
Sent: Wednesday, April 26, 2017 4:58 PM
Subject: [perfsonar-user] Debugging pscheduler
 

Hi all,

 

After the perfSONAR 4.0 upgrade, I struggled to get one of our perfSONAR boxes working again (with help from Andrew Lake). It worked for a short while, but I then noticed that pscheduler does not appear to be working properly.

 

pscheduler task rtt --dest p0.noc.ucdavis.edu --source 128.120.80.78
Submitting task...
Task URL:
https://128.120.80.78/pscheduler/tasks/8f17174e-2ca7-47d8-b98e-dd96b1d01e2b
Running with tool 'ping'
Fetching first run...
128.120.80.78 never scheduled a run for the task.

 

With the source and dest reversed, the test worked:

 

pscheduler task rtt --source p0.noc.ucdavis.edu --dest 128.120.80.78
Submitting task...
Task URL:
https://p0.noc.ucdavis.edu/pscheduler/tasks/b569b99a-bed7-431c-8948-b34238cdcacf
Running with tool 'ping'
Fetching first run...

Next scheduled run:
https://p0.noc.ucdavis.edu/pscheduler/tasks/b569b99a-bed7-431c-8948-b34238cdcacf/runs/d13ba8e6-6fcd-4aa6-8873-b411e9fb99a7
Starts 2017-04-26T13:43:28-07:00 (~7 seconds)
Ends   2017-04-26T13:43:39-07:00 (~10 seconds)
Waiting for result...

1       melange-owamp-v4.noc.ucdavis.edu (128.120.80.78)  64 Bytes  TTL 60  RTT   0.3550 ms
2       melange-owamp-v4.noc.ucdavis.edu (128.120.80.78)  64 Bytes  TTL 60  RTT   0.3540 ms
3       melange-owamp-v4.noc.ucdavis.edu (128.120.80.78)  64 Bytes  TTL 60  RTT   0.3680 ms
4       melange-owamp-v4.noc.ucdavis.edu (128.120.80.78)  64 Bytes  TTL 60  RTT   0.3570 ms
5       melange-owamp-v4.noc.ucdavis.edu (128.120.80.78)  64 Bytes  TTL 60  RTT   0.3420 ms

0% Packet Loss  RTT Min/Mean/Max/StdDev = 0.342000/0.355000/0.368000/0.014000 ms

No further runs scheduled.

 

Bwping works for both directions, though.

 

The pscheduler-scheduler daemon is “running”. I wonder how I can track down the real problem behind this issue.

 

Thank you.

 

Zhi-Wei Lu

IET-CR-Network Operations Center

University of California, Davis

(530) 752-0155

 



