Skip to Content.
Sympa Menu

perfsonar-user - Re: [perfsonar-user] Debugging pscheduler

Subject: perfSONAR User Q&A and Other Discussion

List archive

Re: [perfsonar-user] Debugging pscheduler


Chronological Thread 
  • From: Kathy Benninger <>
  • To:
  • Subject: Re: [perfsonar-user] Debugging pscheduler
  • Date: Wed, 10 May 2017 10:18:11 -0400
  • Authentication-results: mailer2.psc.edu; dmarc=none (p=none dis=none) header.from=psc.edu
  • Authentication-results: mailer2.psc.edu; spf=pass
  • Dkim-filter: OpenDKIM Filter v2.11.0 mailer2.psc.edu v4AEIG24011817
  • Dmarc-filter: OpenDMARC Filter v1.3.2 mailer2.psc.edu v4AEIG24011817
  • Ironport-phdr: 9a23:2ThvKhDGJfZNiK79qJiWUyQJP3N1i/DPJgcQr6AfoPdwSP37psmwAkXT6L1XgUPTWs2DsrQf2rWQ6/urBjdIyK3CmUhKSIZLWR4BhJdetC0bK+nBN3fGKuX3ZTcxBsVIWQwt1Xi6NU9IBJS2PAWK8TW94jEIBxrwKxd+KPjrFY7OlcS30P2594HObwlSijewZbF/IA+qoQjfucUbg4VvIbstxxXUpXdFZ/5Yzn5yK1KJmBb86Maw/Jp9/ClVpvks6c1OX7jkcqohVbBXAygoPG4z5M3wqBnMVhCP6WcGUmUXiRVHHQ7I5wznU5jrsyv6su192DSGPcDzULs5Vyiu47ttRRT1jioMKjw3/3zNisFokaxVvhyhqRx8zYDabo6aO/hxcb/Sc94BWWpMXdxcWzBdDo6ybYYCCfcKM+ZCr4n6olsDtR6wChOwBOPvyj5Dm2L70rM80+QnEAHG2gMgEMwUsH/Jq9j6L7kdUeazzKnJyzXDYO1Z2S3m5YjQaRAuveuAUqxtfsrM0EQiER7OgFuXqYzgJTyV1+INvnCG7+V8SO2gkXAnqwBqojS12Mgjl5TJipoNylDf6Sp12pg6KsClSEJjed6kC5tRuD2aN4p3WM8iRX9nuDoixr0Hv567YDYFyI4hxxHBd/yLbZKE7Qz+W+aMOTt4gnJkeK+lixms7Eeg1/XwV8mz0FZXtCpJiN7MtmoC1xDL98SLVPhw8l281TqRywzf8OJJLVwumafZKJMt2rAwmYQOvkTEBC/5gl32jLWRdkUm5Oen9/jnbq/hpp+GOI94kgL+MqIwlcy+D+Q3LAwOU3Kf+eS7yLLs50n5TK1PjvIojqnZqpfaKtgBqa6iHw9ZyIAj5AijDzqgzd8WnmEHIUpYdB6ZjIXlIUzCLfPmAfulgVmgjC1ny+3aMrDgGpnBM2bPnbT7cbpg6ENQ1hA/wc1e6p5IEL0NPe//V0v3udfFFBE2KRa0zPziCNhl1oMRR2aPAqiBPaPXq1CI5+YvLPeWZIMLojr9M+Ml6OL0gXAnhFAdYLGl3ZoRaH+mAvRrO1iVbWTvj9sbDGsHvRQyQPbohVGeSzJfemq+UqMk6jE+Eo2mDIPDRo63gLyG2Se2BoZZaX5AClCKEHfocYqEVOkWaC+JOs9hiSALWaK/RIA8yBGirBX6xKZ/LurI5i0Ysoru1MBr6O3JmxEy7jt0D8KH3GGXVm10g3gISCEs3K1kpUx9y0yD0bRjg/BGD9BT5vVJUhskOp7Gyex1FczyVhzbctiXVVmmX4buPTZkVt8rzcQJZU9nXsi5gwrr3iy2DqUTmqDRQpE47/HyxX/0cuV013bXnIMsiV8nRMRCLyXyjKN58QnVCIPTu12CnOCneblKj32Fz3uK0Wfb5BIQawV3S6iQBX0=

The problem appears to have been resolved with the update to v4.0.0.2


On 5/5/2017 12:15 PM, Kathy Benninger wrote:
I built a server with a perfSONAR Toolkit v4.0.0.1 .iso from 17-Apr-17.
Unfortunately, as others have reported, I can't seem to get
pscheduler/testing to restart on its own after a reboot.

As Charley Kneifel reported,
yum -y reinstall pscheduler-server
gets pscheduler working. Unfortunately, pscheduler still doesn't restart
automatically after a reboot.

Alternatively,
[root@ps-test benninge]# systemctl start pscheduler-archiver
gets things working without a reinstall (I don't know if it would have
worked BEFORE the first time I did a "yum reinstall")

I've noticed after reboot that pscheduler-archiver.service doesn't start:

[root@ps-test benninge]# systemctl status pscheduler-archiver -l
● pscheduler-archiver.service - pScheduler server - archiver
Loaded: loaded (/usr/lib/systemd/system/pscheduler-archiver.service;
enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Thu 2017-05-04 14:45:32 EDT;
9min ago
Main PID: 1079 (code=exited, status=1/FAILURE)

May 04 14:45:32 ps-test.psc.edu archiver[1079]: File
"/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection
May 04 14:45:32 ps-test.psc.edu archiver[1079]: pg = psycopg2.connect(dsn)
May 04 14:45:32 ps-test.psc.edu archiver[1079]: File
"/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in
connect
May 04 14:45:32 ps-test.psc.edu archiver[1079]: conn = _connect(dsn,
connection_factory=connection_factory, async=async)
May 04 14:45:32 ps-test.psc.edu archiver[1079]: psycopg2.OperationalError:
could not connect to server: Connection refused
May 04 14:45:32 ps-test.psc.edu archiver[1079]: Is the server running on
host "127.0.0.1" and accepting
May 04 14:45:32 ps-test.psc.edu archiver[1079]: TCP/IP connections on port
5432?
May 04 14:45:32 ps-test.psc.edu systemd[1]: pscheduler-archiver.service:
main process exited, code=exited, status=1/FAILURE
May 04 14:45:32 ps-test.psc.edu systemd[1]: Unit pscheduler-archiver.service
entered failed state.
May 04 14:45:32 ps-test.psc.edu systemd[1]: pscheduler-archiver.service
failed.

and pscheduler-ticker.service doesn't look happy:

[root@ps-test benninge]# systemctl status pscheduler-ticker -l
● pscheduler-ticker.service - pScheduler server - ticker
Loaded: loaded (/usr/lib/systemd/system/pscheduler-ticker.service;
enabled; vendor preset: disabled)
Active: active (running) since Thu 2017-05-04 14:45:26 EDT; 17min ago
Main PID: 1076 (ticker)
CGroup: /system.slice/pscheduler-ticker.service
└─1076 /usr/bin/python /usr/libexec/pscheduler/daemons/ticker
--dsn @/etc/pscheduler/database/database-dsn

May 04 14:45:34 ps-test.psc.edu ticker[1076]: conn =
pscheduler.pg_connection(dsn)
May 04 14:45:34 ps-test.psc.edu ticker[1076]: File
"/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection
May 04 14:45:34 ps-test.psc.edu ticker[1076]: pg = psycopg2.connect(dsn)
May 04 14:45:34 ps-test.psc.edu ticker[1076]: File
"/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in
connect
May 04 14:45:34 ps-test.psc.edu ticker[1076]: conn = _connect(dsn,
connection_factory=connection_factory, async=async)
May 04 14:45:34 ps-test.psc.edu ticker[1076]: OperationalError: FATAL: the
database system is starting up
May 04 14:45:34 ps-test.psc.edu ticker[1076]: safe_run/ticker ERROR
Program threw an exception after 0:00:00.383300
May 04 14:45:34 ps-test.psc.edu ticker[1076]: safe_run/ticker ERROR
Exception: OperationalError: FATAL: the database system is starting up

Traceback (most recent call
last):
File
"/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run
function()
File
"/usr/libexec/pscheduler/daemons/ticker", line 156, in <lambda>
pscheduler.safe_run(lambda: main_program())
File
"/usr/libexec/pscheduler/daemons/ticker", line 123, in main_program
db =
pscheduler.pg_connection(dsn)
File
"/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection
pg = psycopg2.connect(dsn)
File
"/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in
connect
conn = _connect(dsn,
connection_factory=connection_factory, async=async)
OperationalError: FATAL: the
database system is starting up
May 04 14:45:34 ps-test.psc.edu ticker[1076]: safe_run/ticker ERROR
Waiting 1.75 seconds before restarting
May 04 14:45:35 ps-test.psc.edu ticker[1076]: safe_run/ticker ERROR
Restarting

Here is a set of messages from /var/log/pscheduler/pscheduler.log (also
similar to what Charley reported):

May 5 11:37:02 ps-test journal: archiver INFO Started
May 5 11:37:02 ps-test journal: scheduler INFO Started
May 5 11:37:02 ps-test journal: runner INFO Started
May 5 11:37:02 ps-test journal: safe_run/ticker ERROR Program threw an
exception after 0:00:00.001378
May 5 11:37:02 ps-test journal: safe_run/scheduler ERROR Program threw
an exception after 0:00:00.000339
May 5 11:37:02 ps-test journal: safe_run/runner ERROR Program threw an
exception after 0:00:00.000290
May 5 11:37:02 ps-test journal: safe_run/ticker ERROR Exception:
OperationalError: could not connect to server: Connection refused#012#011Is
the server running on host "127.0.0.1" and accepting#012#011TCP/IP
connections on port 5432?#012#012Traceback (most
recent call last):#012 File
"/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in
safe_run#012 function()#012 File
"/usr/libexec/pscheduler/daemons/ticker", line 156, in <lambda>#012
pscheduler.safe_run(lambda: main_program())#012 Fil
e "/usr/libexec/pscheduler/daemons/ticker", line 123, in main_program#012
db = pscheduler.pg_connection(dsn)#012 File
"/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in
pg_connection#012 pg = psycopg2.connect(dsn)#012 File "/usr/lib64/pyt
hon2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012 conn
= _connect(dsn, connection_factory=connection_factory,
async=async)#012OperationalError: could not connect to server: Connection
refused#012#011Is the server running on host "127.0.0.
1" and accepting#012#011TCP/IP connections on port 5432?
May 5 11:37:02 ps-test journal: safe_run/ticker ERROR Waiting 0.25
seconds before restarting
May 5 11:37:02 ps-test journal: safe_run/scheduler ERROR Exception:
OperationalError: could not connect to server: Connection refused#012#011Is
the server running on host "127.0.0.1" and accepting#012#011TCP/IP
connections on port 5432?#012#012Traceback (m
ost recent call last):#012 File
"/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in
safe_run#012 function()#012 File
"/usr/libexec/pscheduler/daemons/scheduler", line 660, in <lambda>#012
pscheduler.safe_run(lambda: main_program())#01
2 File "/usr/libexec/pscheduler/daemons/scheduler", line 527, in
main_program#012 pg = pscheduler.pg_connection(dsn)#012 File
"/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in
pg_connection#012 pg = psycopg2.connect(dsn)#012 File "/usr/
lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in
connect#012 conn = _connect(dsn, connection_factory=connection_factory,
async=async)#012OperationalError: could not connect to server: Connection
refused#012#011Is the server running on host
"127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?
May 5 11:37:02 ps-test journal: safe_run/scheduler ERROR Waiting 0.25
seconds before restarting
May 5 11:37:02 ps-test journal: safe_run/runner ERROR Exception:
OperationalError: could not connect to server: Connection refused#012#011Is
the server running on host "127.0.0.1" and accepting#012#011TCP/IP
connections on port 5432?#012#012Traceback (most
recent call last):#012 File
"/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in
safe_run#012 function()#012 File
"/usr/libexec/pscheduler/daemons/runner", line 909, in <lambda>#012
pscheduler.safe_run(lambda: main_program())#012 Fil
e "/usr/libexec/pscheduler/daemons/runner", line 719, in main_program#012
db = pscheduler.pg_connection(dsn)#012 File
"/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in
pg_connection#012 pg = psycopg2.connect(dsn)#012 File "/usr/lib64/pyt
hon2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012 conn
= _connect(dsn, connection_factory=connection_factory,
async=async)#012OperationalError: could not connect to server: Connection
refused#012#011Is the server running on host "127.0.0.
1" and accepting#012#011TCP/IP connections on port 5432?
May 5 11:37:02 ps-test journal: safe_run/runner ERROR Waiting 0.25
seconds before restarting
May 5 11:37:02 ps-test journal: safe_run/ticker ERROR Restarting
May 5 11:37:02 ps-test journal: safe_run/scheduler ERROR Restarting
May 5 11:37:02 ps-test journal: safe_run/runner ERROR Restarting
May 5 11:37:02 ps-test journal: safe_run/scheduler ERROR Program threw
an exception after 0:00:00.000342
May 5 11:37:02 ps-test journal: safe_run/ticker ERROR Program threw an
exception after 0:00:00.001007
May 5 11:37:02 ps-test journal: safe_run/runner ERROR Program threw an
exception after 0:00:00.000317
May 5 11:37:02 ps-test journal: safe_run/scheduler ERROR Exception:
OperationalError: could not connect to server: Connection refused#012#011Is
the server running on host "127.0.0.1" and accepting#012#011TCP/IP
connections on port 5432?#012#012Traceback (m
ost recent call last):#012 File
"/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in
safe_run#012 function()#012 File
"/usr/libexec/pscheduler/daemons/scheduler", line 660, in <lambda>#012
pscheduler.safe_run(lambda: main_program())#01
2 File "/usr/libexec/pscheduler/daemons/scheduler", line 527, in
main_program#012 pg = pscheduler.pg_connection(dsn)#012 File
"/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in
pg_connection#012 pg = psycopg2.connect(dsn)#012 File "/usr/
lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in
connect#012 conn = _connect(dsn, connection_factory=connection_factory,
async=async)#012OperationalError: could not connect to server: Connection
refused#012#011Is the server running on host
"127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?
May 5 11:37:02 ps-test journal: safe_run/scheduler ERROR Waiting 0.5
seconds before restarting
May 5 11:37:02 ps-test journal: safe_run/runner ERROR Exception:
OperationalError: could not connect to server: Connection refused#012#011Is
the server running on host "127.0.0.1" and accepting#012#011TCP/IP
connections on port 5432?#012#012Traceback (most
recent call last):#012 File
"/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in
safe_run#012 function()#012 File
"/usr/libexec/pscheduler/daemons/runner", line 909, in <lambda>#012
pscheduler.safe_run(lambda: main_program())#012 Fil
e "/usr/libexec/pscheduler/daemons/runner", line 719, in main_program#012
db = pscheduler.pg_connection(dsn)#012 File
"/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in
pg_connection#012 pg = psycopg2.connect(dsn)#012 File "/usr/lib64/pyt
hon2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012 conn
= _connect(dsn, connection_factory=connection_factory,
async=async)#012OperationalError: could not connect to server: Connection
refused#012#011Is the server running on host "127.0.0.
1" and accepting#012#011TCP/IP connections on port 5432?
May 5 11:37:02 ps-test journal: safe_run/runner ERROR Waiting 0.5
seconds before restarting
May 5 11:37:02 ps-test journal: safe_run/ticker ERROR Exception:
OperationalError: could not connect to server: Connection refused#012#011Is
the server running on host "127.0.0.1" and accepting#012#011TCP/IP
connections on port 5432?#012#012Traceback (most
recent call last):#012 File
"/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in
safe_run#012 function()#012 File
"/usr/libexec/pscheduler/daemons/ticker", line 156, in <lambda>#012
pscheduler.safe_run(lambda: main_program())#012 Fil
e "/usr/libexec/pscheduler/daemons/ticker", line 123, in main_program#012
db = pscheduler.pg_connection(dsn)#012 File
"/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in
pg_connection#012 pg = psycopg2.connect(dsn)#012 File "/usr/lib64/pyt
hon2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012 conn
= _connect(dsn, connection_factory=connection_factory,
async=async)#012OperationalError: could not connect to server: Connection
refused#012#011Is the server running on host "127.0.0.
1" and accepting#012#011TCP/IP connections on port 5432?

Thanks for any help!

Kathy



On 4/26/2017 5:50 PM, Charley Kneifel wrote:

I should have added that rerunning the pscheduler install "fixes" the
problem (until the next reboot) and now pscheduler starts just fine -


[root@ps-node-e pscheduler]# yum reinstall pscheduler-server
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
* Internet2: software.internet2.edu
* base: repos-va.psychz.net
* epel: mirror.metrocast.net
* extras: mirror.es.its.nyu.edu
* updates: ftpmirror.your.org
Resolving Dependencies
--> Running transaction check
---> Package pscheduler-server.noarch 0:1.0.0.1-1.el7.centos will be
reinstalled
--> Finished Dependency Resolution

Dependencies Resolved

==============================================================================================================================
Package Arch
Version Repository Size
==============================================================================================================================
Reinstalling:
pscheduler-server noarch
1.0.0.1-1.el7.centos Internet2 82 k

Transaction Summary
==============================================================================================================================
Reinstall 1 Package

Total download size: 82 k
Installed size: 281 k
Is this ok [y/d/N]: ifconfig ens160 mtu 9000
Is this ok [y/d/N]: y
Downloading packages:
pscheduler-server-1.0.0.1-1.el7.centos.noarch.rpm
| 82 kB 00:00:01
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction


Installing :
pscheduler-server-1.0.0.1-1.el7.centos.noarch
1/1
Verifying :
pscheduler-server-1.0.0.1-1.el7.centos.noarch
1/1

Installed:
pscheduler-server.noarch 0:1.0.0.1-1.el7.centos

Complete!
Apr 26 17:40:01 ps-node-e journal: runner INFO Started
Apr 26 17:40:01 ps-node-e journal: archiver INFO Started
Apr 26 17:40:01 ps-node-e journal: scheduler INFO Started
Apr 26 17:40:02 ps-node-e journal: runner INFO 37521: Running
https://ps-node-e.kneifel.local/pscheduler/tasks/f84ff3ed-b6de-454a-b67d-bb501c85a6d8/runs/a6127b24-b16f-4fbc-aaca-af153459d617
Apr 26 17:40:02 ps-node-e journal: runner INFO 37521: With powstream:
latencybg --data-ports 8760-9960 --dest 192.168.202.88 --packet-padding 0
--flip --source ps-node-b --ip-version 4 --packet-interval 0.1 --duration
PT86400S --packet-count 600
Apr 26 17:40:02 ps-node-e journal: runner INFO 37505: Running
https://ps-node-e.kneifel.local/pscheduler/tasks/c5616796-171d-421c-9552-5bc9d27c835a/runs/ea6234dc-fc28-4e8c-94fa-2913362f0ce4
Apr 26 17:40:02 ps-node-e journal: runner INFO 37505: With powstream:
latencybg --data-ports 8760-9960 --dest 192.168.202.88 --packet-padding 0
--flip --source ps-node-c --ip-version 4 --packet-interval 0.1 --duration
PT86400S --packet-count 600
Apr 26 17:40:02 ps-node-e journal: runner INFO 37509: Running
https://ps-node-e.kneifel.local/pscheduler/tasks/d202293b-2a1a-4704-a28a-3842781a6084/runs/a6bb94f1-7f58-4085-8c50-efaa2cdf2f14
Apr 26 17:40:02 ps-node-e journal: runner INFO 37509: With powstream:
latencybg --data-ports 8760-9960 --dest ps-node-c --packet-padding 0
--source 192.168.202.88 --ip-version 4 --packet-interval 0.1 --duration
PT86400S --packet-count 600
Apr 26 17:40:02 ps-node-e journal: runner INFO 37525: Running
https://ps-node-e.kneifel.local/pscheduler/tasks/621549b6-50dd-479e-af4f-04a6ae954f67/runs/69839120-f9fc-4ab6-89e6-b376297f8cb9
Apr 26 17:40:02 ps-node-e journal: runner INFO 37525: With powstream:
latencybg --data-ports 8760-9960 --dest ps-node-b --packet-padding 0
--source 192.168.202.88 --ip-version 4 --packet-interval 0.1 --duration
PT86400S --packet-count 600
Apr 26 17:40:02 ps-node-e journal: runner INFO 37528: Running
https://ps-node-e.kneifel.local/pscheduler/tasks/d78bf04b-e585-4556-807f-447bac14e9a9/runs/56cf30d8-ed90-41ab-a693-99a4315826a5
Apr 26 17:40:02 ps-node-e journal: runner INFO 37528: With powstream:
latencybg --data-ports 8760-9960 --dest 192.168.202.88 --packet-padding 0
--flip --source ps-node-D --ip-version 4 --packet-interval 0.1 --duration
PT86400S --packet-count 600
Apr 26 17:40:02 ps-node-e journal: runner INFO 37517: Running
https://ps-node-e.kneifel.local/pscheduler/tasks/3233041f-2c43-45a4-a9ae-3b299d68d0e3/runs/ba8968da-7d8c-4190-a69a-6e35f1f5bfc3
Apr 26 17:40:02 ps-node-e journal: runner INFO 37517: With powstream:
latencybg --data-ports 8760-9960 --dest ps-node-a --packet-padding 0
--source 192.168.202.88 --ip-version 4 --packet-interval 0.1 --duration
PT86400S --packet-count 600
Apr 26 17:40:02 ps-node-e journal: runner INFO 37513: Running
https://ps-node-e.kneifel.local/pscheduler/tasks/2ad455ca-14a8-4f70-b6ab-0c6506c0f077/runs/eb3c47b5-849e-4882-8ef2-e0f85b7fde67
Apr 26 17:40:02 ps-node-e journal: runner INFO 37513: With powstream:
latencybg --data-ports 8760-9960 --dest 192.168.202.88 --packet-padding 0
--flip --source ps-node-a --ip-version 4 --packet-interval 0.1 --duration
PT86400S --packet-count 600
Apr 26 17:40:02 ps-node-e journal: runner INFO 37532: Running
https://ps-node-e.kneifel.local/pscheduler/tasks/a4469c36-5e9b-40be-ad1e-61f6beddca64/runs/a313f507-5b02-4fca-8ecf-ebf08b4ca639
Apr 26 17:40:02 ps-node-e journal: runner INFO 37532: With powstream:
latencybg --data-ports 8760-9960 --dest ps-node-D --packet-padding 0
--source 192.168.202.88 --ip-version 4 --packet-interval 0.1 --duration
PT86400S --packet-count 600
Apr 26 17:40:37 ps-node-e journal: runner INFO 40123: Running
https://ps-node-e.kneifel.local/pscheduler/tasks/e541a236-ef39-4e01-901d-22638cb2120b/runs/afa179ab-1426-4008-aeb3-806627d278ce
Apr 26 17:40:37 ps-node-e journal: runner INFO 40123: With ping: rtt
--count 10 --dest ps-node-a --interval PT1S --ip-version 4 --source
192.168.202.88 --length 1000 --ttl 255
Apr 26 17:40:42 ps-node-e journal: pscheduler-api INFO Started
Apr 26 17:40:43 ps-node-e journal: pscheduler-api INFO Limits loaded
from /etc/pscheduler/limits.conf



----------------------------------------------------------------------------
*From:*

<>
on behalf of Charley Kneifel
<>
*Sent:* Wednesday, April 26, 2017 5:20 PM
*To:* Zhi-Wei Lu;

*Subject:* Re: [perfsonar-user] Debugging pscheduler


I have had a similar problem - pscheduler will run properly after running
the reinstall but then will stop working after a reboot.


This is from a vanilla ISO install onto a VM from the 4/21/17 release


after logging in I see the following


ABRT has detected 1 problem(s). For more info run: abrt-cli list --since
1493040255
[root@ps-node-e charley]# abrt-cli list --since 1493040255
id 42223fad7206adaf563a3c79556b12b8bb0b0b44
reason: pool.py:152:__init__:ValueError: Number of processes must
be at least 1
time: Mon 24 Apr 2017 06:51:42 AM EDT
cmdline: /usr/bin/python
/usr/libexec/pscheduler/commands/../classes/tool/traceroute/run
package: pscheduler-tool-traceroute-1.0-1.el7.centos
uid: 1000 (pscheduler)
count: 2
Directory: /var/spool/abrt/Python-2017-04-24-06:51:42-9460
Reported:
https://retrace.fedoraproject.org/faf/reports/bthash/cf21d9b6825a907dfc058e2537978065e77bf78e

id 0423ef98e262f14f154da722888811992eb36fee
reason: __init__.py:164:connect:OperationalError: could not
connect to server: Connection refused
time: Sun 23 Apr 2017 01:17:14 AM EDT
cmdline: /usr/bin/python /usr/libexec/pscheduler/daemons/archiver
--dsn @/etc/pscheduler/database/database-dsn
package: pscheduler-server-1.0.0.1-1.el7.centos
uid: 1000 (pscheduler)
count: 3
Directory: /var/spool/abrt/Python-2017-04-23-01:17:14-729
[root@ps-node-e charley]#

and the following types of errors in pscheduler.log after a reboot


Apr 26 17:18:17 ps-node-e journal: archiver INFO Started
Apr 26 17:18:17 ps-node-e journal: scheduler INFO Started
Apr 26 17:18:17 ps-node-e journal: runner INFO Started
Apr 26 17:18:17 ps-node-e journal: safe_run/runner ERROR Program threw
an exception after 0:00:00.000110
Apr 26 17:18:17 ps-node-e journal: safe_run/scheduler ERROR Program
threw an exception after 0:00:00.001355
Apr 26 17:18:17 ps-node-e journal: safe_run/ticker ERROR Program threw
an exception after 0:00:00.003222
Apr 26 17:18:17 ps-node-e journal: safe_run/ticker ERROR Exception:
OperationalError: could not connect to server: Connection
refused#012#011Is the server running on host "127.0.0.1" and
accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (most
recent call last):#012 File
"/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in
safe_run#012 function()#012 File
"/usr/libexec/pscheduler/daemons/ticker", line 156, in <lambda>#012
pscheduler.safe_run(lambda: main_program())#012 File
"/usr/libexec/pscheduler/daemons/ticker", line 123, in main_program#012
db = pscheduler.pg_connection(dsn)#012 File
"/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in
pg_connection#012 pg = psycopg2.connect(dsn)#012 File
"/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in
connect#012 conn = _connect(dsn, connection_factory=connection_factory,
async=async)#012OperationalError: could not connect to server: Connection
refused#012#011Is the server running on host "127.0.0.1" and
accepting#012#011TCP/IP connections on port 5432?
Apr 26 17:18:17 ps-node-e journal: safe_run/ticker ERROR Waiting 0.25
seconds before restarting
Apr 26 17:18:17 ps-node-e journal: safe_run/scheduler ERROR Exception:
OperationalError: could not connect to server: Connection
refused#012#011Is the server running on host "127.0.0.1" and
accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (most
recent call last):#012 File
"/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in
safe_run#012 function()#012 File
"/usr/libexec/pscheduler/daemons/scheduler", line 660, in <lambda>#012
pscheduler.safe_run(lambda: main_program())#012 File
"/usr/libexec/pscheduler/daemons/scheduler", line 527, in
main_program#012 pg = pscheduler.pg_connection(dsn)#012 File
"/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in
pg_connection#012 pg = psycopg2.connect(dsn)#012 File
"/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in
connect#012 conn = _connect(dsn, connection_factory=connection_factory,
async=async)#012OperationalError: could not connect to server: Connection
refused#012#011Is the server running on host "127.0.0.1" and
accepting#012#011TCP/IP connections on port 5432?
Apr 26 17:18:17 ps-node-e journal: safe_run/scheduler ERROR Waiting
0.25 seconds before restarting
Apr 26 17:18:17 ps-node-e journal: safe_run/runner ERROR Exception:
OperationalError: could not connect to server: Connection
refused#012#011Is the server running on host "127.0.0.1" and
accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (most
recent call last):#012 File
"/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in
safe_run#012 function()#012 File
"/usr/libexec/pscheduler/daemons/runner", line 909, in <lambda>#012
pscheduler.safe_run(lambda: main_program())#012 File
"/usr/libexec/pscheduler/daemons/runner", line 719, in main_program#012
db = pscheduler.pg_connection(dsn)#012 File
"/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in
pg_connection#012 pg = psycopg2.connect(dsn)#012 File
"/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in
connect#012 conn = _connect(dsn, connection_factory=connection_factory,
async=async)#012OperationalError: could not connect to server: Connection
refused#012#011Is the server running on host "127.0.0.1" and
accepting#012#011TCP/IP connections on port 5432?




































































----------------------------------------------------------------------------
*From:*

<>
on behalf of Zhi-Wei Lu
<>
*Sent:* Wednesday, April 26, 2017 4:58 PM
*To:*

*Subject:* [perfsonar-user] Debugging pscheduler


Hi all,



After perfsonar 4.0 upgrade, I struggled to get one of our perfsonar boxes
working again (with help from Andrew Lake). It worked for a short while,
and I noticed that it appears that pscheduler is not working properly.



pscheduler task rtt --dest p0.noc.ucdavis.edu --source 128.120.80.78

Submitting task...

Task URL:

https://128.120.80.78/pscheduler/tasks/8f17174e-2ca7-47d8-b98e-dd96b1d01e2b

Running with tool 'ping'

Fetching first run...

128.120.80.78 never scheduled a run for the task.



Reverse the source and dest, test worked



pscheduler task rtt --source p0.noc.ucdavis.edu --dest 128.120.80.78

Submitting task...

Task URL:

https://p0.noc.ucdavis.edu/pscheduler/tasks/b569b99a-bed7-431c-8948-b34238cdcacf

Running with tool 'ping'

Fetching first run...



Next scheduled run:

https://p0.noc.ucdavis.edu/pscheduler/tasks/b569b99a-bed7-431c-8948-b34238cdcacf/runs/d13ba8e6-6fcd-4aa6-8873-b411e9fb99a7

Starts 2017-04-26T13:43:28-07:00 (~7 seconds)

Ends 2017-04-26T13:43:39-07:00 (~10 seconds)

Waiting for result...



1 melange-owamp-v4.noc.ucdavis.edu (128.120.80.78) 64 Bytes TTL
60 RTT 0.3550 ms

2 melange-owamp-v4.noc.ucdavis.edu (128.120.80.78) 64 Bytes TTL
60 RTT 0.3540 ms

3 melange-owamp-v4.noc.ucdavis.edu (128.120.80.78) 64 Bytes TTL
60 RTT 0.3680 ms

4 melange-owamp-v4.noc.ucdavis.edu (128.120.80.78) 64 Bytes TTL
60 RTT 0.3570 ms

5 melange-owamp-v4.noc.ucdavis.edu (128.120.80.78) 64 Bytes TTL
60 RTT 0.3420 ms



0% Packet Loss RTT Min/Mean/Max/StdDev =
0.342000/0.355000/0.368000/0.014000 ms



No further runs scheduled.



Bwping works for both directions, though.



The pscheduer-scheduler daemon is “running”. I wonder how I can trace
down the real problem behind this issue.



Thank you.



Zhi-Wei Lu

IET-CR-Network Operations Center

University of California, Davis

(530) 752-0155







Archive powered by MHonArc 2.6.19.

Top of Page