perfsonar-user - Re: [perfsonar-user] Debugging pscheduler
Subject: perfSONAR User Q&A and Other Discussion
- From: Kathy Benninger <>
- To:
- Subject: Re: [perfsonar-user] Debugging pscheduler
- Date: Fri, 5 May 2017 12:15:39 -0400
I built a server with a perfSONAR Toolkit v4.0.0.1 .iso from
17-Apr-17. Unfortunately, as others have reported, I can't seem to
get pscheduler/testing to restart on its own after a reboot.

As Charley Kneifel reported,

    yum -y reinstall pscheduler-server

gets pscheduler working. Unfortunately, pscheduler still doesn't restart automatically after a reboot.

Alternatively,

    [root@ps-test benninge]# systemctl start pscheduler-archiver

gets things working without a reinstall. (I don't know if it would have worked BEFORE the first time I did a "yum reinstall".)

I've noticed after reboot that pscheduler-archiver.service doesn't start:

    [root@ps-test benninge]# systemctl status pscheduler-archiver -l
    ● pscheduler-archiver.service - pScheduler server - archiver
       Loaded: loaded (/usr/lib/systemd/system/pscheduler-archiver.service; enabled; vendor preset: disabled)
       Active: failed (Result: exit-code) since Thu 2017-05-04 14:45:32 EDT; 9min ago
     Main PID: 1079 (code=exited, status=1/FAILURE)

    May 04 14:45:32 ps-test.psc.edu archiver[1079]: File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection
    May 04 14:45:32 ps-test.psc.edu archiver[1079]: pg = psycopg2.connect(dsn)
    May 04 14:45:32 ps-test.psc.edu archiver[1079]: File "/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect
    May 04 14:45:32 ps-test.psc.edu archiver[1079]: conn = _connect(dsn, connection_factory=connection_factory, async=async)
    May 04 14:45:32 ps-test.psc.edu archiver[1079]: psycopg2.OperationalError: could not connect to server: Connection refused
    May 04 14:45:32 ps-test.psc.edu archiver[1079]: Is the server running on host "127.0.0.1" and accepting
    May 04 14:45:32 ps-test.psc.edu archiver[1079]: TCP/IP connections on port 5432?
    May 04 14:45:32 ps-test.psc.edu systemd[1]: pscheduler-archiver.service: main process exited, code=exited, status=1/FAILURE
    May 04 14:45:32 ps-test.psc.edu systemd[1]: Unit pscheduler-archiver.service entered failed state.
    May 04 14:45:32 ps-test.psc.edu systemd[1]: pscheduler-archiver.service failed.

and pscheduler-ticker.service doesn't look happy:

    [root@ps-test benninge]# systemctl status pscheduler-ticker -l
    ● pscheduler-ticker.service - pScheduler server - ticker
       Loaded: loaded (/usr/lib/systemd/system/pscheduler-ticker.service; enabled; vendor preset: disabled)
       Active: active (running) since Thu 2017-05-04 14:45:26 EDT; 17min ago
     Main PID: 1076 (ticker)
       CGroup: /system.slice/pscheduler-ticker.service
               └─1076 /usr/bin/python /usr/libexec/pscheduler/daemons/ticker --dsn @/etc/pscheduler/database/database-dsn

    May 04 14:45:34 ps-test.psc.edu ticker[1076]: conn = pscheduler.pg_connection(dsn)
    May 04 14:45:34 ps-test.psc.edu ticker[1076]: File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection
    May 04 14:45:34 ps-test.psc.edu ticker[1076]: pg = psycopg2.connect(dsn)
    May 04 14:45:34 ps-test.psc.edu ticker[1076]: File "/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect
    May 04 14:45:34 ps-test.psc.edu ticker[1076]: conn = _connect(dsn, connection_factory=connection_factory, async=async)
    May 04 14:45:34 ps-test.psc.edu ticker[1076]: OperationalError: FATAL: the database system is starting up
    May 04 14:45:34 ps-test.psc.edu ticker[1076]: safe_run/ticker ERROR Program threw an exception after 0:00:00.383300
    May 04 14:45:34 ps-test.psc.edu ticker[1076]: safe_run/ticker ERROR Exception: OperationalError: FATAL: the database system is starting up
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run
        function()
      File "/usr/libexec/pscheduler/daemons/ticker", line 156, in <lambda>
        pscheduler.safe_run(lambda: main_program())
      File "/usr/libexec/pscheduler/daemons/ticker", line 123, in main_program
        db = pscheduler.pg_connection(dsn)
      File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection
        pg = psycopg2.connect(dsn)
      File "/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect
        conn = _connect(dsn, connection_factory=connection_factory, async=async)
    OperationalError: FATAL: the database system is starting up
    May 04 14:45:34 ps-test.psc.edu ticker[1076]: safe_run/ticker ERROR Waiting 1.75 seconds before restarting
    May 04 14:45:35 ps-test.psc.edu ticker[1076]: safe_run/ticker ERROR Restarting

Here is a set of messages from /var/log/pscheduler/pscheduler.log (also similar to what Charley reported):

    May 5 11:37:02 ps-test journal: archiver INFO Started
    May 5 11:37:02 ps-test journal: scheduler INFO Started
    May 5 11:37:02 ps-test journal: runner INFO Started
    May 5 11:37:02 ps-test journal: safe_run/ticker ERROR Program threw an exception after 0:00:00.001378
    May 5 11:37:02 ps-test journal: safe_run/scheduler ERROR Program threw an exception after 0:00:00.000339
    May 5 11:37:02 ps-test journal: safe_run/runner ERROR Program threw an exception after 0:00:00.000290
    May 5 11:37:02 ps-test journal: safe_run/ticker ERROR Exception: OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run#012 function()#012 File "/usr/libexec/pscheduler/daemons/ticker", line 156, in <lambda>#012 pscheduler.safe_run(lambda: main_program())#012 File "/usr/libexec/pscheduler/daemons/ticker", line 123, in main_program#012 db = pscheduler.pg_connection(dsn)#012 File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection#012 pg = psycopg2.connect(dsn)#012 File "/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012 conn = _connect(dsn, connection_factory=connection_factory, async=async)#012OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?
    May 5 11:37:02 ps-test journal: safe_run/ticker ERROR Waiting 0.25 seconds before restarting
    May 5 11:37:02 ps-test journal: safe_run/scheduler ERROR Exception: OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run#012 function()#012 File "/usr/libexec/pscheduler/daemons/scheduler", line 660, in <lambda>#012 pscheduler.safe_run(lambda: main_program())#012 File "/usr/libexec/pscheduler/daemons/scheduler", line 527, in main_program#012 pg = pscheduler.pg_connection(dsn)#012 File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection#012 pg = psycopg2.connect(dsn)#012 File "/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012 conn = _connect(dsn, connection_factory=connection_factory, async=async)#012OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?
    May 5 11:37:02 ps-test journal: safe_run/scheduler ERROR Waiting 0.25 seconds before restarting
    May 5 11:37:02 ps-test journal: safe_run/runner ERROR Exception: OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run#012 function()#012 File "/usr/libexec/pscheduler/daemons/runner", line 909, in <lambda>#012 pscheduler.safe_run(lambda: main_program())#012 File "/usr/libexec/pscheduler/daemons/runner", line 719, in main_program#012 db = pscheduler.pg_connection(dsn)#012 File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection#012 pg = psycopg2.connect(dsn)#012 File "/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012 conn = _connect(dsn, connection_factory=connection_factory, async=async)#012OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?
    May 5 11:37:02 ps-test journal: safe_run/runner ERROR Waiting 0.25 seconds before restarting
    May 5 11:37:02 ps-test journal: safe_run/ticker ERROR Restarting
    May 5 11:37:02 ps-test journal: safe_run/scheduler ERROR Restarting
    May 5 11:37:02 ps-test journal: safe_run/runner ERROR Restarting
    May 5 11:37:02 ps-test journal: safe_run/scheduler ERROR Program threw an exception after 0:00:00.000342
    May 5 11:37:02 ps-test journal: safe_run/ticker ERROR Program threw an exception after 0:00:00.001007
    May 5 11:37:02 ps-test journal: safe_run/runner ERROR Program threw an exception after 0:00:00.000317
    May 5 11:37:02 ps-test journal: safe_run/scheduler ERROR Exception: OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run#012 function()#012 File "/usr/libexec/pscheduler/daemons/scheduler", line 660, in <lambda>#012 pscheduler.safe_run(lambda: main_program())#012 File "/usr/libexec/pscheduler/daemons/scheduler", line 527, in main_program#012 pg = pscheduler.pg_connection(dsn)#012 File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection#012 pg = psycopg2.connect(dsn)#012 File "/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012 conn = _connect(dsn, connection_factory=connection_factory, async=async)#012OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?
    May 5 11:37:02 ps-test journal: safe_run/scheduler ERROR Waiting 0.5 seconds before restarting
    May 5 11:37:02 ps-test journal: safe_run/runner ERROR Exception: OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run#012 function()#012 File "/usr/libexec/pscheduler/daemons/runner", line 909, in <lambda>#012 pscheduler.safe_run(lambda: main_program())#012 File "/usr/libexec/pscheduler/daemons/runner", line 719, in main_program#012 db = pscheduler.pg_connection(dsn)#012 File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection#012 pg = psycopg2.connect(dsn)#012 File "/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012 conn = _connect(dsn, connection_factory=connection_factory, async=async)#012OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?
    May 5 11:37:02 ps-test journal: safe_run/runner ERROR Waiting 0.5 seconds before restarting
    May 5 11:37:02 ps-test journal: safe_run/ticker ERROR Exception: OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run#012 function()#012 File "/usr/libexec/pscheduler/daemons/ticker", line 156, in <lambda>#012 pscheduler.safe_run(lambda: main_program())#012 File "/usr/libexec/pscheduler/daemons/ticker", line 123, in main_program#012 db = pscheduler.pg_connection(dsn)#012 File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection#012 pg = psycopg2.connect(dsn)#012 File "/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012 conn = _connect(dsn, connection_factory=connection_factory, async=async)#012OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?

Thanks for any help!

Kathy

On 4/26/2017 5:50 PM, Charley Kneifel wrote:
|
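
Every trace in this message shows a pscheduler daemon trying to reach PostgreSQL on 127.0.0.1:5432 before the database is ready: first "Connection refused", then "the database system is starting up". One way to see how long that window lasts after a reboot is to poll the port. The following is a minimal sketch, assuming Python 3 and only the standard library; the name wait_for_port and its parameters are hypothetical and are not part of pscheduler. It only checks that the TCP port accepts connections, so it covers the "Connection refused" case but not the "starting up" case, where the port is already open.

```python
import socket
import time


def wait_for_port(host, port, attempts=8, first_delay=0.25, max_delay=4.0):
    """Return True once a TCP connection to (host, port) succeeds.

    Hypothetical helper, not part of pscheduler. Retries with a doubling
    delay capped at max_delay, and returns False if every attempt fails.
    """
    delay = first_delay
    for attempt in range(attempts):
        try:
            # create_connection raises OSError (for example
            # ConnectionRefusedError) while nothing listens on the port.
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            # Sleep only if another attempt is still to come.
            if attempt < attempts - 1:
                time.sleep(delay)
                delay = min(delay * 2, max_delay)
    return False
```

Calling wait_for_port("127.0.0.1", 5432) right after boot returns True as soon as the port opens and False if it never does within the retry budget. A fuller check would attempt a real database connection instead of a bare TCP connect.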
- Re: [perfsonar-user] Debugging pscheduler, Kathy Benninger, 05/05/2017
- RE: [perfsonar-user] Debugging pscheduler, Zhi-Wei Lu, 05/05/2017
- Re: [perfsonar-user] Debugging pscheduler, Kathy Benninger, 05/10/2017