Skip to Content.
Sympa Menu

perfsonar-user - RE: [perfsonar-user] Debugging pscheduler

Subject: perfSONAR User Q&A and Other Discussion

List archive

RE: [perfsonar-user] Debugging pscheduler


Chronological Thread 
  • From: Zhi-Wei Lu <>
  • To: Kathy Benninger <>, "" <>
  • Subject: RE: [perfsonar-user] Debugging pscheduler
  • Date: Fri, 5 May 2017 19:54:44 +0000
  • Accept-language: en-US
  • Authentication-results: psc.edu; dkim=none (message not signed) header.d=none;psc.edu; dmarc=none action=none header.from=ucdavis.edu;
  • Ironport-phdr: 9a23:n1o+FRVQkBg7hTcjeb8Kwxxmk0XV8LGtZVwlr6E/grcLSJyIuqrYZRWDvKdThVPEFb/W9+hDw7KP9fuxBypevd3Y7CpKWacPfidNsd8RkQ0kDZzNImzAB9muURYHGt9fXkRu5XCxPBsdMs//Y1rPvi/6tmZKSV3wOgVvO+v6BJPZgdip2OCu4Z3TZBhDiCagbb9oIhi7qQTcutMXjIZtJao91xTErmVVcOlK2G1kIk6ekQzh7cmq5p5j9CpQu/Ml98FeVKjxYro1Q79FAjk4Km45/MLkuwXNQguJ/XscT34ZkgFUDAjf7RH1RYn+vy3nvedgwiaaPMn2TbcpWTS+6qpgVRHlhDsbOzM/7WrakdJ7gr5Frx29phx/24/Ub5+TNPpiZaPWYNcWSXNcUspNSyBNB4WxZJYNAeUcJ+ZVt4nzqUUToxuiCweiB+3vxT1JhnDq0qI3yP4uHR3a0AE6A94CrHbZodPoP6kSS+C1y6zIwC3AYPNW2Dfx8YjHfQ4gofqRW7xwbNfaxE4uFgLLilWQtJfoPy6J2esQqWeb8etgVfm1h2E5tg5xvyGiy8ExgYfHgYIVz0rL9SR/wIstON23VFN0bsS6EJtRrCGWLZV5Td4/Q21woCY60aEJtYSlcycX1ZQr3wbfa+eHcoeU4hLiVfydITFmi3J5YL6/hhCy8VS+xe3mUMm7zkpKojRfntbSq38NygTf6s6HS/tg5EuuxyuA1x3L5uBEJkA0iLDbJ4Qmwr4qmZofqVnPEjH1lUnskaOaaFko9vKr5uj7eLnrpJyRO5d6ig7gL6sunda/Dv4gMggUWGib+/qz26H//UHjR7VKlPI2nrHasJDcO8sbp6C5DBFR0oo57xa/CCup384cnXUfKVJKZg6Hj5LuO17SOv/4Fuu/j06qkDh12/DLOKDqDJbVLnXbjbvtY7Vw51RBxAYt0dxT/ZxZBq8bLP7tR0P8sMTUAgI4PgGw2errFttw24EEVW6RH6OUPrnevUGU6u43PeWAeI0YtCzyJvUk/fLikH40lFEHcaW0wJcaana1Fel8LUqDe3rjmNIBHH8KvgUgSOzqj0WPUSVWZ3auR6Iw+Cs3CJ+6AojeXI+tnr2B3DyhEpJIe29GF0iAEXbud4qaR/cBcD+eLtFinzAeSLStVZMt2QuguQPk0bZrMPfY9ykAuZLmztd15unTlR8o9TxzCsSQy2CMTm9wnmMVXDI2wbxwrlFnxVed0Kh4mOZUGsJO6PNUSgs6L4LTz/RmC9DuXQLMZs+JR0i8QtWoGjExSdQxzMUUY0pkBtWilQ7M3zCxDr8RlryLH4A08rnC03TrJsZ9zWrG27c7j1kgXMRPKXOqirRh+AfOGo6a23mewoWjaaQNlATA8mGCy2WDpgkMWgN2V6TDUn0FTlPNppL06l6UC/fkEbk9PBBGz8eYb7ZRZ8fBjFNaSe3lNciEJW+9hi34UQ6F3L2Xa4zjYSAAxyjHIEkCjw0J+3uab04zCjr38EzECzk7J1LuJnzh8KFatTvvXFAw0imXZEFoybuuvBMZmKrPGLsowrsYtXJ5+H1PF1Gn0oeTUoLYqg==
  • Spamdiagnosticmetadata: NSPM
  • Spamdiagnosticoutput: 1:99

pscheduler-archiver seems to crash from time to time on on of my servers (the busiest one with more tests than the rest of servers).

 

When I ran

 

pscheduler task rtt --dest desthost --source srcip

 

Submitting task...

Task URL:

https://srcip/pscheduler/tasks/8bc9e897-2bb9-4a74-b48d-ec2e95d39c18

Running with tool 'ping'

Fetching first run...

Srcip 8 never scheduled a run for the task.

 

Pscheduler-archiver was “runing”.

 

If I restart pscheuler-archiver, the above command started to work again.  It appears that pscheduler-archive is the trouble maker, is there anything that I can look to isolate this problem?

 

Thank you.

 

Zhi-Wei Lu

IET-CR-Network Operations Center

University of California, Davis

(530) 752-0155

 

From: [mailto:] On Behalf Of Kathy Benninger
Sent: Friday, May 05, 2017 9:16 AM
To:
Subject: Re: [perfsonar-user] Debugging pscheduler

 

I built a server with a perfSONAR Toolkit v4.0.0.1 .iso from 17-Apr-17. Unfortunately, as others have reported, I can't seem to get pscheduler/testing to restart on its own after a reboot.

As Charley Kneifel reported,
    yum -y reinstall pscheduler-server
gets pscheduler working. Unfortunately, pscheduler still doesn't restart automatically after a reboot.

Alternatively,
    [root@ps-test benninge]# systemctl start pscheduler-archiver
gets things working without a reinstall (I don't know if it would have worked BEFORE the first time I did a "yum reinstall")

I've noticed after reboot that pscheduler-archiver.service doesn't start:

[root@ps-test benninge]# systemctl status pscheduler-archiver -l
● pscheduler-archiver.service - pScheduler server - archiver
   Loaded: loaded (/usr/lib/systemd/system/pscheduler-archiver.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2017-05-04 14:45:32 EDT; 9min ago
 Main PID: 1079 (code=exited, status=1/FAILURE)

May 04 14:45:32 ps-test.psc.edu archiver[1079]: File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection
May 04 14:45:32 ps-test.psc.edu archiver[1079]: pg = psycopg2.connect(dsn)
May 04 14:45:32 ps-test.psc.edu archiver[1079]: File "/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect
May 04 14:45:32 ps-test.psc.edu archiver[1079]: conn = _connect(dsn, connection_factory=connection_factory, async=async)
May 04 14:45:32 ps-test.psc.edu archiver[1079]: psycopg2.OperationalError: could not connect to server: Connection refused
May 04 14:45:32 ps-test.psc.edu archiver[1079]: Is the server running on host "127.0.0.1" and accepting
May 04 14:45:32 ps-test.psc.edu archiver[1079]: TCP/IP connections on port 5432?
May 04 14:45:32 ps-test.psc.edu systemd[1]: pscheduler-archiver.service: main process exited, code=exited, status=1/FAILURE
May 04 14:45:32 ps-test.psc.edu systemd[1]: Unit pscheduler-archiver.service entered failed state.
May 04 14:45:32 ps-test.psc.edu systemd[1]: pscheduler-archiver.service failed.

and pscheduler-ticker.service doesn't look happy:

[root@ps-test benninge]# systemctl status pscheduler-ticker -l
● pscheduler-ticker.service - pScheduler server - ticker
   Loaded: loaded (/usr/lib/systemd/system/pscheduler-ticker.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2017-05-04 14:45:26 EDT; 17min ago
 Main PID: 1076 (ticker)
   CGroup: /system.slice/pscheduler-ticker.service
           └─1076 /usr/bin/python /usr/libexec/pscheduler/daemons/ticker --dsn @/etc/pscheduler/database/database-dsn

May 04 14:45:34 ps-test.psc.edu ticker[1076]: conn = pscheduler.pg_connection(dsn)
May 04 14:45:34 ps-test.psc.edu ticker[1076]: File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection
May 04 14:45:34 ps-test.psc.edu ticker[1076]: pg = psycopg2.connect(dsn)
May 04 14:45:34 ps-test.psc.edu ticker[1076]: File "/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect
May 04 14:45:34 ps-test.psc.edu ticker[1076]: conn = _connect(dsn, connection_factory=connection_factory, async=async)
May 04 14:45:34 ps-test.psc.edu ticker[1076]: OperationalError: FATAL:  the database system is starting up
May 04 14:45:34 ps-test.psc.edu ticker[1076]: safe_run/ticker ERROR    Program threw an exception after 0:00:00.383300
May 04 14:45:34 ps-test.psc.edu ticker[1076]: safe_run/ticker ERROR    Exception: OperationalError: FATAL:  the database system is starting up

                                              Traceback (most recent call last):
                                                File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run
                                                  function()
                                                File "/usr/libexec/pscheduler/daemons/ticker", line 156, in <lambda>
pscheduler.safe_run(lambda: main_program())
                                                File "/usr/libexec/pscheduler/daemons/ticker", line 123, in main_program
                                                  db = pscheduler.pg_connection(dsn)
                                                File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection
                                                  pg = psycopg2.connect(dsn)
                                                File "/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect
                                                  conn = _connect(dsn, connection_factory=connection_factory, async=async)
                                              OperationalError: FATAL:  the database system is starting up
May 04 14:45:34 ps-test.psc.edu ticker[1076]: safe_run/ticker ERROR    Waiting 1.75 seconds before restarting
May 04 14:45:35 ps-test.psc.edu ticker[1076]: safe_run/ticker ERROR    Restarting


Here is a set of messages from /var/log/pscheduler/pscheduler.log (also similar to what Charley reported):

May  5 11:37:02 ps-test journal: archiver INFO     Started
May  5 11:37:02 ps-test journal: scheduler INFO     Started
May  5 11:37:02 ps-test journal: runner INFO     Started
May  5 11:37:02 ps-test journal: safe_run/ticker ERROR    Program threw an exception after 0:00:00.001378
May  5 11:37:02 ps-test journal: safe_run/scheduler ERROR    Program threw an exception after 0:00:00.000339
May  5 11:37:02 ps-test journal: safe_run/runner ERROR    Program threw an exception after 0:00:00.000290
May  5 11:37:02 ps-test journal: safe_run/ticker ERROR    Exception: OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (most
 recent call last):#012  File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run#012    function()#012  File "/usr/libexec/pscheduler/daemons/ticker", line 156, in <lambda>#012    pscheduler.safe_run(lambda: main_program())#012  Fil
e "/usr/libexec/pscheduler/daemons/ticker", line 123, in main_program#012    db = pscheduler.pg_connection(dsn)#012  File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection#012    pg = psycopg2.connect(dsn)#012  File "/usr/lib64/pyt
hon2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012    conn = _connect(dsn, connection_factory=connection_factory, async=async)#012OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.
1" and accepting#012#011TCP/IP connections on port 5432?
May  5 11:37:02 ps-test journal: safe_run/ticker ERROR    Waiting 0.25 seconds before restarting
May  5 11:37:02 ps-test journal: safe_run/scheduler ERROR    Exception: OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (m
ost recent call last):#012  File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run#012    function()#012  File "/usr/libexec/pscheduler/daemons/scheduler", line 660, in <lambda>#012    pscheduler.safe_run(lambda: main_program())#01
2  File "/usr/libexec/pscheduler/daemons/scheduler", line 527, in main_program#012    pg = pscheduler.pg_connection(dsn)#012  File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection#012    pg = psycopg2.connect(dsn)#012  File "/usr/
lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012    conn = _connect(dsn, connection_factory=connection_factory, async=async)#012OperationalError: could not connect to server: Connection refused#012#011Is the server running on host
"127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?
May  5 11:37:02 ps-test journal: safe_run/scheduler ERROR    Waiting 0.25 seconds before restarting
May  5 11:37:02 ps-test journal: safe_run/runner ERROR    Exception: OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (most
 recent call last):#012  File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run#012    function()#012  File "/usr/libexec/pscheduler/daemons/runner", line 909, in <lambda>#012    pscheduler.safe_run(lambda: main_program())#012  Fil
e "/usr/libexec/pscheduler/daemons/runner", line 719, in main_program#012    db = pscheduler.pg_connection(dsn)#012  File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection#012    pg = psycopg2.connect(dsn)#012  File "/usr/lib64/pyt
hon2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012    conn = _connect(dsn, connection_factory=connection_factory, async=async)#012OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.
1" and accepting#012#011TCP/IP connections on port 5432?
May  5 11:37:02 ps-test journal: safe_run/runner ERROR    Waiting 0.25 seconds before restarting
May  5 11:37:02 ps-test journal: safe_run/ticker ERROR    Restarting
May  5 11:37:02 ps-test journal: safe_run/scheduler ERROR    Restarting
May  5 11:37:02 ps-test journal: safe_run/runner ERROR    Restarting
May  5 11:37:02 ps-test journal: safe_run/scheduler ERROR    Program threw an exception after 0:00:00.000342
May  5 11:37:02 ps-test journal: safe_run/ticker ERROR    Program threw an exception after 0:00:00.001007
May  5 11:37:02 ps-test journal: safe_run/runner ERROR    Program threw an exception after 0:00:00.000317
May  5 11:37:02 ps-test journal: safe_run/scheduler ERROR    Exception: OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (m
ost recent call last):#012  File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run#012    function()#012  File "/usr/libexec/pscheduler/daemons/scheduler", line 660, in <lambda>#012    pscheduler.safe_run(lambda: main_program())#01
2  File "/usr/libexec/pscheduler/daemons/scheduler", line 527, in main_program#012    pg = pscheduler.pg_connection(dsn)#012  File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection#012    pg = psycopg2.connect(dsn)#012  File "/usr/
lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012    conn = _connect(dsn, connection_factory=connection_factory, async=async)#012OperationalError: could not connect to server: Connection refused#012#011Is the server running on host
"127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?
May  5 11:37:02 ps-test journal: safe_run/scheduler ERROR    Waiting 0.5 seconds before restarting
May  5 11:37:02 ps-test journal: safe_run/runner ERROR    Exception: OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (most
 recent call last):#012  File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run#012    function()#012  File "/usr/libexec/pscheduler/daemons/runner", line 909, in <lambda>#012    pscheduler.safe_run(lambda: main_program())#012  Fil
e "/usr/libexec/pscheduler/daemons/runner", line 719, in main_program#012    db = pscheduler.pg_connection(dsn)#012  File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection#012    pg = psycopg2.connect(dsn)#012  File "/usr/lib64/pyt
hon2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012    conn = _connect(dsn, connection_factory=connection_factory, async=async)#012OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.
1" and accepting#012#011TCP/IP connections on port 5432?
May  5 11:37:02 ps-test journal: safe_run/runner ERROR    Waiting 0.5 seconds before restarting
May  5 11:37:02 ps-test journal: safe_run/ticker ERROR    Exception: OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (most
 recent call last):#012  File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run#012    function()#012  File "/usr/libexec/pscheduler/daemons/ticker", line 156, in <lambda>#012    pscheduler.safe_run(lambda: main_program())#012  Fil
e "/usr/libexec/pscheduler/daemons/ticker", line 123, in main_program#012    db = pscheduler.pg_connection(dsn)#012  File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection#012    pg = psycopg2.connect(dsn)#012  File "/usr/lib64/pyt
hon2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012    conn = _connect(dsn, connection_factory=connection_factory, async=async)#012OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.
1" and accepting#012#011TCP/IP connections on port 5432?

Thanks for any help!

Kathy


On 4/26/2017 5:50 PM, Charley Kneifel wrote:

I should have added that rerunning the pscheduler install "fixes" the problem (until the next reboot) and now pscheduler starts just fine -

 

[root@ps-node-e pscheduler]# yum reinstall pscheduler-server
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
 * Internet2: software.internet2.edu
 * base: repos-va.psychz.net
 * epel: mirror.metrocast.net
 * extras: mirror.es.its.nyu.edu
 * updates: ftpmirror.your.org
Resolving Dependencies
--> Running transaction check
---> Package pscheduler-server.noarch 0:1.0.0.1-1.el7.centos will be reinstalled
--> Finished Dependency Resolution

Dependencies Resolved

==============================================================================================================================
 Package                          Arch                  Version                                Repository                Size
==============================================================================================================================
Reinstalling:
 pscheduler-server                noarch                1.0.0.1-1.el7.centos                   Internet2                 82 k

Transaction Summary
==============================================================================================================================
Reinstall  1 Package

Total download size: 82 k
Installed size: 281 k
Is this ok [y/d/N]: ifconfig ens160 mtu 9000
Is this ok [y/d/N]: y
Downloading packages:
pscheduler-server-1.0.0.1-1.el7.centos.noarch.rpm                                                      |  82 kB  00:00:01
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction


  Installing : pscheduler-server-1.0.0.1-1.el7.centos.noarch                                                              1/1
  Verifying  : pscheduler-server-1.0.0.1-1.el7.centos.noarch                                                              1/1

Installed:
  pscheduler-server.noarch 0:1.0.0.1-1.el7.centos

Complete!

Apr 26 17:40:01 ps-node-e journal: runner INFO     Started
Apr 26 17:40:01 ps-node-e journal: archiver INFO     Started
Apr 26 17:40:01 ps-node-e journal: scheduler INFO     Started
Apr 26 17:40:02 ps-node-e journal: runner INFO     37521: Running https://ps-node-e.kneifel.local/pscheduler/tasks/f84ff3ed-b6de-454a-b67d-bb501c85a6d8/runs/a6127b24-b16f-4fbc-aaca-af153459d617
Apr 26 17:40:02 ps-node-e journal: runner INFO     37521: With powstream: latencybg --data-ports 8760-9960 --dest 192.168.202.88 --packet-padding 0 --flip --source ps-node-b --ip-version 4 --packet-interval 0.1 --duration PT86400S --packet-count 600
Apr 26 17:40:02 ps-node-e journal: runner INFO     37505: Running https://ps-node-e.kneifel.local/pscheduler/tasks/c5616796-171d-421c-9552-5bc9d27c835a/runs/ea6234dc-fc28-4e8c-94fa-2913362f0ce4
Apr 26 17:40:02 ps-node-e journal: runner INFO     37505: With powstream: latencybg --data-ports 8760-9960 --dest 192.168.202.88 --packet-padding 0 --flip --source ps-node-c --ip-version 4 --packet-interval 0.1 --duration PT86400S --packet-count 600
Apr 26 17:40:02 ps-node-e journal: runner INFO     37509: Running https://ps-node-e.kneifel.local/pscheduler/tasks/d202293b-2a1a-4704-a28a-3842781a6084/runs/a6bb94f1-7f58-4085-8c50-efaa2cdf2f14
Apr 26 17:40:02 ps-node-e journal: runner INFO     37509: With powstream: latencybg --data-ports 8760-9960 --dest ps-node-c --packet-padding 0 --source 192.168.202.88 --ip-version 4 --packet-interval 0.1 --duration PT86400S --packet-count 600
Apr 26 17:40:02 ps-node-e journal: runner INFO     37525: Running https://ps-node-e.kneifel.local/pscheduler/tasks/621549b6-50dd-479e-af4f-04a6ae954f67/runs/69839120-f9fc-4ab6-89e6-b376297f8cb9
Apr 26 17:40:02 ps-node-e journal: runner INFO     37525: With powstream: latencybg --data-ports 8760-9960 --dest ps-node-b --packet-padding 0 --source 192.168.202.88 --ip-version 4 --packet-interval 0.1 --duration PT86400S --packet-count 600
Apr 26 17:40:02 ps-node-e journal: runner INFO     37528: Running https://ps-node-e.kneifel.local/pscheduler/tasks/d78bf04b-e585-4556-807f-447bac14e9a9/runs/56cf30d8-ed90-41ab-a693-99a4315826a5
Apr 26 17:40:02 ps-node-e journal: runner INFO     37528: With powstream: latencybg --data-ports 8760-9960 --dest 192.168.202.88 --packet-padding 0 --flip --source ps-node-D --ip-version 4 --packet-interval 0.1 --duration PT86400S --packet-count 600
Apr 26 17:40:02 ps-node-e journal: runner INFO     37517: Running https://ps-node-e.kneifel.local/pscheduler/tasks/3233041f-2c43-45a4-a9ae-3b299d68d0e3/runs/ba8968da-7d8c-4190-a69a-6e35f1f5bfc3
Apr 26 17:40:02 ps-node-e journal: runner INFO     37517: With powstream: latencybg --data-ports 8760-9960 --dest ps-node-a --packet-padding 0 --source 192.168.202.88 --ip-version 4 --packet-interval 0.1 --duration PT86400S --packet-count 600
Apr 26 17:40:02 ps-node-e journal: runner INFO     37513: Running https://ps-node-e.kneifel.local/pscheduler/tasks/2ad455ca-14a8-4f70-b6ab-0c6506c0f077/runs/eb3c47b5-849e-4882-8ef2-e0f85b7fde67
Apr 26 17:40:02 ps-node-e journal: runner INFO     37513: With powstream: latencybg --data-ports 8760-9960 --dest 192.168.202.88 --packet-padding 0 --flip --source ps-node-a --ip-version 4 --packet-interval 0.1 --duration PT86400S --packet-count 600
Apr 26 17:40:02 ps-node-e journal: runner INFO     37532: Running https://ps-node-e.kneifel.local/pscheduler/tasks/a4469c36-5e9b-40be-ad1e-61f6beddca64/runs/a313f507-5b02-4fca-8ecf-ebf08b4ca639
Apr 26 17:40:02 ps-node-e journal: runner INFO     37532: With powstream: latencybg --data-ports 8760-9960 --dest ps-node-D --packet-padding 0 --source 192.168.202.88 --ip-version 4 --packet-interval 0.1 --duration PT86400S --packet-count 600
Apr 26 17:40:37 ps-node-e journal: runner INFO     40123: Running https://ps-node-e.kneifel.local/pscheduler/tasks/e541a236-ef39-4e01-901d-22638cb2120b/runs/afa179ab-1426-4008-aeb3-806627d278ce
Apr 26 17:40:37 ps-node-e journal: runner INFO     40123: With ping: rtt --count 10 --dest ps-node-a --interval PT1S --ip-version 4 --source 192.168.202.88 --length 1000 --ttl 255
Apr 26 17:40:42 ps-node-e journal: pscheduler-api INFO     Started
Apr 26 17:40:43 ps-node-e journal: pscheduler-api INFO     Limits loaded from /etc/pscheduler/limits.conf




From: on behalf of Charley Kneifel
Sent: Wednesday, April 26, 2017 5:20 PM
To: Zhi-Wei Lu;
Subject: Re: [perfsonar-user] Debugging pscheduler

 

I have had a similar problem - pscheduler will run properly after running the reinstall but then will stop working after a reboot.

 

This is from a vanilla ISO install onto a VM from the 4/21/17 release

 

after logging in I see the following

 

ABRT has detected 1 problem(s). For more info run: abrt-cli list --since 1493040255
[root@ps-node-e charley]# abrt-cli list --since 1493040255
id 42223fad7206adaf563a3c79556b12b8bb0b0b44
reason:         pool.py:152:__init__:ValueError: Number of processes must be at least 1
time:           Mon 24 Apr 2017 06:51:42 AM EDT
cmdline:        /usr/bin/python /usr/libexec/pscheduler/commands/../classes/tool/traceroute/run
package:        pscheduler-tool-traceroute-1.0-1.el7.centos
uid:            1000 (pscheduler)
count:          2
Directory:      /var/spool/abrt/Python-2017-04-24-06:51:42-9460
Reported:       https://retrace.fedoraproject.org/faf/reports/bthash/cf21d9b6825a907dfc058e2537978065e77bf78e

id 0423ef98e262f14f154da722888811992eb36fee
reason:         __init__.py:164:connect:OperationalError: could not connect to server: Connection refused
time:           Sun 23 Apr 2017 01:17:14 AM EDT
cmdline:        /usr/bin/python /usr/libexec/pscheduler/daemons/archiver --dsn @/etc/pscheduler/database/database-dsn
package:        pscheduler-server-1.0.0.1-1.el7.centos
uid:            1000 (pscheduler)
count:          3
Directory:      /var/spool/abrt/Python-2017-04-23-01:17:14-729
[root@ps-node-e charley]#

 

and the following types of errors in pscheduler.log after a reboot

 

Apr 26 17:18:17 ps-node-e journal: archiver INFO     Started
Apr 26 17:18:17 ps-node-e journal: scheduler INFO     Started
Apr 26 17:18:17 ps-node-e journal: runner INFO     Started
Apr 26 17:18:17 ps-node-e journal: safe_run/runner ERROR    Program threw an exception after 0:00:00.000110
Apr 26 17:18:17 ps-node-e journal: safe_run/scheduler ERROR    Program threw an exception after 0:00:00.001355
Apr 26 17:18:17 ps-node-e journal: safe_run/ticker ERROR    Program threw an exception after 0:00:00.003222
Apr 26 17:18:17 ps-node-e journal: safe_run/ticker ERROR    Exception: OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (most recent call last):#012  File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run#012    function()#012  File "/usr/libexec/pscheduler/daemons/ticker", line 156, in <lambda>#012    pscheduler.safe_run(lambda: main_program())#012  File "/usr/libexec/pscheduler/daemons/ticker", line 123, in main_program#012    db = pscheduler.pg_connection(dsn)#012  File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection#012    pg = psycopg2.connect(dsn)#012  File "/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012    conn = _connect(dsn, connection_factory=connection_factory, async=async)#012OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?
Apr 26 17:18:17 ps-node-e journal: safe_run/ticker ERROR    Waiting 0.25 seconds before restarting
Apr 26 17:18:17 ps-node-e journal: safe_run/scheduler ERROR    Exception: OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (most recent call last):#012  File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run#012    function()#012  File "/usr/libexec/pscheduler/daemons/scheduler", line 660, in <lambda>#012    pscheduler.safe_run(lambda: main_program())#012  File "/usr/libexec/pscheduler/daemons/scheduler", line 527, in main_program#012    pg = pscheduler.pg_connection(dsn)#012  File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection#012    pg = psycopg2.connect(dsn)#012  File "/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012    conn = _connect(dsn, connection_factory=connection_factory, async=async)#012OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?
Apr 26 17:18:17 ps-node-e journal: safe_run/scheduler ERROR    Waiting 0.25 seconds before restarting
Apr 26 17:18:17 ps-node-e journal: safe_run/runner ERROR    Exception: OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (most recent call last):#012  File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run#012    function()#012  File "/usr/libexec/pscheduler/daemons/runner", line 909, in <lambda>#012    pscheduler.safe_run(lambda: main_program())#012  File "/usr/libexec/pscheduler/daemons/runner", line 719, in main_program#012    db = pscheduler.pg_connection(dsn)#012  File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection#012    pg = psycopg2.connect(dsn)#012  File "/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012    conn = _connect(dsn, connection_factory=connection_factory, async=async)#012OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?


































































 

 


From: on behalf of Zhi-Wei Lu
Sent: Wednesday, April 26, 2017 4:58 PM
To:
Subject: [perfsonar-user] Debugging pscheduler

 

Hi all,

 

After perfsonar 4.0 upgrade, I struggled to get one of our perfsonar boxes working again (with help from Andrew Lake).  It worked for a short while, and I noticed that it appears that pscheduler is not working properly.

 

pscheduler task rtt --dest p0.noc.ucdavis.edu --source 128.120.80.78

Submitting task...

Task URL:

https://128.120.80.78/pscheduler/tasks/8f17174e-2ca7-47d8-b98e-dd96b1d01e2b

Running with tool 'ping'

Fetching first run...

128.120.80.78 never scheduled a run for the task.

 

Reverse the source and dest, test worked

 

pscheduler task rtt --source p0.noc.ucdavis.edu --dest 128.120.80.78

Submitting task...

Task URL:

https://p0.noc.ucdavis.edu/pscheduler/tasks/b569b99a-bed7-431c-8948-b34238cdcacf

Running with tool 'ping'

Fetching first run...

 

Next scheduled run:

https://p0.noc.ucdavis.edu/pscheduler/tasks/b569b99a-bed7-431c-8948-b34238cdcacf/runs/d13ba8e6-6fcd-4aa6-8873-b411e9fb99a7

Starts 2017-04-26T13:43:28-07:00 (~7 seconds)

Ends   2017-04-26T13:43:39-07:00 (~10 seconds)

Waiting for result...

 

1       melange-owamp-v4.noc.ucdavis.edu (128.120.80.78)  64 Bytes  TTL 60  RTT   0.3550 ms

2       melange-owamp-v4.noc.ucdavis.edu (128.120.80.78)  64 Bytes  TTL 60  RTT   0.3540 ms

3       melange-owamp-v4.noc.ucdavis.edu (128.120.80.78)  64 Bytes  TTL 60  RTT   0.3680 ms

4       melange-owamp-v4.noc.ucdavis.edu (128.120.80.78)  64 Bytes  TTL 60  RTT   0.3570 ms

5       melange-owamp-v4.noc.ucdavis.edu (128.120.80.78)  64 Bytes  TTL 60  RTT   0.3420 ms

 

0% Packet Loss  RTT Min/Mean/Max/StdDev = 0.342000/0.355000/0.368000/0.014000 ms

 

No further runs scheduled.

 

Bwping works for both directions, though.

 

The pscheduer-scheduler daemon is “running”.  I wonder how I can trace down the real problem behind this issue.

 

Thank you.

 

Zhi-Wei Lu

IET-CR-Network Operations Center

University of California, Davis

(530) 752-0155

 

 




Archive powered by MHonArc 2.6.19.

Top of Page