perfsonar-user - Re: [perfsonar-user] Debugging pscheduler
Subject: perfSONAR User Q&A and Other Discussion
- From: Charley Kneifel <>
- To: Zhi-Wei Lu <>, "" <>
- Subject: Re: [perfsonar-user] Debugging pscheduler
- Date: Wed, 26 Apr 2017 21:50:08 +0000
I should have added that rerunning the pscheduler install "fixes" the problem (until the next reboot); pscheduler now starts just fine:
[root@ps-node-e pscheduler]# yum reinstall pscheduler-server
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
 * Internet2: software.internet2.edu
 * base: repos-va.psychz.net
 * epel: mirror.metrocast.net
 * extras: mirror.es.its.nyu.edu
 * updates: ftpmirror.your.org
Resolving Dependencies
--> Running transaction check
---> Package pscheduler-server.noarch 0:1.0.0.1-1.el7.centos will be reinstalled
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package              Arch     Version                 Repository     Size
================================================================================
Reinstalling:
 pscheduler-server    noarch   1.0.0.1-1.el7.centos    Internet2      82 k

Transaction Summary
================================================================================
Reinstall  1 Package

Total download size: 82 k
Installed size: 281 k
Is this ok [y/d/N]: ifconfig ens160 mtu 9000
Is this ok [y/d/N]: y
Downloading packages:
pscheduler-server-1.0.0.1-1.el7.centos.noarch.rpm | 82 kB 00:00:01
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : pscheduler-server-1.0.0.1-1.el7.centos.noarch 1/1
  Verifying  : pscheduler-server-1.0.0.1-1.el7.centos.noarch 1/1

Installed:
  pscheduler-server.noarch 0:1.0.0.1-1.el7.centos

Complete!

Apr 26 17:40:01 ps-node-e journal: runner INFO Started
Apr 26 17:40:01 ps-node-e journal: archiver INFO Started
Apr 26 17:40:01 ps-node-e journal: scheduler INFO Started
Apr 26 17:40:02 ps-node-e journal: runner INFO 37521: Running https://ps-node-e.kneifel.local/pscheduler/tasks/f84ff3ed-b6de-454a-b67d-bb501c85a6d8/runs/a6127b24-b16f-4fbc-aaca-af153459d617
Apr 26 17:40:02 ps-node-e journal: runner INFO 37521: With powstream: latencybg --data-ports 8760-9960 --dest 192.168.202.88 --packet-padding 0 --flip --source ps-node-b --ip-version 4 --packet-interval 0.1 --duration PT86400S --packet-count 600
Apr 26 17:40:02 ps-node-e journal: runner INFO 37505: Running https://ps-node-e.kneifel.local/pscheduler/tasks/c5616796-171d-421c-9552-5bc9d27c835a/runs/ea6234dc-fc28-4e8c-94fa-2913362f0ce4
Apr 26 17:40:02 ps-node-e journal: runner INFO 37505: With powstream: latencybg --data-ports 8760-9960 --dest 192.168.202.88 --packet-padding 0 --flip --source ps-node-c --ip-version 4 --packet-interval 0.1 --duration PT86400S --packet-count 600
Apr 26 17:40:02 ps-node-e journal: runner INFO 37509: Running https://ps-node-e.kneifel.local/pscheduler/tasks/d202293b-2a1a-4704-a28a-3842781a6084/runs/a6bb94f1-7f58-4085-8c50-efaa2cdf2f14
Apr 26 17:40:02 ps-node-e journal: runner INFO 37509: With powstream: latencybg --data-ports 8760-9960 --dest ps-node-c --packet-padding 0 --source 192.168.202.88 --ip-version 4 --packet-interval 0.1 --duration PT86400S --packet-count 600
Apr 26 17:40:02 ps-node-e journal: runner INFO 37525: Running https://ps-node-e.kneifel.local/pscheduler/tasks/621549b6-50dd-479e-af4f-04a6ae954f67/runs/69839120-f9fc-4ab6-89e6-b376297f8cb9
Apr 26 17:40:02 ps-node-e journal: runner INFO 37525: With powstream: latencybg --data-ports 8760-9960 --dest ps-node-b --packet-padding 0 --source 192.168.202.88 --ip-version 4 --packet-interval 0.1 --duration PT86400S --packet-count 600
Apr 26 17:40:02 ps-node-e journal: runner INFO 37528: Running https://ps-node-e.kneifel.local/pscheduler/tasks/d78bf04b-e585-4556-807f-447bac14e9a9/runs/56cf30d8-ed90-41ab-a693-99a4315826a5
Apr 26 17:40:02 ps-node-e journal: runner INFO 37528: With powstream: latencybg --data-ports 8760-9960 --dest 192.168.202.88 --packet-padding 0 --flip --source ps-node-D --ip-version 4 --packet-interval 0.1 --duration PT86400S --packet-count 600
Apr 26 17:40:02 ps-node-e journal: runner INFO 37517: Running https://ps-node-e.kneifel.local/pscheduler/tasks/3233041f-2c43-45a4-a9ae-3b299d68d0e3/runs/ba8968da-7d8c-4190-a69a-6e35f1f5bfc3
Apr 26 17:40:02 ps-node-e journal: runner INFO 37517: With powstream: latencybg --data-ports 8760-9960 --dest ps-node-a --packet-padding 0 --source 192.168.202.88 --ip-version 4 --packet-interval 0.1 --duration PT86400S --packet-count 600
Apr 26 17:40:02 ps-node-e journal: runner INFO 37513: Running https://ps-node-e.kneifel.local/pscheduler/tasks/2ad455ca-14a8-4f70-b6ab-0c6506c0f077/runs/eb3c47b5-849e-4882-8ef2-e0f85b7fde67
Apr 26 17:40:02 ps-node-e journal: runner INFO 37513: With powstream: latencybg --data-ports 8760-9960 --dest 192.168.202.88 --packet-padding 0 --flip --source ps-node-a --ip-version 4 --packet-interval 0.1 --duration PT86400S --packet-count 600
Apr 26 17:40:02 ps-node-e journal: runner INFO 37532: Running https://ps-node-e.kneifel.local/pscheduler/tasks/a4469c36-5e9b-40be-ad1e-61f6beddca64/runs/a313f507-5b02-4fca-8ecf-ebf08b4ca639
Apr 26 17:40:02 ps-node-e journal: runner INFO 37532: With powstream: latencybg --data-ports 8760-9960 --dest ps-node-D --packet-padding 0 --source 192.168.202.88 --ip-version 4 --packet-interval 0.1 --duration PT86400S --packet-count 600
Apr 26 17:40:37 ps-node-e journal: runner INFO 40123: Running https://ps-node-e.kneifel.local/pscheduler/tasks/e541a236-ef39-4e01-901d-22638cb2120b/runs/afa179ab-1426-4008-aeb3-806627d278ce
Apr 26 17:40:37 ps-node-e journal: runner INFO 40123: With ping: rtt --count 10 --dest ps-node-a --interval PT1S --ip-version 4 --source 192.168.202.88 --length 1000 --ttl 255
Apr 26 17:40:42 ps-node-e journal: pscheduler-api INFO Started
Apr 26 17:40:43 ps-node-e journal: pscheduler-api INFO Limits loaded from /etc/pscheduler/limits.conf

From: <> on behalf of Charley Kneifel <>
Sent: Wednesday, April 26, 2017 5:20 PM
To: Zhi-Wei Lu;
Subject: Re: [perfsonar-user] Debugging pscheduler

I have had a similar problem - pscheduler will run properly after running the reinstall but then will stop working after a reboot.
This is from a vanilla ISO install of the 4/21/17 release onto a VM.

After logging in, I see the following:

ABRT has detected 1 problem(s). For more info run: abrt-cli list --since 1493040255

[root@ps-node-e charley]# abrt-cli list --since 1493040255
id 42223fad7206adaf563a3c79556b12b8bb0b0b44
reason:     pool.py:152:__init__:ValueError: Number of processes must be at least 1
time:       Mon 24 Apr 2017 06:51:42 AM EDT
cmdline:    /usr/bin/python /usr/libexec/pscheduler/commands/../classes/tool/traceroute/run
package:    pscheduler-tool-traceroute-1.0-1.el7.centos
uid:        1000 (pscheduler)
count:      2
Directory:  /var/spool/abrt/Python-2017-04-24-06:51:42-9460
Reported:   https://retrace.fedoraproject.org/faf/reports/bthash/cf21d9b6825a907dfc058e2537978065e77bf78e

id 0423ef98e262f14f154da722888811992eb36fee
reason:     __init__.py:164:connect:OperationalError: could not connect to server: Connection refused
time:       Sun 23 Apr 2017 01:17:14 AM EDT
cmdline:    /usr/bin/python /usr/libexec/pscheduler/daemons/archiver --dsn @/etc/pscheduler/database/database-dsn
package:    pscheduler-server-1.0.0.1-1.el7.centos
uid:        1000 (pscheduler)
count:      3
Directory:  /var/spool/abrt/Python-2017-04-23-01:17:14-729
[root@ps-node-e charley]#

and the following types of errors in pscheduler.log after a reboot:
Apr 26 17:18:17 ps-node-e journal: archiver INFO Started
Apr 26 17:18:17 ps-node-e journal: scheduler INFO Started
Apr 26 17:18:17 ps-node-e journal: runner INFO Started
Apr 26 17:18:17 ps-node-e journal: safe_run/runner ERROR Program threw an exception after 0:00:00.000110
Apr 26 17:18:17 ps-node-e journal: safe_run/scheduler ERROR Program threw an exception after 0:00:00.001355
Apr 26 17:18:17 ps-node-e journal: safe_run/ticker ERROR Program threw an exception after 0:00:00.003222
Apr 26 17:18:17 ps-node-e journal: safe_run/ticker ERROR Exception: OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run#012 function()#012 File "/usr/libexec/pscheduler/daemons/ticker", line 156, in <lambda>#012 pscheduler.safe_run(lambda: main_program())#012 File "/usr/libexec/pscheduler/daemons/ticker", line 123, in main_program#012 db = pscheduler.pg_connection(dsn)#012 File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection#012 pg = psycopg2.connect(dsn)#012 File "/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012 conn = _connect(dsn, connection_factory=connection_factory, async=async)#012OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?
Apr 26 17:18:17 ps-node-e journal: safe_run/ticker ERROR Waiting 0.25 seconds before restarting
Apr 26 17:18:17 ps-node-e journal: safe_run/scheduler ERROR Exception: OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run#012 function()#012 File "/usr/libexec/pscheduler/daemons/scheduler", line 660, in <lambda>#012 pscheduler.safe_run(lambda: main_program())#012 File "/usr/libexec/pscheduler/daemons/scheduler", line 527, in main_program#012 pg = pscheduler.pg_connection(dsn)#012 File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection#012 pg = psycopg2.connect(dsn)#012 File "/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012 conn = _connect(dsn, connection_factory=connection_factory, async=async)#012OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?
Apr 26 17:18:17 ps-node-e journal: safe_run/scheduler ERROR Waiting 0.25 seconds before restarting
Apr 26 17:18:17 ps-node-e journal: safe_run/runner ERROR Exception: OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?#012#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 41, in safe_run#012 function()#012 File "/usr/libexec/pscheduler/daemons/runner", line 909, in <lambda>#012 pscheduler.safe_run(lambda: main_program())#012 File "/usr/libexec/pscheduler/daemons/runner", line 719, in main_program#012 db = pscheduler.pg_connection(dsn)#012 File "/usr/lib/python2.7/site-packages/pscheduler/db.py", line 35, in pg_connection#012 pg = psycopg2.connect(dsn)#012 File "/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect#012 conn = _connect(dsn, connection_factory=connection_factory, async=async)#012OperationalError: could not connect to server: Connection refused#012#011Is the server running on host "127.0.0.1" and accepting#012#011TCP/IP connections on port 5432?
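
The tracebacks above all end in the same refusal on 127.0.0.1:5432, so one quick check after a reboot is whether anything is accepting connections on that port yet. Below is a minimal Python 3 sketch, not part of pscheduler. The function names port_is_open and wait_for_port are made up for illustration; only the host and port come from the error message. A successful connect shows only that something is listening there, not that PostgreSQL is healthy, and the thread does not confirm that startup ordering is the cause.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError,
        # or a timeout) when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host, port, attempts=10, delay=0.5):
    """Poll the port up to `attempts` times; return True once it accepts."""
    for _ in range(attempts):
        if port_is_open(host, port):
            return True
        time.sleep(delay)
    return False
```

Calling wait_for_port("127.0.0.1", 5432) shortly after boot and noting how long it takes to return True (if it ever does) would show whether the database comes up late or not at all.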
From: <> on behalf of Zhi-Wei Lu <>
Sent: Wednesday, April 26, 2017 4:58 PM
To:
Subject: [perfsonar-user] Debugging pscheduler

Hi all,
After the perfSONAR 4.0 upgrade, I struggled to get one of our perfSONAR boxes working again (with help from Andrew Lake). It worked for a short while, and then I noticed that pscheduler does not appear to be working properly.
pscheduler task rtt --dest p0.noc.ucdavis.edu --source 128.120.80.78
Submitting task...
Task URL: https://128.120.80.78/pscheduler/tasks/8f17174e-2ca7-47d8-b98e-dd96b1d01e2b
Running with tool 'ping'
Fetching first run...
128.120.80.78 never scheduled a run for the task.
With the source and dest reversed, the test worked:
pscheduler task rtt --source p0.noc.ucdavis.edu --dest 128.120.80.78
Submitting task...
Task URL: https://p0.noc.ucdavis.edu/pscheduler/tasks/b569b99a-bed7-431c-8948-b34238cdcacf
Running with tool 'ping'
Fetching first run...

Next scheduled run: https://p0.noc.ucdavis.edu/pscheduler/tasks/b569b99a-bed7-431c-8948-b34238cdcacf/runs/d13ba8e6-6fcd-4aa6-8873-b411e9fb99a7
Starts 2017-04-26T13:43:28-07:00 (~7 seconds)
Ends 2017-04-26T13:43:39-07:00 (~10 seconds)
Waiting for result...

1 melange-owamp-v4.noc.ucdavis.edu (128.120.80.78) 64 Bytes TTL 60 RTT 0.3550 ms
2 melange-owamp-v4.noc.ucdavis.edu (128.120.80.78) 64 Bytes TTL 60 RTT 0.3540 ms
3 melange-owamp-v4.noc.ucdavis.edu (128.120.80.78) 64 Bytes TTL 60 RTT 0.3680 ms
4 melange-owamp-v4.noc.ucdavis.edu (128.120.80.78) 64 Bytes TTL 60 RTT 0.3570 ms
5 melange-owamp-v4.noc.ucdavis.edu (128.120.80.78) 64 Bytes TTL 60 RTT 0.3420 ms

0% Packet Loss RTT Min/Mean/Max/StdDev = 0.342000/0.355000/0.368000/0.014000 ms
No further runs scheduled.
Bwping works for both directions, though.
The pscheduler-scheduler daemon is “running”. How can I track down the real problem behind this issue?
Thank you.
Zhi-Wei Lu
IET-CR-Network Operations Center
University of California, Davis
(530) 752-0155
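
As one starting point for tracing this kind of problem, the safe_run ERROR entries in pscheduler.log (like those quoted earlier in this thread) become easier to read once the #012 and #011 sequences are decoded; these are syslog's octal escapes for newline and tab. A minimal Python 3 sketch follows. decode_escapes and summarize are hypothetical helper names, not pscheduler commands, and the regular expression assumes the entry layout seen in this thread.

```python
import re

# rsyslog writes control characters as '#' followed by three octal digits,
# e.g. #012 for newline and #011 for tab. A literal "#012" in a message
# would also be decoded, so treat the result as a reading aid only.
_ESCAPE = re.compile(r"#([0-7]{3})")

# Matches entries such as:
#   "... journal: safe_run/ticker ERROR Exception: OperationalError: ..."
_EXCEPTION = re.compile(r"journal: safe_run/(\w+) ERROR Exception: (.+)")


def decode_escapes(text):
    """Turn '#ooo' octal escapes back into the characters they stand for."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), text)


def summarize(lines):
    """Return (daemon, first line of the exception) for each exception entry."""
    found = []
    for line in lines:
        match = _EXCEPTION.search(line)
        if match:
            message = decode_escapes(match.group(2))
            found.append((match.group(1), message.splitlines()[0]))
    return found
```

Run over the log lines quoted in this thread, summarize would report the ticker, scheduler and runner daemons each failing with "OperationalError: could not connect to server: Connection refused", which points at the database connection rather than at the daemons themselves.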