After upgrading the XSEDE perfSONAR host at NICS (ps.nics.xsede.org)
from the CentOS6-based to the CentOS7-based Toolkit, the homepage
stubbornly says "No data available in table". The upgrade was done as
a net install last Thursday, 14 June.
I ran migrate-backup and migrate-restore with the --data option so
it should have both old and new accumulated data to display.
It was logging "401: Invalid token" errors in
/var/log/perfsonar/pscheduler.log, caused by a doubled set of three
"<measurement_archive>" blocks in
/etc/perfsonar/meshconfig-agent-tasks.conf, so I fixed that. Sadly,
graphing still didn't start working.
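In case it helps anyone checking for the same duplication, here is a minimal sketch of how the repeated blocks could be spotted. It is not a perfSONAR tool: the function name is made up, and the field names in the sample text are placeholders rather than the contents of my file. It assumes the blocks are written as plain "<measurement_archive>" ... "</measurement_archive>" pairs.

```python
import re
from collections import Counter

# Matches one <measurement_archive> ... </measurement_archive> block.
BLOCK_RE = re.compile(
    r"<measurement_archive>(.*?)</measurement_archive>", re.DOTALL
)


def find_duplicate_archive_blocks(config_text):
    """Return {normalized block body: count} for bodies seen more than once."""
    bodies = []
    for match in BLOCK_RE.finditer(config_text):
        # Collapse whitespace so indentation differences do not hide a duplicate.
        lines = [" ".join(line.split()) for line in match.group(1).splitlines()]
        bodies.append("\n".join(line for line in lines if line))
    counts = Counter(bodies)
    return {body: n for body, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical sample text; on a real host you would read
    # /etc/perfsonar/meshconfig-agent-tasks.conf instead.
    sample = """
    <measurement_archive>
        type      example/latency
        database  https://localhost/example/archive/
    </measurement_archive>
    <measurement_archive>
        type      example/latency
        database  https://localhost/example/archive/
    </measurement_archive>
    """
    for body, n in find_duplicate_archive_blocks(sample).items():
        print("%d copies of:\n%s" % (n, body))
```

Comparing normalized block bodies rather than raw text means two copies that differ only in indentation still count as duplicates.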
ps.nics gets its test specs from a meshconfig file and all looks as
expected on the Test Configuration web page. pscheduler monitor
--host shows lots of tests coming and going. I can manually
initiate tests with pscheduler.
I don't see any obvious (at least to me) problem in
/var/log/messages.
Tried:
yum reinstall pscheduler-server
pscheduler internal db-update
The following is systemctl and log output that might help the
experts identify the problem/fix. Please let me know if other
evidence would be helpful.
Thanks!
Kathy Benninger
Pittsburgh Supercomputing Center
[root@ps benninge]# systemctl status pscheduler-scheduler -l
● pscheduler-scheduler.service - pScheduler server - scheduler
   Loaded: loaded (/usr/lib/systemd/system/pscheduler-scheduler.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-06-19 11:35:29 EDT; 2h 1min ago
 Main PID: 2312 (scheduler)
   CGroup: /system.slice/pscheduler-scheduler.service
           └─2312 /usr/bin/python /usr/libexec/pscheduler/daemons/scheduler --daemon --pid-file /var/run/pscheduler-scheduler.pid --dsn @/etc/pscheduler/database/database-dsn
Jun 19 11:35:50 ps.nics.xsede.org scheduler[2312]: safe_run/scheduler ERROR Program threw an exception after 0:00:20.392415
Jun 19 11:35:50 ps.nics.xsede.org scheduler[2312]: safe_run/scheduler ERROR Exception: DatabaseError: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 72, in safe_run
    function()
  File "/usr/libexec/pscheduler/daemons/scheduler", line 809, in <lambda>
    pscheduler.safe_run(lambda: main_program())
  File "/usr/libexec/pscheduler/daemons/scheduler", line 764, in main_program
    cursor.execute(query, args)
DatabaseError: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
Jun 19 11:35:50 ps.nics.xsede.org scheduler[2312]: safe_run/scheduler ERROR Waiting 0.25 seconds before restarting
Jun 19 11:35:50 ps.nics.xsede.org scheduler[2312]: safe_run/scheduler ERROR Restarting: ['/usr/libexec/pscheduler/daemons/scheduler', '--daemon', '--pid-file', '/var/run/pscheduler-scheduler.pid', '--dsn', '@/etc/pscheduler/database/database-dsn']
Jun 19 11:35:50 ps.nics.xsede.org scheduler[2312]: scheduler INFO Started
Jun 19 13:22:56 ps.nics.xsede.org scheduler[2312]: safe_run/scheduler ERROR Program threw an exception after 1:47:05.897896
Jun 19 13:22:56 ps.nics.xsede.org scheduler[2312]: safe_run/scheduler ERROR Exception: DatabaseError: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 72, in safe_run
    function()
  File "/usr/libexec/pscheduler/daemons/scheduler", line 809, in <lambda>
    pscheduler.safe_run(lambda: main_program())
  File "/usr/libexec/pscheduler/daemons/scheduler", line 764, in main_program
    cursor.execute(query, args)
DatabaseError: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
Jun 19 13:22:56 ps.nics.xsede.org scheduler[2312]: safe_run/scheduler ERROR Waiting 0.5 seconds before restarting
Jun 19 13:22:57 ps.nics.xsede.org scheduler[2312]: safe_run/scheduler ERROR Restarting: ['/usr/libexec/pscheduler/daemons/scheduler', '--daemon', '--pid-file', '/var/run/pscheduler-scheduler.pid', '--dsn', '@/etc/pscheduler/database/database-dsn']
Jun 19 13:22:57 ps.nics.xsede.org scheduler[2312]: scheduler INFO Started
+++++++++++++++++++++++++++++++++++++++++++++
[root@ps benninge]# systemctl status pscheduler-ticker -l
● pscheduler-ticker.service - pScheduler server - ticker
   Loaded: loaded (/usr/lib/systemd/system/pscheduler-ticker.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-06-19 11:35:29 EDT; 2h 3min ago
 Main PID: 2317 (ticker)
   CGroup: /system.slice/pscheduler-ticker.service
           └─2317 /usr/bin/python /usr/libexec/pscheduler/daemons/ticker --daemon --pid-file /var/run/pscheduler-ticker.pid --dsn @/etc/pscheduler/database/database-dsn
Jun 19 11:35:29 ps.nics.xsede.org systemd[1]: Starting pScheduler server - ticker...
Jun 19 11:35:29 ps.nics.xsede.org systemd[1]: Started pScheduler server - ticker.
Jun 19 11:35:50 ps.nics.xsede.org ticker[2317]: ticker WARNING Queue maintainer got exception server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
Jun 19 11:36:01 ps.nics.xsede.org ticker[2317]: safe_run/ticker ERROR Program threw an exception after 0:00:31.352763
Jun 19 11:36:01 ps.nics.xsede.org ticker[2317]: safe_run/ticker ERROR Exception: OperationalError: terminating connection due to administrator command
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/pscheduler/saferun.py", line 72, in safe_run
    function()
  File "/usr/libexec/pscheduler/daemons/ticker", line 157, in <lambda>
    pscheduler.safe_run(lambda: main_program())
  File "/usr/libexec/pscheduler/daemons/ticker", line 137, in main_program
    cursor.execute("SELECT ticker()")
OperationalError: terminating connection due to administrator command
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
Jun 19 11:36:01 ps.nics.xsede.org ticker[2317]: safe_run/ticker ERROR Waiting 0.25 seconds before restarting
Jun 19 11:36:01 ps.nics.xsede.org ticker[2317]: safe_run/ticker ERROR Restarting: ['/usr/libexec/pscheduler/daemons/ticker', '--daemon', '--pid-file', '/var/run/pscheduler-ticker.pid', '--dsn', '@/etc/pscheduler/database/database-dsn']
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
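To get a quick picture of which daemons are throwing which exceptions, the safe_run lines above could be tallied with something like the following. This is only a sketch: the function name is my own, and the pattern is based solely on the "safe_run/<daemon> ERROR Exception: <Class>:" lines shown in this message, so other pScheduler log formats may not match.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... safe_run/scheduler ERROR Exception: DatabaseError: server closed ..."
# \s+ also spans line breaks, so mail-wrapped log text still matches.
EXC_RE = re.compile(r"safe_run/(\w+)\s+ERROR\s+Exception:\s+(\w+):")


def count_exceptions(log_text):
    """Count (daemon, exception class) pairs reported by safe_run."""
    return Counter(EXC_RE.findall(log_text))


if __name__ == "__main__":
    # Sample lines copied from the status output in this message.
    sample = (
        "Jun 19 11:35:50 ps.nics.xsede.org scheduler[2312]: "
        "safe_run/scheduler ERROR Exception: DatabaseError: "
        "server closed the connection unexpectedly\n"
        "Jun 19 11:36:01 ps.nics.xsede.org ticker[2317]: "
        "safe_run/ticker ERROR Exception: OperationalError: "
        "terminating connection due to administrator command\n"
    )
    for (daemon, exc), n in sorted(count_exceptions(sample).items()):
        print("%-10s %-20s %d" % (daemon, exc, n))
```

The same function should work on text saved from journalctl -u pscheduler-scheduler -u pscheduler-ticker, as long as the lines follow the format above.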
Snippet from /var/log/esmond/django.log:
AllServersUnavailable: An attempt was made to connect to each of the servers twice, but none of the attempts succeeded. The last failure was TTransportException: Could not connect to localhost:9160
2018-06-18 15:05:53,057 [ERROR] /usr/lib/esmond/lib/python2.7/site-packages/django/core/handlers/exception.py: Internal Server Error: /esmond/perfsonar/archive/f47607965bb84e52a0fc715ab1ecd996/
Traceback (most recent call last):
  File "/usr/lib/esmond/lib/python2.7/site-packages/django/core/handlers/exception.py", line 42, in inner
    response = get_response(request)
  File "/usr/lib/esmond/lib/python2.7/site-packages/django/core/handlers/base.py", line 249, in _legacy_get_response
    response = self._get_response(request)
  File "/usr/lib/esmond/lib/python2.7/site-packages/django/core/handlers/base.py", line 187, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/usr/lib/esmond/lib/python2.7/site-packages/django/core/handlers/base.py", line 185, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/esmond/lib/python2.7/site-packages/django/views/decorators/csrf.py", line 58, in wrapped_view
    return view_func(*args, **kwargs)
  File "/usr/lib/esmond/lib/python2.7/site-packages/rest_framework/viewsets.py", line 90, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/lib/esmond/lib/python2.7/site-packages/rest_framework/views.py", line 489, in dispatch
    response = self.handle_exception(exc)
  File "/usr/lib/esmond/lib/python2.7/site-packages/rest_framework/views.py", line 449, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/usr/lib/esmond/lib/python2.7/site-packages/rest_framework/views.py", line 486, in dispatch
    response = handler(request, *args, **kwargs)
  File "/usr/lib/esmond/esmond/api/perfsonar/api_v2.py", line 932, in update
    obj.save()
  File "/usr/lib/esmond/esmond/api/perfsonar/api_v2.py", line 407, in save
    existing = PSTimeSeriesObject.query_database(self.metadata_key, self.event_type, 'base', None, int(self.time), int(self.time), 1)
  File "/usr/lib/esmond/esmond/api/perfsonar/api_v2.py", line 473, in query_database
    cf='average', ts_min=begin_millis, ts_max=end_millis, column_count=max_results)
  File "/usr/lib/esmond/esmond/cassandra.py", line 650, in query_aggregation_timerange
    column_count=cols)
  File "/usr/lib/esmond/lib/python2.7/site-packages/pycassa/columnfamily.py", line 772, in multiget
    packed_keys[offset:offset + buffer_size], cp, sp, consistency)
  File "/usr/lib/esmond/lib/python2.7/site-packages/pycassa/pool.py", line 577, in execute
    return getattr(conn, f)(*args, **kwargs)
  File "/usr/lib/esmond/lib/python2.7/site-packages/pycassa/pool.py", line 153, in new_f
    return new_f(self, *args, **kwargs)
  File "/usr/lib/esmond/lib/python2.7/site-packages/pycassa/pool.py", line 153, in new_f
    return new_f(self, *args, **kwargs)
  File "/usr/lib/esmond/lib/python2.7/site-packages/pycassa/pool.py", line 153, in new_f
    return new_f(self, *args, **kwargs)
  File "/usr/lib/esmond/lib/python2.7/site-packages/pycassa/pool.py", line 153, in new_f
    return new_f(self, *args, **kwargs)
  File "/usr/lib/esmond/lib/python2.7/site-packages/pycassa/pool.py", line 153, in new_f
    return new_f(self, *args, **kwargs)
  File "/usr/lib/esmond/lib/python2.7/site-packages/pycassa/pool.py", line 153, in new_f
    return new_f(self, *args, **kwargs)
  File "/usr/lib/esmond/lib/python2.7/site-packages/pycassa/pool.py", line 153, in new_f
    return new_f(self, *args, **kwargs)
  File "/usr/lib/esmond/lib/python2.7/site-packages/pycassa/pool.py", line 153, in new_f
    return new_f(self, *args, **kwargs)
  File "/usr/lib/esmond/lib/python2.7/site-packages/pycassa/pool.py", line 153, in new_f
    return new_f(self, *args, **kwargs)
  File "/usr/lib/esmond/lib/python2.7/site-packages/pycassa/pool.py", line 125, in new_f
    self._pool._replace_wrapper() # puts a new wrapper in the queue
  File "/usr/lib/esmond/lib/python2.7/site-packages/pycassa/pool.py", line 458, in _replace_wrapper
    conn = self._create_connection()
  File "/usr/lib/esmond/lib/python2.7/site-packages/pycassa/pool.py", line 431, in _create_connection
    (exc.__class__.__name__, exc))
AllServersUnavailable: An attempt was made to connect to each of the servers twice, but none of the attempts succeeded. The last failure was TTransportException: Could not connect to localhost:9160
2018-06-19 06:26:09,824 [WARNING] /usr/lib/esmond/lib/python2.7/site-packages/django/core/handlers/base.py: Not Found: /esmond/perfsonar/archive/?event-type=histogram-owdelay&limit=1/
2018-06-19 06:29:37,671 [WARNING] /usr/lib/esmond/lib/python2.7/site-packages/django/core/handlers/base.py: Not Found: /esmond/perfsonar/archive/?event-type=throughput&limit=1/
2018-06-19 09:59:12,691 [WARNING] /usr/lib/esmond/lib/python2.7/site-packages/django/core/handlers/base.py: Not Found: /esmond/perfsonar/archive/?event-type=histogram-owdelay&limit=1/
2018-06-19 10:01:52,122 [WARNING] /usr/lib/esmond/lib/python2.7/site-packages/django/core/handlers/base.py: Not Found: /esmond/perfsonar/archive/?event-type=throughput&limit=1/
2018-06-19 10:45:02,285 [WARNING] /usr/lib/esmond/lib/python2.7/site-packages/django/core/handlers/base.py: Not Found: /esmond/perfsonar/archive/?event-type=histogram-owdelay&limit=1/
2018-06-19 10:48:02,119 [WARNING] /usr/lib/esmond/lib/python2.7/site-packages/django/core/handlers/base.py: Not Found: /esmond/perfsonar/archive/?event-type=throughput&limit=1/
++++++++++++++++++++++++++++++++++++++++++++++++
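Since the django.log traceback ends with "Could not connect to localhost:9160", one simple check is whether anything is accepting TCP connections on that port at all. Below is a minimal sketch of such a probe. The function name is made up for illustration, it is written for Python 3 (not the Python 2.7 the Toolkit itself runs), and a successful connect only shows that something is listening, not that the archive backend is healthy.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # 9160 is the port named in the TTransportException in django.log.
    print("localhost:9160 reachable:", port_is_open("localhost", 9160))
```

If this reports the port as unreachable, the esmond errors would be a symptom of the backing store not listening rather than a problem in esmond itself.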