perfsonar-user - [perfsonar-user] MadDash stopped working
Subject: perfSONAR User Q&A and Other Discussion
- From: Johann Hugo <>
- To:
- Subject: [perfsonar-user] MadDash stopped working
- Date: Wed, 10 May 2023 14:59:46 +0200
Hi all
Our MaDDash recently stopped working; every grid cell is orange.
--
Installed MaDDash version:
perfsonar-centralmanagement.noarch 4.4.4-1.el7
perfsonar-psconfig-maddash.noarch 4.4.4-1.el7
The only problem I could find was with the PostgreSQL database. I deleted the database and created a clean one, but everything is still orange:
systemctl stop postgresql-10.service
cd /var/lib/pgsql/10/
mv data data.old
/usr/pgsql-10/bin/postgresql-10-setup initdb
/usr/lib/esmond-database/configure-pgsql.sh 10
rm -rf data.old/
systemctl start postgresql-10.service
/usr/lib/perfsonar/scripts/system_environment/configure_esmond --force
The MaDDash log files look fine
tail -F /var/log/maddash/maddash-server.log
INFO 2023-04-29 00:00:00,004 oldestAllowedTime=1682114400
INFO 2023-04-29 12:00:00,013 oldestAllowedTime=1682157600
INFO 2023-04-30 00:00:00,002 oldestAllowedTime=1682200800
tail -F /var/log/maddash/psconfig-maddash-agent.log
2023/05/10 11:16:00 INFO pid=1823 prog=perfSONAR_PS::PSConfig::MaDDash::Agent::_run_handle_psconfig line=395 guid=FF984770-EF12-11ED-B27C-166B81FE1034 task_name=SANReN: SANReN: perfSONAR network - SANReN_South_Throughput config_url=http://perf-pwa.sanren.ac.za/pub/config/sanren_mesh_2019.json config_src=remote viz_type=ps-graphs check_type=ps-nagios-throughput grid_name=SANReN: SANReN: perfSONAR network - SANReN South Throughput - Throughput msg=Grid added
2023/05/10 11:16:01 INFO pid=1823 prog=perfSONAR_PS::PSConfig::MaDDash::Agent::_save_maddash_yaml line=1036 guid=FF984770-EF12-11ED-B27C-166B81FE1034 maddash_yaml_file=/etc/maddash/maddash-server/maddash.yaml msg=Successfully generated a new MaDDash configuration
2023/05/10 11:16:01 INFO pid=1823 prog=main:: line=178 guid=FF984770-EF12-11ED-B27C-166B81FE1034 msg=Agent completed running
2023/05/10 11:16:01 INFO pid=1823 prog=main:: line=186 guid=FF984770-EF12-11ED-B27C-166B81FE1034 msg=Time until next record refresh is 3478 seconds
It's reading the configs from the PWA server
psconfig maddash-stats
Agent Last Run Start Time: 2023/05/10 11:13:59
Agent Last Run End Time: 2023/05/10 11:16:01
Agent Last Run Process ID (PID): 1823
Agent Last Run Log GUID: FF984770-EF12-11ED-B27C-166B81FE1034
Total grids managed by agent: 12
From remote definitions: 12
http://perf-pwa.sanren.ac.za/pub/config/sanren_mesh_2019.json: 12
Lookup services look fine
tail -F /var/log/perfsonar/lsregistrationdaemon.log
2023/05/10 12:34:04 (1881) INFO> Base.pm:377 perfSONAR_PS::LSRegistrationDaemon::Base::bulk_refresh - Calling bulk_keepalive()
2023/05/10 12:34:05 (1881) INFO> Base.pm:419 perfSONAR_PS::LSRegistrationDaemon::Base::bulk_keepalive - Bulk keepalive succeeded.
2023/05/10 12:34:05 (1881) INFO> Base.pm:666 perfSONAR_PS::LSRegistrationDaemon::Base::update_key - Updated key: lookup/host/ce45ef7a-7842-4d72-beec-402ead5f3665with refresh1683718745
2023/05/10 12:34:05 (1881) INFO> Base.pm:483 perfSONAR_PS::LSRegistrationDaemon::Base::_handle_bulk_update_success - Service renewed. Next Refresh: 1683718745(key=lookup/host/ce45ef7a-7842-4d72-beec-402ead5f3665, description=155.232.195.55)
2023/05/10 12:34:05 (1881) INFO> Base.pm:666 perfSONAR_PS::LSRegistrationDaemon::Base::update_key - Updated key: lookup/service/49e76fbf-dee5-48cd-93d2-763c29556fc8with refresh1683718745
2023/05/10 12:34:05 (1881) INFO> Base.pm:483 perfSONAR_PS::LSRegistrationDaemon::Base::_handle_bulk_update_success - Service renewed. Next Refresh: 1683718745(key=lookup/service/49e76fbf-dee5-48cd-93d2-763c29556fc8, description=pScheduler)
2023/05/10 12:34:05 (1881) INFO> Base.pm:666 perfSONAR_PS::LSRegistrationDaemon::Base::update_key - Updated key: lookup/service/36d34f45-d3fd-49c7-a5f5-953aff3366f4with refresh1683718745
2023/05/10 12:34:05 (1881) INFO> Base.pm:483 perfSONAR_PS::LSRegistrationDaemon::Base::_handle_bulk_update_success - Service renewed. Next Refresh: 1683718745(key=lookup/service/36d34f45-d3fd-49c7-a5f5-953aff3366f4, description=Measurement Archive)
I can see the test-points submitting their results
tail -F /var/log/httpd/ssl_access_log
155.232.50.1 - - [10/May/2023:13:24:06 +0200] "POST /esmond/perfsonar/archive/? HTTP/1.1" 401 58
155.232.40.173 - - [10/May/2023:13:24:06 +0200] "POST /esmond/perfsonar/archive/? HTTP/1.1" 401 58
155.232.50.1 - - [10/May/2023:13:24:06 +0200] "POST /esmond/perfsonar/archive/? HTTP/1.1" 401 58
155.232.40.90 - - [10/May/2023:13:24:06 +0200] "POST /esmond/perfsonar/archive/? HTTP/1.1" 401 58
155.232.40.203 - - [10/May/2023:13:24:07 +0200] "POST /esmond/perfsonar/archive/? HTTP/1.1" 401 58
155.232.40.197 - - [10/May/2023:13:24:08 +0200] "POST /esmond/perfsonar/archive/? HTTP/1.1" 401 58
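Every POST in the excerpt above comes back with status 401, which in HTTP means the request was not authenticated, so it may be worth confirming whether any submission is actually being accepted. A minimal sketch for tallying status codes per request from such log lines follows; the regular expression and the function name `tally_statuses` are illustrative assumptions, and the two sample lines are copied from the log above.

```python
import re
from collections import Counter

# Assumed pattern for the request and status fields of an Apache
# common-log-format line: "METHOD PATH PROTOCOL" STATUS SIZE
LOG_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})\b')


def tally_statuses(lines):
    """Count log lines per (method, path, status); unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if match:
            key = (match.group("method"), match.group("path"), int(match.group("status")))
            counts[key] += 1
    return counts


# Two lines taken verbatim from the ssl_access_log excerpt.
sample = [
    '155.232.50.1 - - [10/May/2023:13:24:06 +0200] "POST /esmond/perfsonar/archive/? HTTP/1.1" 401 58',
    '155.232.40.173 - - [10/May/2023:13:24:06 +0200] "POST /esmond/perfsonar/archive/? HTTP/1.1" 401 58',
]

for (method, path, status), n in tally_statuses(sample).most_common():
    print(f"{n:5d}  {status}  {method} {path}")
```

To run it against the live file, the lines can be read with `open("/var/log/httpd/ssl_access_log")` and passed to `tally_statuses` in place of `sample`.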
/usr/sbin/esmond_manage list_user_ip_address
Could not connect to any of [('::1', 9160, 0, 0), ('127.0.0.1', 9160)]
sanren 192.33.10.253/32
sanren 192.33.10.254/32
sanren 155.232.40.0/24
sanren 155.232.195.0/24
sanren 155.232.21.202/32
sanren 155.232.50.0/24
sanren 170.39.8.0/24
Cassandra looks fine
tail -F /var/log/cassandra/cassandra.log
at org.apache.cassandra.io.sstable.SSTableReader.openMetadata(SSTableReader.java:241)
at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:204)
at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:193)
at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:280)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
INFO 14:43:55,826 Opening /var/lib/cassandra/data/esmond/raw_data/esmond-raw_data-jb-65184 (168418794 bytes)
Esmond is running
[root@perf-cm ~]# tail -F /var/log/esmond/esmond.log
2023-05-03 00:30:07,079 [INFO] /usr/lib/esmond/esmond/cassandra.py: Checking/creating column families
2023-05-03 00:30:07,081 [INFO] /usr/lib/esmond/esmond/cassandra.py: Schema check done
2023-05-03 00:30:07,094 [INFO] /usr/lib/esmond/esmond/cassandra.py: Connected to ['localhost:9160']
Django is not happy
tail -F /var/log/esmond/django.log
Traceback (most recent call last):
File "/usr/lib/esmond/lib/python3.6/site-packages/django/core/handlers/exception.py", line 41, in inner
response = get_response(request)
File "/usr/lib/esmond/lib/python3.6/site-packages/django/core/handlers/base.py", line 249, in _legacy_get_response
response = self._get_response(request)
File "/usr/lib/esmond/lib/python3.6/site-packages/django/core/handlers/base.py", line 187, in _get_response
response = self.process_exception_by_middleware(e, request)
File "/usr/lib/esmond/lib/python3.6/site-packages/django/core/handlers/base.py", line 185, in _get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/usr/lib/esmond/lib/python3.6/site-packages/django/views/decorators/csrf.py", line 58, in wrapped_view
return view_func(*args, **kwargs)
File "/usr/lib/esmond/lib/python3.6/site-packages/rest_framework/viewsets.py", line 116, in view
return self.dispatch(request, *args, **kwargs)
File "/usr/lib/esmond/lib/python3.6/site-packages/rest_framework/views.py", line 495, in dispatch
response = self.handle_exception(exc)
File "/usr/lib/esmond/lib/python3.6/site-packages/rest_framework/views.py", line 455, in handle_exception
self.raise_uncaught_exception(exc)
File "/usr/lib/esmond/lib/python3.6/site-packages/rest_framework/views.py", line 492, in dispatch
response = handler(request, *args, **kwargs)
File "/usr/lib/esmond/esmond/api/perfsonar/api_v2.py", line 921, in update
check_connection()
File "/usr/lib/esmond/esmond/api/perfsonar/api_v2.py", line 75, in check_connection
db = CASSANDRA_DB(get_config(get_config_path()))
File "/usr/lib/esmond/esmond/cassandra.py", line 135, in __init__
"at %s - %s" % (config.cassandra_servers[0], e))
esmond.cassandra.ConnectionException: "System Manager can't connect to Cassandra at localhost:9160 - Could not connect to any of [('::1', 9160, 0, 0), ('127.0.0.1', 9160)]"
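The exception above names the two addresses esmond tried on port 9160. A quick way to check independently of esmond whether anything is accepting TCP connections there is a small socket probe. This is only a sketch: the helper name `can_connect` and the 3-second timeout are assumptions, and a successful connect shows only that something is listening on the port, not that Cassandra is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and address-resolution errors.
        return False


if __name__ == "__main__":
    # 9160 and both loopback addresses are taken from the ConnectionException.
    for host in ("::1", "127.0.0.1"):
        state = "accepting connections" if can_connect(host, 9160) else "not reachable"
        print(f"{host} port 9160: {state}")
```

If both addresses report "not reachable" while the Cassandra service is running, the Cassandra log is the next place to look for why the listener on that port did not come up.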
Any ideas?
Thanks,
Johann
SANReN Engineer
South African National Research Network (SANReN)
National Integrated Cyber Infrastructure System (NICIS)
CSIR NextGen Enterprises and Institutions Cluster
Office: 012 841 2066
Email: , Website: www.sanren.ac.za / www.csir.co.za