
perfsonar-user - Re: [perfsonar-user] MadDash stopped working

Subject: perfSONAR User Q&A and Other Discussion




  • From: Johann Hugo <>
  • To:
  • Subject: Re: [perfsonar-user] MadDash stopped working
  • Date: Wed, 10 May 2023 15:21:11 +0200

The ssl_access_log I sent earlier was captured before I fixed the IP address list with esmond_manage add_user_ip_address. Here is the current ssl_access_log:

tail -F /var/log/httpd/ssl_access_log
155.232.40.197 - - [10/May/2023:15:17:01 +0200] "POST /esmond/perfsonar/archive/? HTTP/1.1" 201 5277
155.232.40.197 - - [10/May/2023:15:17:01 +0200] "PUT /esmond/perfsonar/archive/0170db1268854b82bf6e2d82be6beadc/? HTTP/1.1" 500 27
155.232.40.26 - - [10/May/2023:15:17:01 +0200] "POST /esmond/perfsonar/archive/? HTTP/1.1" 201 5274
155.232.40.26 - - [10/May/2023:15:17:01 +0200] "PUT /esmond/perfsonar/archive/fe92651a229a4b50a432b3a7274a5a7a/? HTTP/1.1" 500 27
155.232.40.203 - - [10/May/2023:15:17:02 +0200] "POST /esmond/perfsonar/archive/? HTTP/1.1" 201 5277
155.232.40.197 - - [10/May/2023:15:17:02 +0200] "POST /esmond/perfsonar/archive/? HTTP/1.1" 201 5276
155.232.40.203 - - [10/May/2023:15:17:02 +0200] "PUT /esmond/perfsonar/archive/9e69e9c0552b4c65bfae3160526cde7e/? HTTP/1.1" 500 27
155.232.40.197 - - [10/May/2023:15:17:02 +0200] "PUT /esmond/perfsonar/archive/c645cb251851456fb8db7401f9fea6ff/? HTTP/1.1" 500 27
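In this excerpt every POST to /esmond/perfsonar/archive/ returns 201, while every PUT to a per-record archive URI returns 500. To confirm that pattern over a longer stretch of the log, here is a minimal Python sketch that tallies (method, status) pairs. It assumes the Apache common log format shown above; the function name `tally_by_method_and_status` is hypothetical.

```python
import re
from collections import Counter

# Apache common log format as seen in ssl_access_log:
# IP ident user [timestamp] "METHOD path PROTOCOL" status bytes
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def tally_by_method_and_status(lines):
    """Count (HTTP method, status code) pairs; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            counts[(m.group("method"), int(m.group("status")))] += 1
    return counts


# Four lines copied from the excerpt above.
sample = [
    '155.232.40.197 - - [10/May/2023:15:17:01 +0200] "POST /esmond/perfsonar/archive/? HTTP/1.1" 201 5277',
    '155.232.40.197 - - [10/May/2023:15:17:01 +0200] "PUT /esmond/perfsonar/archive/0170db1268854b82bf6e2d82be6beadc/? HTTP/1.1" 500 27',
    '155.232.40.26 - - [10/May/2023:15:17:01 +0200] "POST /esmond/perfsonar/archive/? HTTP/1.1" 201 5274',
    '155.232.40.26 - - [10/May/2023:15:17:01 +0200] "PUT /esmond/perfsonar/archive/fe92651a229a4b50a432b3a7274a5a7a/? HTTP/1.1" 500 27',
]

print(dict(tally_by_method_and_status(sample)))
# → {('POST', 201): 2, ('PUT', 500): 2}
```

Passing an open file object instead of `sample` (for example the live /var/log/httpd/ssl_access_log) gives the same breakdown across all test points.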

On Wed, May 10, 2023 at 2:59 PM Johann Hugo <> wrote:
Hi all

Our MaDDash recently stopped working; everything is orange.

Installed MaDDash versions:
perfsonar-centralmanagement.noarch     4.4.4-1.el7
perfsonar-psconfig-maddash.noarch      4.4.4-1.el7

The only problem I could find was with the PostgreSQL database. I deleted the database and created a clean one, but everything is still orange:
systemctl stop postgresql-10.service
cd /var/lib/pgsql/10/
mv data data.old
/usr/pgsql-10/bin/postgresql-10-setup initdb
/usr/lib/esmond-database/configure-pgsql.sh 10
rm -rf data.old/
systemctl start postgresql-10.service
/usr/lib/perfsonar/scripts/system_environment/configure_esmond --force

The MaDDash log files look fine:
tail -F /var/log/maddash/maddash-server.log
INFO 2023-04-29 00:00:00,004 oldestAllowedTime=1682114400
INFO 2023-04-29 12:00:00,013 oldestAllowedTime=1682157600
INFO 2023-04-30 00:00:00,002 oldestAllowedTime=1682200800
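The `oldestAllowedTime` values are Unix epoch seconds, which makes these lines hard to read at a glance. A small Python sketch that converts them and measures the window against the log timestamp, assuming the server's local time is UTC+02:00 like the +0200 offsets in the Apache logs (the MaDDash log itself prints no offset; `describe_window` is a hypothetical helper, and the millisecond part of the log timestamp is dropped):

```python
from datetime import datetime, timedelta, timezone

# Assumption: the server's local time zone is UTC+02:00 (SAST).
SAST = timezone(timedelta(hours=2))


def describe_window(log_time_str, oldest_allowed_epoch):
    """Return (oldest allowed time, window length) for one maddash-server.log line."""
    log_time = datetime.strptime(log_time_str, "%Y-%m-%d %H:%M:%S").replace(tzinfo=SAST)
    oldest = datetime.fromtimestamp(oldest_allowed_epoch, tz=SAST)
    return oldest, log_time - oldest


# Values copied from the three log lines above.
for ts, epoch in [("2023-04-29 00:00:00", 1682114400),
                  ("2023-04-29 12:00:00", 1682157600),
                  ("2023-04-30 00:00:00", 1682200800)]:
    oldest, window = describe_window(ts, epoch)
    print(ts, "->", oldest.isoformat(), "window:", window)
```

Under that time-zone assumption, each `oldestAllowedTime` lands exactly seven days before its log timestamp.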


tail -F /var/log/maddash/psconfig-maddash-agent.log
2023/05/10 11:16:00 INFO pid=1823 prog=perfSONAR_PS::PSConfig::MaDDash::Agent::_run_handle_psconfig line=395 guid=FF984770-EF12-11ED-B27C-166B81FE1034 task_name=SANReN: SANReN: perfSONAR network - SANReN_South_Throughput config_url=http://perf-pwa.sanren.ac.za/pub/config/sanren_mesh_2019.json config_src=remote viz_type=ps-graphs check_type=ps-nagios-throughput grid_name=SANReN: SANReN: perfSONAR network - SANReN South Throughput - Throughput msg=Grid added
2023/05/10 11:16:01 INFO pid=1823 prog=perfSONAR_PS::PSConfig::MaDDash::Agent::_save_maddash_yaml line=1036 guid=FF984770-EF12-11ED-B27C-166B81FE1034 maddash_yaml_file=/etc/maddash/maddash-server/maddash.yaml msg=Successfully generated a new MaDDash configuration
2023/05/10 11:16:01 INFO pid=1823 prog=main:: line=178 guid=FF984770-EF12-11ED-B27C-166B81FE1034 msg=Agent completed running
2023/05/10 11:16:01 INFO pid=1823 prog=main:: line=186 guid=FF984770-EF12-11ED-B27C-166B81FE1034 msg=Time until next record refresh is 3478 seconds
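The agent log lines are a date, a time and a level followed by key=value pairs, and some values (task_name, grid_name, msg) contain spaces. To pull individual fields such as `grid_name` or `msg` out of a longer log, a rough Python sketch could split a line on each whitespace-preceded `key=` token. The format is inferred from the lines above only, and `parse_agent_line` is a hypothetical helper.

```python
import re

# A value runs until the next whitespace-preceded "key=" token or end of line,
# so values containing spaces (task_name, grid_name, msg) stay intact.
PAIR_RE = re.compile(r"(\w+)=(.*?)(?=\s\w+=|$)")


def parse_agent_line(line):
    """Split one psconfig-maddash-agent.log line into a dict of fields."""
    day, clock, level, rest = line.rstrip("\n").split(" ", 3)
    fields = dict(PAIR_RE.findall(rest))
    fields.update(date=day, time=clock, level=level)
    return fields


# Last line of the excerpt above.
line = ("2023/05/10 11:16:01 INFO pid=1823 prog=main:: line=186 "
        "guid=FF984770-EF12-11ED-B27C-166B81FE1034 "
        "msg=Time until next record refresh is 3478 seconds")
print(parse_agent_line(line)["msg"])
```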


It is reading the configs from the PWA server:
psconfig maddash-stats
Agent Last Run Start Time: 2023/05/10 11:13:59
Agent Last Run End Time: 2023/05/10 11:16:01
Agent Last Run Process ID (PID): 1823
Agent Last Run Log GUID: FF984770-EF12-11ED-B27C-166B81FE1034
Total grids managed by agent: 12
From remote definitions: 12
    http://perf-pwa.sanren.ac.za/pub/config/sanren_mesh_2019.json: 12

Lookup services look fine
tail -F /var/log/perfsonar/lsregistrationdaemon.log
2023/05/10 12:34:04 (1881) INFO> Base.pm:377 perfSONAR_PS::LSRegistrationDaemon::Base::bulk_refresh - Calling bulk_keepalive()
2023/05/10 12:34:05 (1881) INFO> Base.pm:419 perfSONAR_PS::LSRegistrationDaemon::Base::bulk_keepalive - Bulk keepalive succeeded.
2023/05/10 12:34:05 (1881) INFO> Base.pm:666 perfSONAR_PS::LSRegistrationDaemon::Base::update_key - Updated key: lookup/host/ce45ef7a-7842-4d72-beec-402ead5f3665with refresh1683718745
2023/05/10 12:34:05 (1881) INFO> Base.pm:483 perfSONAR_PS::LSRegistrationDaemon::Base::_handle_bulk_update_success - Service renewed. Next Refresh: 1683718745(key=lookup/host/ce45ef7a-7842-4d72-beec-402ead5f3665, description=155.232.195.55)
2023/05/10 12:34:05 (1881) INFO> Base.pm:666 perfSONAR_PS::LSRegistrationDaemon::Base::update_key - Updated key: lookup/service/49e76fbf-dee5-48cd-93d2-763c29556fc8with refresh1683718745
2023/05/10 12:34:05 (1881) INFO> Base.pm:483 perfSONAR_PS::LSRegistrationDaemon::Base::_handle_bulk_update_success - Service renewed. Next Refresh: 1683718745(key=lookup/service/49e76fbf-dee5-48cd-93d2-763c29556fc8, description=pScheduler)
2023/05/10 12:34:05 (1881) INFO> Base.pm:666 perfSONAR_PS::LSRegistrationDaemon::Base::update_key - Updated key: lookup/service/36d34f45-d3fd-49c7-a5f5-953aff3366f4with refresh1683718745
2023/05/10 12:34:05 (1881) INFO> Base.pm:483 perfSONAR_PS::LSRegistrationDaemon::Base::_handle_bulk_update_success - Service renewed. Next Refresh: 1683718745(key=lookup/service/36d34f45-d3fd-49c7-a5f5-953aff3366f4, description=Measurement Archive)


I can see the test points submitting their results:
tail -F /var/log/httpd/ssl_access_log
155.232.50.1 - - [10/May/2023:13:24:06 +0200] "POST /esmond/perfsonar/archive/? HTTP/1.1" 401 58
155.232.40.173 - - [10/May/2023:13:24:06 +0200] "POST /esmond/perfsonar/archive/? HTTP/1.1" 401 58
155.232.50.1 - - [10/May/2023:13:24:06 +0200] "POST /esmond/perfsonar/archive/? HTTP/1.1" 401 58
155.232.40.90 - - [10/May/2023:13:24:06 +0200] "POST /esmond/perfsonar/archive/? HTTP/1.1" 401 58
155.232.40.203 - - [10/May/2023:13:24:07 +0200] "POST /esmond/perfsonar/archive/? HTTP/1.1" 401 58
155.232.40.197 - - [10/May/2023:13:24:08 +0200] "POST /esmond/perfsonar/archive/? HTTP/1.1" 401 58


/usr/sbin/esmond_manage list_user_ip_address
Could not connect to any of [('::1', 9160, 0, 0), ('127.0.0.1', 9160)]
sanren 192.33.10.253/32
sanren 192.33.10.254/32
sanren 155.232.40.0/24
sanren 155.232.195.0/24
sanren 155.232.21.202/32
sanren 155.232.50.0/24
sanren 170.39.8.0/24
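Every source address in the 401 excerpt above falls inside one of these networks, which is consistent with the note at the top of this thread that those 401s were logged before the IP list was fixed. A quick way to check coverage for any set of clients, as a sketch using Python's standard `ipaddress` module (the networks are copied from the output above; the `uncovered` helper is hypothetical):

```python
import ipaddress

# Networks for user "sanren" as printed by esmond_manage list_user_ip_address.
ALLOWED = [ipaddress.ip_network(n) for n in (
    "192.33.10.253/32", "192.33.10.254/32", "155.232.40.0/24",
    "155.232.195.0/24", "155.232.21.202/32", "155.232.50.0/24",
    "170.39.8.0/24",
)]


def uncovered(client_ips, networks=ALLOWED):
    """Return the client IPs that fall outside every allowed network."""
    return [ip for ip in client_ips
            if not any(ipaddress.ip_address(ip) in net for net in networks)]


# Source addresses of the 401 responses in the ssl_access_log above.
clients = ["155.232.50.1", "155.232.40.173", "155.232.40.90",
           "155.232.40.203", "155.232.40.197"]
print(uncovered(clients))  # → []
```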


Cassandra looks fine
tail -F /var/log/cassandra/cassandra.log
        at org.apache.cassandra.io.sstable.SSTableReader.openMetadata(SSTableReader.java:241)
        at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:204)
        at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:193)
        at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:280)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
 INFO 14:43:55,826 Opening /var/lib/cassandra/data/esmond/raw_data/esmond-raw_data-jb-65184 (168418794 bytes)
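The last line shows Cassandra opening an SSTable of the esmond `raw_data` column family, and the stack trace above it passes through `SSTableReader.openMetadata`. To see how many such opens appear in cassandra.log and how much data they cover, a rough Python sketch follows; the pattern is inferred from this single line, so other Cassandra versions may format it differently, and `sstables_opened` is a hypothetical helper.

```python
import re

# Pattern inferred from the single "Opening" line in the excerpt above.
OPEN_RE = re.compile(r"INFO (\S+) Opening (\S+) \((\d+) bytes\)")


def sstables_opened(lines):
    """Yield (time, sstable path, size in bytes) for each SSTable-open message."""
    for line in lines:
        m = OPEN_RE.search(line)
        if m:
            yield m.group(1), m.group(2), int(m.group(3))


# Two lines copied from the excerpt above; the stack-trace line is ignored.
log_lines = [
    "        at java.lang.Thread.run(Thread.java:750)",
    " INFO 14:43:55,826 Opening /var/lib/cassandra/data/esmond/raw_data/"
    "esmond-raw_data-jb-65184 (168418794 bytes)",
]
opened = list(sstables_opened(log_lines))
total = sum(size for _, _, size in opened)
print(len(opened), "SSTable(s),", round(total / 2**20, 1), "MiB")
```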


Esmond is running
[root@perf-cm ~]# tail -F /var/log/esmond/esmond.log
2023-05-03 00:30:07,079 [INFO] /usr/lib/esmond/esmond/cassandra.py: Checking/creating column families
2023-05-03 00:30:07,081 [INFO] /usr/lib/esmond/esmond/cassandra.py: Schema check done
2023-05-03 00:30:07,094 [INFO] /usr/lib/esmond/esmond/cassandra.py: Connected to ['localhost:9160']


Django is not happy
tail -F /var/log/esmond/django.log
  Traceback (most recent call last):
  File "/usr/lib/esmond/lib/python3.6/site-packages/django/core/handlers/exception.py", line 41, in inner
    response = get_response(request)
  File "/usr/lib/esmond/lib/python3.6/site-packages/django/core/handlers/base.py", line 249, in _legacy_get_response
    response = self._get_response(request)
  File "/usr/lib/esmond/lib/python3.6/site-packages/django/core/handlers/base.py", line 187, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/usr/lib/esmond/lib/python3.6/site-packages/django/core/handlers/base.py", line 185, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/esmond/lib/python3.6/site-packages/django/views/decorators/csrf.py", line 58, in wrapped_view
    return view_func(*args, **kwargs)
  File "/usr/lib/esmond/lib/python3.6/site-packages/rest_framework/viewsets.py", line 116, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/lib/esmond/lib/python3.6/site-packages/rest_framework/views.py", line 495, in dispatch
    response = self.handle_exception(exc)
  File "/usr/lib/esmond/lib/python3.6/site-packages/rest_framework/views.py", line 455, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/usr/lib/esmond/lib/python3.6/site-packages/rest_framework/views.py", line 492, in dispatch
    response = handler(request, *args, **kwargs)
  File "/usr/lib/esmond/esmond/api/perfsonar/api_v2.py", line 921, in update
    check_connection()
  File "/usr/lib/esmond/esmond/api/perfsonar/api_v2.py", line 75, in check_connection
    db = CASSANDRA_DB(get_config(get_config_path()))
  File "/usr/lib/esmond/esmond/cassandra.py", line 135, in __init__
    "at %s - %s" % (config.cassandra_servers[0], e))
esmond.cassandra.ConnectionException: "System Manager can't connect to Cassandra at localhost:9160 - Could not connect to any of [('::1', 9160, 0, 0), ('127.0.0.1', 9160)]"
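Both esmond_manage (earlier in this message) and Django report that they could not connect to Cassandra on localhost:9160, which is Cassandra's default Thrift RPC port. A minimal Python check that repeats that connection test outside esmond, trying the same two loopback addresses named in the error (the `port_open` helper is hypothetical):

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and address-resolution errors.
        return False


# Host and port taken from the ConnectionException message above.
for host in ("127.0.0.1", "::1"):
    print(host, 9160, "open" if port_open(host, 9160) else "closed")
```

If both addresses report closed while Cassandra's process is running, that would suggest Cassandra has not yet started listening on its Thrift port.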

Any ideas?

Thanks,
Johann

--
SANReN Engineer
South African National Research Network (SANReN)
National Integrated Cyber Infrastructure System (NICIS)
CSIR NextGen Enterprises and Institutions Cluster

Office: 012 841 2066
Email:
Website: www.sanren.ac.za / www.csir.co.za



