
perfsonar-user - Re: [perfsonar-user] MaDDash grid suddenly went down

Subject: perfSONAR User Q&A and Other Discussion

  • From: Hyojoon Kim
  • To: Andrew Lake
  • Cc: perfsonar-user
  • Subject: Re: [perfsonar-user] MaDDash grid suddenly went down
  • Date: Mon, 9 Apr 2018 17:39:13 +0000

Hi Andy, 

When I hovered over a box, it showed “500 Internal Server Error: <h1>Server Error (500)</h1>”. Looking further into the Django and Cassandra log files, I found that both the Cassandra and PostgreSQL processes had died at some point. Restarting them resolved the problem. Thank you for pointing me in the right direction.

I ran into another problem while trying to find out why the processes died, but it is *not* a perfSONAR issue (Cassandra is reporting “not enough space” on disk). I’ll send a separate email to the mailing list about that problem.

Thanks, 
Joon
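
For readers who hit the same symptom, here is a minimal sketch of a check for the two conditions mentioned above: whether the database processes are accepting connections, and how much disk space is left. The port numbers are assumed defaults (9160 for Cassandra's Thrift interface, 5432 for PostgreSQL) and the path "/" is only an example. None of these values come from this thread, so adjust them to your install.

```python
import shutil
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def free_fraction(path: str) -> float:
    """Return the free fraction of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


# Assumed default ports, not taken from this thread.
SERVICES = {"cassandra": 9160, "postgresql": 5432}

if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "listening" if port_open("127.0.0.1", port) else "NOT reachable"
        print(f"{name} (port {port}): {state}")
    print(f"free space on /: {free_fraction('/'):.1%}")
```

A closed port only shows that nothing is listening there. The service's own log is still needed to find out why the process stopped.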


On Apr 9, 2018, at 12:07 PM, Andrew Lake wrote:

Hi,

I suspect the problem is with your archive and that MaDDash is just the messenger. What error is MaDDash reporting when you hover over one of the boxes? Also, you may want to look for errors in /var/log/esmond/django.log or /var/log/esmond/esmond.log. You may also want to try restarting Cassandra, since that often clears things up.

Thanks,
Andy
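
A rough sketch of the log check suggested above: it pulls the most recent lines that look like errors out of the two esmond log files. The marker strings "ERROR" and "Traceback" are assumptions about what those logs contain, and `recent_errors` is a hypothetical helper, not part of perfSONAR.

```python
from collections import deque
from pathlib import Path


def recent_errors(path, keep=20, markers=("ERROR", "Traceback")):
    """Return the last `keep` lines of `path` that contain any marker."""
    hits = deque(maxlen=keep)  # older matches fall off the front
    with open(path, errors="replace") as handle:
        for line in handle:
            if any(marker in line for marker in markers):
                hits.append(line.rstrip("\n"))
    return list(hits)


if __name__ == "__main__":
    for log in ("/var/log/esmond/django.log", "/var/log/esmond/esmond.log"):
        if not Path(log).exists():
            print(f"{log}: not found")
            continue
        print(f"== {log}")
        for line in recent_errors(log):
            print(line)
```

Reading these files may require root privileges, depending on how the host is set up.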




On April 9, 2018 at 11:56:32 AM, Hyojoon Kim wrote:

Hi, 

Our perfSONAR MaDDash grid suddenly went down, and I’m not sure why. It has been down since April 6, 2018. The perfSONAR MA machine is up and running, but the archive data itself seems unreachable. MaDDash and the MA are on the same host.

Grid is down
Category:CONFIGURATION
Potential Solutions:
• If you just configured this grid in the mesh, you may just need to wait as it takes several hours for throughput data to populate (depending on the interval between tests)
• Verify maddash is configured properly. Look in the files under /var/log/maddash/ for any errors. Things to look for are incorrect paths to checks or connection errors.
• Verify that perfSONAR MeshConfig GUIAgent has run recently and you are looking at an accurate test mesh
• Verify that your measurement archive(s) are running
• Verify no firewall is blocking maddash from reaching your measurement archive(s)
• Verify your hosts are downloading the mesh configuration file and that there are tests defined in /etc/perfsonar/meshconfig-agent-tasks.conf
• Verify that perfsonar-meshconfig-agent is running ('/etc/init.d/perfsonar-meshconfig-agent status' or 'systemctl status perfsonar-meshconfig-agent')
• Verify your hosts are able to reach their configured measurement archive and that there are no errors in /var/log/perfsonar/meshconfig-agent.log

I’ve followed the “Potential Solutions” here, and I think the log in “/var/log/maddash/maddash-server.log” gives an important hint. Could someone give me an idea of why this happened and how to fix it?

FYI,

===
* If you just configured this grid in the mesh, you may just need to wait as it takes several hours for throughput data to populate (depending on the interval between tests)
  — This Mesh and MaDDash have been running for years
* Verify maddash is configured properly. Look in the files under /var/log/maddash/ for any errors. Things to look for are incorrect paths to checks or connection errors.
  — I see the following errors in /var/log/maddash/maddash-server.log:
ERROR 2018-04-08 20:44:36,408 Error cleaning database An SQL data change is not permitted for a read-only connection, user or database.
ERROR 2018-04-08 20:44:36,410 Error executing EventCalendarJob: An SQL data change is not permitted for a read-only connection, user or database.
ERROR 2018-04-08 20:45:36,411 Error cleaning database An SQL data change is not permitted for a read-only connection, user or database.
ERROR 2018-04-08 20:45:36,413 Error executing EventCalendarJob: An SQL data change is not permitted for a read-only connection, user or database.
ERROR 2018-04-08 20:46:36,413 Error cleaning database An SQL data change is not permitted for a read-only connection, user or database.
ERROR 2018-04-08 20:46:36,415 Error executing EventCalendarJob: An SQL data change is not permitted for a read-only connection, user or database.
ERROR 2018-04-08 20:47:36,416 Error cleaning database An SQL data change is not permitted for a read-only connection, user or database.

* Verify that perfSONAR MeshConfig GUIAgent has run recently and you are looking at an accurate test mesh
  — Verified.
* Verify that your measurement archive(s) are running
  — The MA machine is up and running.
* Verify no firewall is blocking maddash from reaching your measurement archive(s)
  — They are on the same host. No configuration change was made.
* Verify your hosts are downloading the mesh configuration file and that there are tests defined in /etc/perfsonar/meshconfig-agent-tasks.conf
  — Verified 
* Verify that perfsonar-meshconfig-agent is running ('/etc/init.d/perfsonar-meshconfig-agent status' or 'systemctl status perfsonar-meshconfig-agent')
  — Verified. 
* Verify your hosts are able to reach their configured measurement archive and that there are no errors in /var/log/perfsonar/meshconfig-agent.log
  — Verified
===

Thanks,
Joon 
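
When a log repeats the same error every minute, as in the maddash-server.log excerpt above, it can help to collapse it into one entry per distinct message. The sketch below assumes every line has the "LEVEL YYYY-MM-DD HH:MM:SS,mmm message" shape seen in that excerpt. `summarize` is a hypothetical helper, and the sample lines are copied from the message above.

```python
import re
from collections import OrderedDict

# Matches lines shaped like "ERROR 2018-04-08 20:44:36,408 <message>".
LINE_RE = re.compile(
    r"^(?P<level>[A-Z]+) "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<msg>.*)$"
)


def summarize(lines, level="ERROR"):
    """Group lines of one level by message: {msg: [count, first_ts, last_ts]}."""
    summary = OrderedDict()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match["level"] != level:
            continue  # skip other levels and lines in another format
        entry = summary.setdefault(match["msg"], [0, match["ts"], match["ts"]])
        entry[0] += 1
        entry[2] = match["ts"]
    return summary


READ_ONLY = "An SQL data change is not permitted for a read-only connection, user or database."
SAMPLE = [
    "ERROR 2018-04-08 20:44:36,408 Error cleaning database " + READ_ONLY,
    "ERROR 2018-04-08 20:44:36,410 Error executing EventCalendarJob: " + READ_ONLY,
    "ERROR 2018-04-08 20:45:36,411 Error cleaning database " + READ_ONLY,
    "ERROR 2018-04-08 20:45:36,413 Error executing EventCalendarJob: " + READ_ONLY,
]

if __name__ == "__main__":
    for msg, (count, first, last) in summarize(SAMPLE).items():
        print(f"{count}x  first {first}  last {last}  {msg}")
```

The first timestamp of each message gives a rough time for when the condition started, which can then be compared against the logs of other services on the host.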



