
perfsonar-user - Re: [perfsonar-user] Two pscheduler-archiver questions

Subject: perfSONAR User Q&A and Other Discussion




  • From: Andrew Lake <>
  • To: David Szydloski <>,
  • Subject: Re: [perfsonar-user] Two pscheduler-archiver questions
  • Date: Thu, 15 Mar 2018 14:49:14 -0400





On March 15, 2018 at 12:59:23 PM, David Szydloski () wrote:

1) Troubleshooting "400: Invalid JSON returned" error:

One of the hosts in my mesh is running tests and getting results just fine; however, it's not returning any results to the central management database. Looking at /var/log/pscheduler/pscheduler.log, I see:

Mar 15 16:39:23 netperf01-fra1 archiver DEBUG    1349313: Returned JSON from archiver: {u'retry': u'PT60S', u'succeeded': False, u'error': u'400: Invalid JSON returned'}
Mar 15 16:39:23 netperf01-fra1 archiver WARNING  1349313: Failed to archive https://localhost/pscheduler/tasks/52ee426a-3b0d-41df-9b85-e6782633c65f/runs/3a6cb757-b446-4193-bc9e-54c9d321f9ff to esmond: 400: Invalid JSON returned
Mar 15 16:39:23 netperf01-fra1 archiver DEBUG    1349313: Rescheduling for 2018-03-15 16:40:23.310899+00:00
Mar 15 16:39:23 netperf01-fra1 archiver DEBUG    1349313: Thread finished
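To see at a glance which runs are failing and why, the WARNING lines above can be pulled out of pscheduler.log with a short script. This is a minimal Python sketch that assumes the line format shown in the excerpt stays the same; the regular expression, field names, and functions are illustrative and not part of pScheduler.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Assumed format, taken from the excerpt above:
#   "... archiver WARNING  <id>: Failed to archive <run URL> to <archiver>: <error>"
FAILED_RE = re.compile(
    r"archiver WARNING\s+(?P<log_id>\d+): Failed to archive (?P<run>\S+) "
    r"to (?P<archiver>\S+?): (?P<error>.*)$"
)

class FailedArchive(NamedTuple):
    log_id: str    # numeric ID printed after the level; its meaning isn't stated in the log
    run_url: str   # the pScheduler run that could not be archived
    archiver: str  # e.g. "esmond"
    error: str     # e.g. "400: Invalid JSON returned"

def parse_failure(line: str) -> Optional[FailedArchive]:
    """Return details of a failed archive attempt, or None for any other line."""
    m = FAILED_RE.search(line)
    if m is None:
        return None
    return FailedArchive(m["log_id"], m["run"], m["archiver"], m["error"])

def summarize(lines: Iterable[str]) -> Counter:
    """Count failures per (archiver, error message)."""
    counts: Counter = Counter()
    for line in lines:
        failure = parse_failure(line)
        if failure is not None:
            counts[(failure.archiver, failure.error)] += 1
    return counts
```

Passing an open handle on /var/log/pscheduler/pscheduler.log to `summarize` on the affected host would group its failures by archiver and error message, which makes it easy to tell whether every failure is the same 400.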

On the central management VM, I see the following in /var/log/apache2/access.log pertaining to the host having the issues:

dszydloski@sonar-poc:/var/log/apache2$ tail access.log


10.71.8.28 - - [15/Mar/2018:16:44:32 +0000] "POST /esmond/perfsonar/archive/ HTTP/1.1" 201 7103 "-" "python-requests/2.9.1"
10.71.8.28 - - [15/Mar/2018:16:44:32 +0000] "POST /esmond/perfsonar/archive/ HTTP/1.1" 201 7103 "-" "python-requests/2.9.1"

So it's reaching the remote server but not making any PUT requests, likely because of the archiver error?
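To check which request methods from the affected host actually reach the archive, and with what status, the Apache access log can be tallied per client. A minimal Python sketch, assuming Apache's combined log format as in the lines above; the client IP is the one from this excerpt and the function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Combined log format: client ident user [time] "METHOD path protocol" status bytes ...
ACCESS_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)

def tally_requests(lines: Iterable[str], client: str) -> Counter:
    """Count (method, path, status) triples for one client IP."""
    counts: Counter = Counter()
    for line in lines:
        m = ACCESS_RE.match(line)
        if m and m["client"] == client:
            counts[(m["method"], m["path"], m["status"])] += 1
    return counts
```

Running `tally_requests(open("/var/log/apache2/access.log"), "10.71.8.28")` over the whole log rather than just its tail shows whether any PUT entries from that host appear at all, and if so which status codes they get.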

I've been racking my brain isolating local differences between my mesh node setups but haven't been able to find anything pertaining to this issue. Any ideas on what else I can look at to fix this?

You likely need to restart Cassandra on the archive host. The POST requests that seem to be working only hit PostgreSQL; the PUT that is failing hits Cassandra. If a restart doesn't help, or you want to confirm this, check the esmond logs on the measurement archive under /var/log/esmond/. In particular, you probably have a long stack trace about trouble connecting to Cassandra.
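One quick way to look for those Cassandra errors is to search every file under /var/log/esmond/ for the word "cassandra". This is a minimal Python sketch: it assumes the logs are plain text (compressed rotated logs are not decompressed), that you have permission to read them, and the function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

Hit = Tuple[str, int, str]  # (file name, line number, line text)

def scan_lines(name: str, lines: Iterable[str], needle: str = "cassandra") -> List[Hit]:
    """Return every line mentioning needle, case-insensitively."""
    needle = needle.lower()
    return [
        (name, line_no, line.rstrip("\n"))
        for line_no, line in enumerate(lines, start=1)
        if needle in line.lower()
    ]

def scan_log_dir(log_dir: str = "/var/log/esmond", needle: str = "cassandra") -> List[Hit]:
    """Scan each regular file in log_dir; undecodable bytes are replaced, not fatal."""
    hits: List[Hit] = []
    for path in sorted(Path(log_dir).iterdir()):
        if path.is_file():
            with path.open(errors="replace") as fh:
                hits.extend(scan_lines(path.name, fh, needle))
    return hits
```

Calling `scan_log_dir()` on the archive host and printing the hits gives file names and line numbers to jump to; the start of a stack trace is usually a few lines above the first match.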



2) esmond "fast_mode"?

While troubleshooting 1), I noticed some differences among my mesh hosts. When I ran "cat /var/log/pscheduler/pscheduler.log | grep 'fast_mode is True'" across the hosts, some showed 'fast_mode is True' in the logs but most didn't. It doesn't seem to affect performance, though I'm curious whether I should spend time getting them all to report the same way.
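To compare hosts side by side instead of eyeballing grep output, the same check can be run over each host's log in one pass. A minimal Python sketch, assuming each host's pscheduler.log has been copied locally; alongside the 'fast_mode is True' tag it also counts the "Failed to archive" warnings seen earlier in this thread. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

def scan_pscheduler_log(lines: Iterable[str]) -> Counter:
    """Count fast_mode-True tags and archive failures in one host's log lines."""
    counts: Counter = Counter()
    for line in lines:
        if "fast_mode is True" in line:
            counts["fast_mode_true"] += 1
        if "Failed to archive" in line:
            counts["failed_archive"] += 1
    return counts

def compare_hosts(logs_by_host: Dict[str, Iterable[str]]) -> Dict[str, Counter]:
    """Map each host name to the counts for its log lines."""
    return {host: scan_pscheduler_log(lines) for host, lines in logs_by_host.items()}
```

For example, `compare_hosts({"netperf01-fra1": open("fra1-pscheduler.log")})`, with one entry per host (the local file names here are placeholders), returns a small table of counts per host.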

fast_mode is just a tag in the log messages indicating whether the archiver found the metadata ID in the local memcached instance. A cache hit saves the archiver from doing the initial POST you're seeing in the logs, which saves a significant number of CPU cycles. The metadata ID is not cached if there is a storage failure, so fixing #1 will probably flip more of those to True.
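The caching behavior described here can be modeled in a few lines. This is a conceptual Python sketch, not pScheduler's or esmond's actual code: a plain dict stands in for memcached, and the hypothetical `post_metadata` and `put_data` callables stand in for the metadata POST and the data PUT. It illustrates the two points above: a cache hit skips the POST, and the ID is cached only when storage succeeds.

```python
from typing import Callable, Dict, Tuple

def archive_result(
    cache: Dict[str, str],               # stand-in for the local memcached instance
    key: str,                            # hypothetical key identifying the test's metadata
    post_metadata: Callable[[], str],    # stand-in for the metadata POST; returns an ID
    put_data: Callable[[str], bool],     # stand-in for the data PUT; True on success
) -> Tuple[bool, bool]:
    """Archive one result and return (fast_mode, succeeded)."""
    metadata_id = cache.get(key)
    fast_mode = metadata_id is not None
    if not fast_mode:
        metadata_id = post_metadata()    # the extra round trip a cache hit avoids
    succeeded = put_data(metadata_id)
    if succeeded:
        cache[key] = metadata_id         # cached only after storage succeeds
    return fast_mode, succeeded
```

With a failing `put_data`, every attempt repeats the POST and never reports fast_mode; once storage works, the second and later results for the same key skip the POST.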





Thanks!
--
David Szydloski
Core Deployment Engineer
VidScale, Inc.



