
perfsonar-user - Re: [perfsonar-user] Two pscheduler-archiver questions

Subject: perfSONAR User Q&A and Other Discussion

  • From: David Szydloski <>
  • To: Andrew Lake <>
  • Cc:
  • Subject: Re: [perfsonar-user] Two pscheduler-archiver questions
  • Date: Mon, 19 Mar 2018 12:47:36 -0500

Andy, 

Thanks so much for taking time to help! Unfortunately, I still haven't been able to resolve the issue.

Regarding my persistent "400: Invalid JSON returned" error:
- Restarting cassandra on the central VM and the local archiver host has not fixed the issue. 
- On my local archiver host, I don't have any entries in /var/log/esmond/django.log since 2/23/2018 and the esmond.log file is empty. On the central VM, I only see cassandra errors in /var/log/esmond/django.log that correspond with me restarting the cassandra service on that host.
- I've also rebooted the VM that is having the issue.


Any other suggestions on how to proceed here? I'm still working on the issue myself, though at this point I might just try a reinstall.

Thanks again for your help,
David



On Thu, Mar 15, 2018 at 1:49 PM, Andrew Lake <> wrote:




On March 15, 2018 at 12:59:23 PM, David Szydloski () wrote:

1) Troubleshooting "400: Invalid JSON returned" error:

One of the hosts in my mesh is running tests and getting results just fine; however, it's not returning any results to the central management database. Looking at /var/log/pscheduler/pscheduler.log I see:

Mar 15 16:39:23 netperf01-fra1 archiver DEBUG    1349313: Returned JSON from archiver: {u'retry': u'PT60S', u'succeeded': False, u'error': u'400: Invalid JSON returned'}
Mar 15 16:39:23 netperf01-fra1 archiver WARNING  1349313: Failed to archive https://localhost/pscheduler/tasks/52ee426a-3b0d-41df-9b85-e6782633c65f/runs/3a6cb757-b446-4193-bc9e-54c9d321f9ff to esmond: 400: Invalid JSON returned
Mar 15 16:39:23 netperf01-fra1 archiver DEBUG    1349313: Rescheduling for 2018-03-15 16:40:23.310899+00:00
Mar 15 16:39:23 netperf01-fra1 archiver DEBUG    1349313: Thread finished
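
For anyone wanting to see how often each archiver error occurs, a minimal Python sketch that tallies the "Failed to archive" warnings might look like this. The regular expression is modeled only on the WARNING line quoted above, so the exact log layout is an assumption, and summarize_failures is a hypothetical helper name, not part of pscheduler.

```python
import re
from collections import Counter

# Pattern modeled on the WARNING line quoted in this thread (assumed format).
FAILED = re.compile(
    r"archiver WARNING\s+\d+: Failed to archive (?P<url>\S+) "
    r"to (?P<archiver>\S+): (?P<error>.+)$"
)

def summarize_failures(lines):
    """Count failed archive attempts, keyed by (archiver name, error text)."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[(match.group("archiver"), match.group("error").strip())] += 1
    return counts

if __name__ == "__main__":
    # Path taken from the message above; adjust if your logs live elsewhere.
    with open("/var/log/pscheduler/pscheduler.log") as handle:
        for (archiver, error), n in summarize_failures(handle).most_common():
            print(f"{n:6d}  {archiver}  {error}")
```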

On the central management VM, I see the following in /var/log/apache2/access.log pertaining to the host having the issues:

dszydloski@sonar-poc:/var/log/apache2$ tail access.log


10.71.8.28 - - [15/Mar/2018:16:44:32 +0000] "POST /esmond/perfsonar/archive/ HTTP/1.1" 201 7103 "-" "python-requests/2.9.1"
10.71.8.28 - - [15/Mar/2018:16:44:32 +0000] "POST /esmond/perfsonar/archive/ HTTP/1.1" 201 7103 "-" "python-requests/2.9.1"

So it's reaching the remote server but not making any PUT requests--likely due to the archiver error, I guess?
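
To check whether a given client ever issues PUT requests, the access log can be tallied by method and status. The sketch below is a rough illustration that assumes the Apache combined log layout seen in the two lines above; count_requests is a hypothetical helper, and the client address is simply the one from this thread.

```python
import re
from collections import Counter

# Matches the client address, request method, path, and status code of
# an access-log line shaped like the ones quoted above (assumed layout).
ACCESS = re.compile(
    r'^(?P<client>\S+) .*?"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)

def count_requests(lines, client):
    """Count (method, status) pairs for one client address."""
    counts = Counter()
    for line in lines:
        match = ACCESS.search(line)
        if match and match.group("client") == client:
            counts[(match.group("method"), match.group("status"))] += 1
    return counts

if __name__ == "__main__":
    with open("/var/log/apache2/access.log") as handle:
        for (method, status), n in sorted(count_requests(handle, "10.71.8.28").items()):
            print(method, status, n)
```

If no PUT rows show up for the host at all, the failure is happening before the archiver sends the data points.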

I've been racking my brain isolating local differences between my mesh node setups but haven't been able to find anything pertaining to this issue. Any ideas on what else I can look at to fix this?

Likely you need to restart Cassandra on the archive host. The POST requests that seem to be working only hit PostgreSQL; the PUT that is failing hits Cassandra. If that doesn't work, or you want to confirm this, you can check the esmond logs on the measurement archive under /var/log/esmond/. In particular, you will probably find a long stack trace about trouble connecting to Cassandra.
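
A quick way to confirm that Cassandra is accepting connections after a restart is a plain TCP check. This is only a sketch: 9160 is Cassandra's default Thrift port and 9042 its default native-protocol port, and which one a given esmond install uses is an assumption to verify against its configuration. can_connect is a hypothetical helper name.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False

if __name__ == "__main__":
    # Default Cassandra ports; assumed, not read from the esmond config.
    for port in (9160, 9042):
        state = "open" if can_connect("localhost", port) else "unreachable"
        print(f"localhost:{port} is {state}")
```

An open port only shows that something is listening; the esmond logs are still the place to confirm that writes succeed.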



2) esmond "fast_mode"?

In the process of troubleshooting 1), I noticed some differences between my mesh hosts. When I ran "cat /var/log/pscheduler/pscheduler.log | grep 'fast_mode is True'" across the hosts, some showed 'fast_mode is True' in the logs but most didn't. It doesn't seem to affect performance, though I'm curious whether I should spend time getting these all to report the same way.

fast_mode is just a tag in the log messages that indicates whether the archiver could find the metadata ID in the local memcached instance. A hit saves the archiver from having to do that initial POST you're seeing in the logs, which conserves a significant amount of CPU. The metadata ID is not cached if there is a storage failure, so fixing #1 will probably flip more of those to True.
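
To compare hosts before and after fixing #1, the grep above can be turned into a simple count. This sketch assumes only that the literal string 'fast_mode is True' appears in the log, as reported in this thread; count_fast_mode is a hypothetical helper.

```python
def count_fast_mode(lines, marker="fast_mode is True"):
    """Return how many log lines contain the marker and how many lines were read."""
    hits = 0
    total = 0
    for line in lines:
        total += 1
        if marker in line:
            hits += 1
    return hits, total

if __name__ == "__main__":
    # Path taken from the grep command quoted above.
    with open("/var/log/pscheduler/pscheduler.log") as handle:
        hits, total = count_fast_mode(handle)
    print(f"{hits} of {total} log lines report fast_mode is True")
```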





Thanks!
--
David Szydloski
Core Deployment Engineer
VidScale, Inc.



--
David Szydloski
Core Deployment Engineer
VidScale, Inc.


