
perfsonar-user - [perfsonar-user] Modest sized grid has agent failure to archive once or twice a day

Subject: perfSONAR User Q&A and Other Discussion

  • From: "Garnizov, Ivan" <>
  • To: Phil Reese <>, "" <>
  • Subject: [perfsonar-user] Modest sized grid has agent failure to archive once or twice a day
  • Date: Tue, 29 Oct 2019 15:08:12 +0000

Hi Phil,

 

Yes, we should be heading in the right direction, especially if the “a full slate of workers” message has disappeared.

Still, only 2 archival attempts is too few. You are still dropping measurement results quite easily and quickly. I would suggest extending the retries to cover up to 1 day, for example by adding 2 attempts at an interval of 1-2 hours to the ones you already have.
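As a rough illustration of why the retry policies quoted in this thread give up so quickly, the sketch below adds up how long each one keeps retrying. It assumes that every retry-policy entry contributes attempts × wait to the total retry window; the helper names `parse_duration` and `retry_window` are hypothetical, and the `PT1H` entry is only one possible reading of the "1-2h" suggestion, not a tested recommendation.

```python
import re
from datetime import timedelta

# Handles only the simple time-only ISO 8601 durations seen in this thread
# (for example PT1S, PT120S, PT1H); it is not a full ISO 8601 parser.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_duration(text):
    """Convert a duration such as 'PT120S' into a timedelta."""
    match = _DURATION.match(text)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def retry_window(policy):
    """Total waiting time across all entries (assumed: attempts * wait per entry)."""
    return sum(
        (entry["attempts"] * parse_duration(entry["wait"]) for entry in policy),
        timedelta(),
    )


# Policy from the mesh configuration quoted in the thread.
original = [{"attempts": 5, "wait": "PT1S"}, {"attempts": 5, "wait": "PT3S"}]
# Simplified policy tried afterwards.
simplified = [{"attempts": 2, "wait": "PT120S"}]
# Hypothetical extension: two more attempts one hour apart.
extended = simplified + [{"attempts": 2, "wait": "PT1H"}]

for name, policy in [("original", original),
                     ("simplified", simplified),
                     ("extended", extended)]:
    print(f"{name}: {retry_window(policy)}")
```

Under these assumptions the original policy stops retrying after 20 seconds and the simplified one after 4 minutes, while the extended one keeps trying for a little over 2 hours.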

 

Once you reduce the rate of the “full slate of workers” failure, you should also find it easier to spot another failure, which is likely the real cause of the problem. There is clearly more to it than the exhaustion of the pScheduler archiver workers. It may be that not all of the attempts fail, but some still do.

If the failure is a timeout, your Esmond server may be exhausted or overloaded.

 

 

Regards,

Ivan Garnizov

 

GEANT WP6T3: pS development team

GEANT WP7T1: pS deployments GN Operations

GEANT WP9T2: Software governance in GEANT

 

 

 

From: Phil Reese [mailto:]
Sent: Monday, October 28, 2019 8:32 PM
To: Garnizov, Ivan (RRZE) <>;
Subject: Re: AW: [perfsonar-user] Modest sized grid has agent failure to archive once or twice a day

 

Hi Ivan,

I think this is heading in the right direction, but it hasn't solved the issue yet.

I first changed the retry policy to a much simpler one:
{ "attempts": 2,  "wait": "PT120S" }

Still had archiver problems.

Then I removed the 'retry-policy' altogether.

Still had archiver problems.

I've now removed the second archive destination altogether and am waiting for results.

Note, I've been focused on the MaDDash part of the project, so I didn't pay too much attention to my colleague who wanted the PS data in order to graph it with Grafana.  Together we looked at the perfSONAR docs for archiver options.  The RabbitMQ section (http://docs.perfsonar.net/pscheduler_ref_archivers.html)  offered the stanza we used, including the retry-policy, which does seem too aggressive.


Phil

 


From: Garnizov, Ivan <>
Sent: Friday, October 25, 2019 6:28 AM
To: Phil Reese <>; <>
Subject: AW: AW: [perfsonar-user] Modest sized grid has agent failure to archive once or twice a day

 

Hello Phil,

 

Thanks for the info.

It appears that your mesh configuration for archiving data is causing you trouble.

      "archiver_data": {
        "retry-policy": [
          {
            "attempts": 5,
            "wait": "PT1S"
          },
          {
            "attempts": 5,
            "wait": "PT3S"
          }
        ],

 

 



