perfsonar-user - [perfsonar-user] RE: regular_testing esmond error events

Subject: perfSONAR User Q&A and Other Discussion

From: Andrew Lake <>
To: "Garnizov, Ivan (RRZE)" <>
Cc: perfsonar-user <>
Subject: [perfsonar-user] RE: regular_testing esmond error events
Date: Thu, 14 Jan 2016 13:11:30 -0500

Hi,

All results are written to a file for each MA they are being registered to, then read in and removed once registered. I believe it will keep trying until that file is gone. The toolkit cleans these files out nightly. This is not easily configurable at the moment. There is no event. Again, the best current method for detecting missing data is to look at your graphs or rely on orange in the dashboard. I’m not saying it’s perfect, just that it’s the best way. Also, you can ignore any messages about duplicates in the logs, and yes, regular_testing is smart enough to know which MA registration failed and will only retry the one that failed.

Hope that helps,

Andy
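
For readers who want a concrete picture of the behaviour described above (one pending file per MA, removed only once that MA has accepted the result), here is a minimal Python sketch. It is an illustration only and not the regular_testing code: the spool layout, the function names queue_result and flush, and the sender callables are all hypothetical.

```python
import json
from pathlib import Path


def queue_result(spool_dir, ma_names, result):
    """Write one pending file per MA for a single result (hypothetical layout)."""
    for ma in ma_names:
        ma_dir = Path(spool_dir) / ma
        ma_dir.mkdir(parents=True, exist_ok=True)
        (ma_dir / f"{result['id']}.json").write_text(json.dumps(result))


def flush(spool_dir, senders):
    """Try to register every pending file; delete a file only on success.

    senders maps an MA name to a callable returning True when the MA
    accepted the result. Returns the number of files still pending per MA.
    """
    pending = {}
    for ma, send in senders.items():
        left = 0
        for path in sorted((Path(spool_dir) / ma).glob("*.json")):
            result = json.loads(path.read_text())
            try:
                ok = send(result)
            except Exception:
                ok = False  # treat any failure as "retry later"
            if ok:
                path.unlink()  # registered: nothing left to retry for this MA
            else:
                left += 1      # file stays on disk and is retried next time
        pending[ma] = left
    return pending
```

Because each MA has its own pending files, a failure on one MA leaves the others untouched, which is why only the failed registration is retried.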

On January 14, 2016 at 12:56:57 PM, Garnizov, Ivan (RRZE) () wrote:

Hi Andy,

Could you please clarify, then, when the regular_testing service gives up on its attempts to store the collected measurements?

Is that a configurable parameter?

Is there an event that indicates the service has trashed some results without having them stored?

I also receive a lot of messages from regular_testing about having found duplicated series. Should I expect that these are automatically dumped on the client side?

Is the service intelligent enough to know which Esmond MA has the records and which does not? Or is it retrying the data submission on a general basis?

These are all important in order to be able to establish a proper monitoring mechanism for GEANT!

Best regards,

Ivan

From: Andrew Lake [mailto:]
Sent: Thursday, 14 January 2016 16:30
To: Garnizov, Ivan (RRZE)
Cc: perfsonar-user
Subject: Re: regular_testing esmond error events

Hi Ivan,

When you look at the graphs

On January 12, 2016 at 12:33:50 PM, Garnizov, Ivan (RRZE) () wrote:

Hi Andy,

Out of the blue, on the GEANT deployment of pS 3.4.2, I started receiving error events in the regular_testing logs.
reg_test.txt
At the bottom of the attached file you will find an excerpt from esmond.log which, based on the time of the occurrence, does not appear to be relevant.

I have restarted cassandra, postgresql, and regular_testing, but the warnings continued. I suspect the errors are about to reappear.
I would also like to assure you that Apache is up and running and that I am able to see measurement results stored both locally and on the Central MA. There might have been some incidental overloads of the interface, but I would not expect to see errors at this frequency.
Over the last 66 days of using the MP, the interface was always reachable on 5-minute checks.

The questions:
How do I find which performance results are missing?

When you look at the graphs, do you see any missing data? Or does MaDDash report any orange? That’s probably the easiest way.

Is there a relation between the WARN and ERROR states?

The standard logging practice is that WARN means something non-fatal that can generally be ignored and ERROR means something is broken. I can’t say that regular_testing is always consistent with these. It should also be noted that regular_testing will try to re-register data later even if it gets an ERROR, so it’s not an indication that your data didn’t get registered.

Am I safe to ignore the error code without missing other substantial information?

Probably. I would not use the logs as an indication that you are missing data. I would look at the graphs or rely on your dashboard to tell you that.

Best regards,
Ivan
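
The advice in this thread is to look for holes in the stored data rather than in the logs. As a rough sketch of what such a check could look like, the Python function below flags gaps in a list of sample timestamps. It assumes the timestamps (Unix seconds) have already been retrieved from the archive by some other means; the function name find_gaps and the tolerance factor are hypothetical and not part of perfSONAR or esmond.

```python
def find_gaps(timestamps, expected_interval, tolerance=1.5):
    """Return (start, end) pairs where consecutive samples are too far apart.

    timestamps: sample times in seconds, in any order.
    expected_interval: nominal spacing between test runs, in seconds.
    tolerance: multiple of the interval allowed before a gap is reported.
    """
    ordered = sorted(timestamps)
    limit = expected_interval * tolerance
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]
```

For example, with tests expected every 300 seconds, samples at 0, 300, 600, 1500 and 1800 would report a single gap between 600 and 1500.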

[perfsonar-user] regular_testing esmond error events, Garnizov, Ivan (RRZE), 01/12/2016
- [perfsonar-user] Re: regular_testing esmond error events, Andrew Lake, 01/14/2016
  - [perfsonar-user] RE: regular_testing esmond error events, Garnizov, Ivan (RRZE), 01/14/2016
    - [perfsonar-user] RE: regular_testing esmond error events, Andrew Lake, 01/14/2016
