Hi,
All results are written to a file for each MA being registered to, then read in and removed once registered. I believe regular_testing will keep retrying until that file is gone. The toolkit cleans these files out nightly. This is not easily configurable currently, and there is no event that signals dropped results.
Again, the best current method to detect missing data is to look at your graphs or rely on orange in the dashboard. I’m not saying it’s perfect, just that it’s the best way. Also, you can ignore any messages about duplicates in the logs. And yes, regular_testing is smart enough to know which MA registration failed, and it will only retry the one that failed.
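As a rough illustration of the file-based queue described above, here is a minimal Python sketch that lists result files that have been waiting longer than a chosen age, which could hint that registration to an MA keeps failing. The thread does not say where regular_testing keeps these files, so `queue_dir` is a placeholder you would have to supply yourself, and `stale_queue_files` is a hypothetical helper, not part of perfSONAR.

```python
import os
import time


def stale_queue_files(queue_dir, max_age_seconds, now=None):
    """Return sorted paths under queue_dir last modified more than max_age_seconds ago.

    queue_dir is an assumption: pass whatever directory your deployment
    uses for results that are still waiting to be registered.
    """
    now = time.time() if now is None else now
    stale = []
    for root, _dirs, files in os.walk(queue_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                age = now - os.path.getmtime(path)
            except OSError:
                # The file was registered and removed while we were scanning.
                continue
            if age > max_age_seconds:
                stale.append(path)
    return sorted(stale)
```

Run from cron with a threshold well above your test interval, a non-empty result would only be a prompt to go and check the graphs or the dashboard, not proof that data was lost.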
Hope that helps, Andy
On January 14, 2016 at 12:56:57 PM, Garnizov, Ivan (RRZE) wrote:
Hi Andy,
Could you please clarify when the regular_testing service gives up
on its attempts to store the collected measurements?
Is that a configurable parameter?
Is there an event that indicates the service has trashed some
results without having them stored?
I also receive a lot of messages from regular_testing about having
found duplicated series. Should I expect that these are
automatically dumped on the client side?
Is the service intelligent enough to know which Esmond MA has the
records and which does not? Or does it retry the data submission to
all of them?
These are all important in order to be able to establish a proper
monitoring mechanism for GEANT!
Best regards,
Ivan
From: Andrew Lake
Sent: Thursday, 14 January 2016 16:30
To: Garnizov, Ivan (RRZE)
Cc: perfsonar-user
Subject: Re: regular_testing esmond error events
On January 12, 2016 at 12:33:50 PM, Garnizov, Ivan (RRZE) wrote:
Hi Andy,
Out of the blue, with the GEANT deployment of pS 3.4.2, I started
receiving error events in the regular_testing logs.
reg_test.txt
At the bottom of the attached file you will find an excerpt from the
esmond.log which, based on the time of occurrence, does not appear to
be relevant.
I have restarted cassandra, postgresql, and regular_testing, but
the warnings continued. I suspect the errors are about to
reappear.
I would like also to assure you that Apache is up and running and I
am able to see measurement results stored locally and on Central
MA. There might have been some incidental overloads of the
interface, but I would not expect to see this frequency of the
errors.
Over the last 66 days of using the MP, access to the interface was
always available on 5-minute checks.
The questions:
How do I find which performance results are missing?
When you look at the graphs, do you see any missing data? Or does
MaDDash report any orange? That’s probably the easiest
way.
Is there a relation between the WARN and ERROR states?
The standard logging practice is that WARN means something non-fatal
that can generally be ignored and ERROR means something is broken.
I can’t say that regular_testing is always consistent with these.
It should also be noted that regular_testing will try to
re-register data later even if it gets an ERROR, so it’s not an
indication that your data didn’t get registered.
Am I safe to ignore the error code without missing other
substantial information?
Probably. I would not use the logs as an indication that you are
missing data. I would look at the graphs or rely on your dashboard
to tell you that.