Hi Andy,
During a previous incident some weeks ago, I think host.cgi was the prominent process, but this time nothing stood out. I didn’t take note of the memory usage. Next time this happens I’ll take some snapshots of the ‘top’ display.
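One way to collect those snapshots without watching the console is a small Python helper like the rough sketch below. It assumes a Linux host with the procps version of top, where -b selects batch mode and -n 1 takes a single sample. The function name, output directory, and file naming are made up for illustration.

```python
import subprocess
import time
from pathlib import Path

# Assumption: procps "top" on Linux; -b = batch mode, -n 1 = one sample.
TOP_CMD = ["top", "-b", "-n", "1"]


def take_snapshot(out_dir, cmd=TOP_CMD):
    """Run cmd once, save its stdout to a timestamped file, return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # The timestamp in the file name makes it easy to line snapshots up
    # against log entries later.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    path = out_dir / f"top-{stamp}.txt"
    path.write_text(result.stdout)
    return path
```

Calling take_snapshot("top-snapshots") once a minute during an incident (from cron or a simple loop) would leave a series of files showing which processes held the CPU and how much memory was in use.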
There was no change after a reboot.
With one exception, the logs in /var/log/, /var/log/httpd/, and /var/log/perfsonar/ showed business as usual, both at the start of the incident and when it ended. The exception is regulartesting.log, where I found these messages repeating up to the time of recovery:
2016/05/17 12:38:55 (9663) ERROR> EsmondBase.pm:53 perfSONAR_PS::RegularTesting::MeasurementArchives::EsmondBase::__ANON__ - Error writing metadata (500) 500 Timeout
2016/05/17 12:38:55 (9663) ERROR> MeasurementArchiveChild.pm:209 perfSONAR_PS::RegularTesting::Master::MeasurementArchiveChild::handle_results - Problem storing results: Error writing metadata: 500 Timeout
2016/05/17 12:38:55 (9665) ERROR> EsmondBase.pm:65 perfSONAR_PS::RegularTesting::MeasurementArchives::EsmondBase::__ANON__ - Error writing data (500) 500 Timeout
2016/05/17 12:38:55 (9665) ERROR> MeasurementArchiveChild.pm:209 perfSONAR_PS::RegularTesting::Master::MeasurementArchiveChild::handle_results - Problem storing results: Error writing data: 500 Timeout
2016/05/17 12:38:55 (9663) ERROR> MeasurementArchiveChild.pm:125 perfSONAR_PS::RegularTesting::Master::MeasurementArchiveChild::__ANON__ - Problem handling test results: Problem storing results: Error writing metadata: 500 Timeout at /usr/lib/perfsonar/bin/../lib/perfSONAR_PS/RegularTesting/Master/MeasurementArchiveChild.pm line 122.
2016/05/17 12:38:55 (9665) ERROR> MeasurementArchiveChild.pm:125 perfSONAR_PS::RegularTesting::Master::MeasurementArchiveChild::__ANON__ - Problem handling test results: Problem storing results: Error writing data: 500 Timeout at /usr/lib/perfsonar/bin/../lib/perfSONAR_PS/RegularTesting/Master/MeasurementArchiveChild.pm line 122.
2016/05/17 12:38:57 (9668) ERROR> EsmondBase.pm:53 perfSONAR_PS::RegularTesting::MeasurementArchives::EsmondBase::__ANON__ - Error writing metadata (500) 500 Timeout
2016/05/17 12:38:57 (9668) ERROR> MeasurementArchiveChild.pm:209 perfSONAR_PS::RegularTesting::Master::MeasurementArchiveChild::handle_results - Problem storing results: Error writing metadata: 500 Timeout
2016/05/17 12:38:57 (9668) ERROR> MeasurementArchiveChild.pm:125 perfSONAR_PS::RegularTesting::Master::MeasurementArchiveChild::__ANON__ - Problem handling test results: Problem storing results: Error writing metadata: 500 Timeout at /usr/lib/perfsonar/bin/../lib/perfSONAR_PS/RegularTesting/Master/MeasurementArchiveChild.pm line 122.
2016/05/17 12:38:58 (9673) ERROR> EsmondBase.pm:53 perfSONAR_PS::RegularTesting::MeasurementArchives::EsmondBase::__ANON__ - Error writing metadata (500) 500 Timeout
2016/05/17 12:38:58 (9673) ERROR> MeasurementArchiveChild.pm:209 perfSONAR_PS::RegularTesting::Master::MeasurementArchiveChild::handle_results - Problem storing results: Error writing metadata: 500 Timeout
2016/05/17 12:38:58 (9673) ERROR> MeasurementArchiveChild.pm:125 perfSONAR_PS::RegularTesting::Master::MeasurementArchiveChild::__ANON__ - Problem handling test results: Problem storing results: Error writing metadata: 500 Timeout at /usr/lib/perfsonar/bin/../lib/perfSONAR_PS/RegularTesting/Master/MeasurementArchiveChild.pm line 122.
And then these, after the system returned to normal:
2016/05/17 12:39:52 (9787) WARN> regulartesting.pl:103 main::__ANON__ - Warned: Use of uninitialized value in concatenation (.) or string at /usr/lib/perfsonar/bin/../lib/perfSONAR_PS/RegularTesting/MeasurementArchives/EsmondTraceroute.pm line 37.
2016/05/17 12:41:52 (10013) WARN> regulartesting.pl:103 main::__ANON__ - Warned: Use of uninitialized value in concatenation (.) or string at /usr/lib/perfsonar/bin/../lib/perfSONAR_PS/RegularTesting/MeasurementArchives/EsmondTraceroute.pm line 37.
2016/05/17 12:42:13 (10060) WARN> regulartesting.pl:103 main::__ANON__ - Warned: Use of uninitialized value in concatenation (.) or string at /usr/lib/perfsonar/bin/../lib/perfSONAR_PS/RegularTesting/MeasurementArchives/EsmondTraceroute.pm line 37.
2016/05/17 12:42:23 (10077) WARN> regulartesting.pl:103 main::__ANON__ - Warned: Use of uninitialized value in concatenation (.) or string at /usr/lib/perfsonar/bin/../lib/perfSONAR_PS/RegularTesting/MeasurementArchives/EsmondTraceroute.pm line 37.
I imagine that these warnings are irrelevant; the point is that the timeout messages ceased.
There’s admittedly not much to go on here. Can you offer some tips on what to look for the next time this happens?
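To pin down when the timeouts start and stop, the log entries can be tallied per minute. The Python sketch below is only an illustration: it assumes each entry sits on a single line in the actual file (unlike the wrapped copy in this mail) and follows the "date time (pid) LEVEL> message" layout seen above. The function names are made up for this example.

```python
import re
from collections import Counter

# Layout assumed from the entries quoted above:
#   2016/05/17 12:38:55 (9663) ERROR> EsmondBase.pm:53 ... 500 Timeout
LINE_RE = re.compile(
    r"^(?P<date>\d{4}/\d{2}/\d{2}) (?P<minute>\d{2}:\d{2}):\d{2} "
    r"\((?P<pid>\d+)\) (?P<level>[A-Z]+)> (?P<rest>.*)$"
)


def timeouts_per_minute(lines):
    """Count ERROR entries that mention '500 Timeout', keyed by minute."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("level") == "ERROR" and "500 Timeout" in m.group("rest"):
            counts[f"{m.group('date')} {m.group('minute')}"] += 1
    return counts


def first_and_last(counts):
    """Return the earliest and latest minute with a timeout, or None."""
    if not counts:
        return None
    # 'YYYY/MM/DD HH:MM' keys sort chronologically as plain strings.
    keys = sorted(counts)
    return keys[0], keys[-1]
```

Feeding it the open regulartesting.log file (wherever it lives on the host) and printing first_and_last(timeouts_per_minute(f)) would give the window to compare against the other logs and any saved snapshots.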
Thanks!
Darryl K. Wohlt
Network Architect I
CCD/NCS/Network Services
Fermi National Accelerator Laboratory
P.O. Box 500, MS 368
Batavia, Illinois 60510
USA
630 840 2901 office
630 945 5687 mobile
www.fnal.gov
From: Andrew Lake
Sent: Wednesday, May 18, 2016 8:58 AM
To: Darryl K Wohlt
Subject: Re: [perfsonar-user] Web page hangs, high CPU, Error writing data: 500 Timeout
A few questions: Is there a particular process claiming most of the CPU? What’s memory usage like on the host? Does it get better for any length of time after a host reboot?
On May 17, 2016 at 5:07:33 PM, Darryl K Wohlt wrote:
Hi Everyone,
This has happened before, seemingly only to our OWAMP instances. The symptoms are:
· The toolkit web page never stops “loading”
· High CPU
· Regulartesting.log displays “Problem storing results: Error writing metadata: 500 Timeout”
· No response to Nagios checks from another host
· But scheduled tests appear to be running
The last case began Saturday May 14 at ~22:25, and ended today at 12:39. I have searched all the logs I can think of and have not found any clues as to why it started, or why it ended.
Any guidance would be appreciated.
Thanks!
Darryl K. Wohlt