
perfsonar-user - RE: [perfsonar-user] Web page hangs, high CPU, Error writing data: 500 Timeout

Subject: perfSONAR User Q&A and Other Discussion

  • From: Darryl K Wohlt <>
  • To: Andrew Lake <>, "" <>
  • Subject: RE: [perfsonar-user] Web page hangs, high CPU, Error writing data: 500 Timeout
  • Date: Wed, 18 May 2016 15:40:55 +0000

Hi Andy,

 

During a previous incident some weeks ago I think host.cgi was prominent, but this time nothing stood out.  I didn’t take note of the memory usage.  Next time this happens I’ll take some snapshots of the ‘top’ display.
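For the 'top' snapshots mentioned above, a minimal Python sketch along these lines could record the busiest processes at intervals. It assumes a Linux host with the procps `ps` command; the function names and the choice of columns are illustrative, not part of perfSONAR. Note that `ps` reports %CPU averaged over each process's lifetime, so it can differ from what an interactive `top` shows.

```python
import subprocess

# Assumption: procps "ps" is available; columns chosen for illustration only.
PS_CMD = ["ps", "-eo", "pid,pcpu,pmem,comm", "--sort=-pcpu"]


def parse_ps(text, limit=5):
    """Parse `ps -eo pid,pcpu,pmem,comm` output into dicts, busiest first."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)
        if len(parts) != 4:
            continue  # ignore blank or malformed lines
        pid, cpu, mem, comm = parts
        rows.append(
            {"pid": int(pid), "cpu": float(cpu), "mem": float(mem), "comm": comm}
        )
    rows.sort(key=lambda r: r["cpu"], reverse=True)
    return rows[:limit]


def snapshot(limit=5):
    """Run ps once and return the top `limit` processes by %CPU."""
    out = subprocess.run(
        PS_CMD, capture_output=True, text=True, check=True
    ).stdout
    return parse_ps(out, limit)
```

Calling `snapshot()` from a cron job or a simple loop and appending the result to a file with a timestamp would leave a record to compare against the timestamps in regulartesting.log.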

 

There was no change after a reboot.

 

All the logs in /var/log/, /var/log/httpd/, and /var/log/perfsonar/, save one, showed business as usual, both at the start of the incident and when it ended.  But in regulartesting.log I found these messages repeating up to the time of recovery:

 

2016/05/17 12:38:55 (9663) ERROR> EsmondBase.pm:53 perfSONAR_PS::RegularTesting::MeasurementArchives::EsmondBase::__ANON__ - Error writing metadata (500) 500 Timeout
2016/05/17 12:38:55 (9663) ERROR> MeasurementArchiveChild.pm:209 perfSONAR_PS::RegularTesting::Master::MeasurementArchiveChild::handle_results - Problem storing results: Error writing metadata: 500 Timeout
2016/05/17 12:38:55 (9665) ERROR> EsmondBase.pm:65 perfSONAR_PS::RegularTesting::MeasurementArchives::EsmondBase::__ANON__ - Error writing data (500) 500 Timeout
2016/05/17 12:38:55 (9665) ERROR> MeasurementArchiveChild.pm:209 perfSONAR_PS::RegularTesting::Master::MeasurementArchiveChild::handle_results - Problem storing results: Error writing data: 500 Timeout
2016/05/17 12:38:55 (9663) ERROR> MeasurementArchiveChild.pm:125 perfSONAR_PS::RegularTesting::Master::MeasurementArchiveChild::__ANON__ - Problem handling test results: Problem storing results: Error writing metadata: 500 Timeout at /usr/lib/perfsonar/bin/../lib/perfSONAR_PS/RegularTesting/Master/MeasurementArchiveChild.pm line 122.
2016/05/17 12:38:55 (9665) ERROR> MeasurementArchiveChild.pm:125 perfSONAR_PS::RegularTesting::Master::MeasurementArchiveChild::__ANON__ - Problem handling test results: Problem storing results: Error writing data: 500 Timeout at /usr/lib/perfsonar/bin/../lib/perfSONAR_PS/RegularTesting/Master/MeasurementArchiveChild.pm line 122.

2016/05/17 12:38:57 (9668) ERROR> EsmondBase.pm:53 perfSONAR_PS::RegularTesting::MeasurementArchives::EsmondBase::__ANON__ - Error writing metadata (500) 500 Timeout
2016/05/17 12:38:57 (9668) ERROR> MeasurementArchiveChild.pm:209 perfSONAR_PS::RegularTesting::Master::MeasurementArchiveChild::handle_results - Problem storing results: Error writing metadata: 500 Timeout
2016/05/17 12:38:57 (9668) ERROR> MeasurementArchiveChild.pm:125 perfSONAR_PS::RegularTesting::Master::MeasurementArchiveChild::__ANON__ - Problem handling test results: Problem storing results: Error writing metadata: 500 Timeout at /usr/lib/perfsonar/bin/../lib/perfSONAR_PS/RegularTesting/Master/MeasurementArchiveChild.pm line 122.

2016/05/17 12:38:58 (9673) ERROR> EsmondBase.pm:53 perfSONAR_PS::RegularTesting::MeasurementArchives::EsmondBase::__ANON__ - Error writing metadata (500) 500 Timeout
2016/05/17 12:38:58 (9673) ERROR> MeasurementArchiveChild.pm:209 perfSONAR_PS::RegularTesting::Master::MeasurementArchiveChild::handle_results - Problem storing results: Error writing metadata: 500 Timeout
2016/05/17 12:38:58 (9673) ERROR> MeasurementArchiveChild.pm:125 perfSONAR_PS::RegularTesting::Master::MeasurementArchiveChild::__ANON__ - Problem handling test results: Problem storing results: Error writing metadata: 500 Timeout at /usr/lib/perfsonar/bin/../lib/perfSONAR_PS/RegularTesting/Master/MeasurementArchiveChild.pm line 122.

 

And then these, after the system returned to normal:

 

2016/05/17 12:39:52 (9787) WARN> regulartesting.pl:103 main::__ANON__ - Warned: Use of uninitialized value in concatenation (.) or string at /usr/lib/perfsonar/bin/../lib/perfSONAR_PS/RegularTesting/MeasurementArchives/EsmondTraceroute.pm line 37.
2016/05/17 12:41:52 (10013) WARN> regulartesting.pl:103 main::__ANON__ - Warned: Use of uninitialized value in concatenation (.) or string at /usr/lib/perfsonar/bin/../lib/perfSONAR_PS/RegularTesting/MeasurementArchives/EsmondTraceroute.pm line 37.
2016/05/17 12:42:13 (10060) WARN> regulartesting.pl:103 main::__ANON__ - Warned: Use of uninitialized value in concatenation (.) or string at /usr/lib/perfsonar/bin/../lib/perfSONAR_PS/RegularTesting/MeasurementArchives/EsmondTraceroute.pm line 37.
2016/05/17 12:42:23 (10077) WARN> regulartesting.pl:103 main::__ANON__ - Warned: Use of uninitialized value in concatenation (.) or string at /usr/lib/perfsonar/bin/../lib/perfSONAR_PS/RegularTesting/MeasurementArchives/EsmondTraceroute.pm line 37.

 

I imagine these warnings are irrelevant; the point is that the timeout messages ceased.
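To pin down exactly when the timeout messages started and stopped, a small Python sketch like the following could summarize them. It assumes each entry sits on one line with the "date time (pid) LEVEL>" prefix seen in the excerpts; the function name and the per-minute grouping are illustrative choices, and the sample lines are copied from the excerpts in this message.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the "YYYY/MM/DD HH:MM:SS (pid) LEVEL> message" prefix seen in the excerpts.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"\((?P<pid>\d+)\) (?P<level>[A-Z]+)> (?P<rest>.*)$"
)


def summarize_timeouts(lines):
    """Return (first_seen, last_seen, per_minute) for ERROR lines with '500 Timeout'."""
    first = last = None
    per_minute = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("level") != "ERROR":
            continue
        if "500 Timeout" not in m.group("rest"):
            continue
        ts = datetime.strptime(m.group("ts"), "%Y/%m/%d %H:%M:%S")
        first = ts if first is None else min(first, ts)
        last = ts if last is None else max(last, ts)
        per_minute[ts.strftime("%Y/%m/%d %H:%M")] += 1
    return first, last, per_minute


# Three entries taken from the excerpts above: two timeouts and one later warning.
SAMPLE = [
    "2016/05/17 12:38:55 (9663) ERROR> EsmondBase.pm:53 perfSONAR_PS::RegularTesting::MeasurementArchives::EsmondBase::__ANON__ - Error writing metadata (500) 500 Timeout",
    "2016/05/17 12:38:58 (9673) ERROR> MeasurementArchiveChild.pm:209 perfSONAR_PS::RegularTesting::Master::MeasurementArchiveChild::handle_results - Problem storing results: Error writing metadata: 500 Timeout",
    "2016/05/17 12:39:52 (9787) WARN> regulartesting.pl:103 main::__ANON__ - Warned: Use of uninitialized value in concatenation (.) or string at /usr/lib/perfsonar/bin/../lib/perfSONAR_PS/RegularTesting/MeasurementArchives/EsmondTraceroute.pm line 37.",
]

first_seen, last_seen, per_minute = summarize_timeouts(SAMPLE)
```

Against the real file, the same function could be fed an open file handle (for example `open("/var/log/perfsonar/regulartesting.log")`, assuming that is where the log lives), and the first and last timestamps compared with the Nagios alert times.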

 

There’s admittedly not much to go on here.  Can you offer some tips on what to look for the next time this happens?

 

Thanks!

 

Darryl K. Wohlt

Network Architect I

 

CCD/NCS/Network Services

Fermi National Accelerator Laboratory

P.O. Box 500, MS 368

Batavia, Illinois 60510

USA

 

630 840 2901 office

630 945 5687  mobile

www.fnal.gov

 

From: Andrew Lake [mailto:]
Sent: Wednesday, May 18, 2016 8:58 AM
To: ; Darryl K Wohlt <>
Subject: Re: [perfsonar-user] Web page hangs, high CPU, Error writing data: 500 Timeout

 

Hi Darryl,

 

A few questions: Is there a particular process claiming most of the CPU? What’s memory usage like on the host? Does it get better for any length of time after a host reboot?

 

Thanks,

Andy

 

 

On May 17, 2016 at 5:07:33 PM, Darryl K Wohlt () wrote:

Hi Everyone,

 

This has happened before, seemingly only to our OWAMP instances. The symptoms are:

 

  • The toolkit web page never stops “loading”
  • High CPU
  • regulartesting.log displays “Problem storing results: Error writing metadata: 500 Timeout”
  • No response to Nagios checks from another host
  • But scheduled tests appear to be running
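Since the symptoms combine a hanging web page with timeouts when storing results, one way to gather evidence during an incident is to time a single HTTP request against the host. The Python sketch below does that; the `probe` function and its `opener` parameter are hypothetical helpers, and the URL shown in the usage note is an assumption about where the measurement archive is served on a toolkit host.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=10.0, opener=urllib.request.urlopen):
    """Fetch url once and return (status, elapsed_seconds).

    status is the HTTP status code, or an "error: ..." string when the
    request fails before any HTTP response arrives (for example a timeout).
    The opener parameter exists only so the function can be exercised
    without network access.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError) as exc:
        status = f"error: {exc}"
    return status, time.monotonic() - start
```

For example, `probe("http://localhost/esmond/perfsonar/archive/", timeout=30)` run on the affected host (the path is an assumption) would show whether the archive answers at all and how long it takes, which can then be lined up with the error timestamps in regulartesting.log.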

 

The last case began Saturday May 14 at ~22:25, and ended today at 12:39.  I have searched all the logs I can think of and have not found any clues as to why it started, or why it ended.

 

Any guidance would be appreciated.

 

Thanks!

 

Darryl

 



