
perfsonar-user - Re: [perfsonar-user] Perfsonar node showing 100% CPU by scheduler

Subject: perfSONAR User Q&A and Other Discussion

  • From: Mark Feit <>
  • To: "Pennington, Mike" <>, "" <>
  • Subject: Re: [perfsonar-user] Perfsonar node showing 100% CPU by scheduler
  • Date: Wed, 13 Feb 2019 17:16:56 +0000

Pennington, Mike writes:

 

Also saw this in the pscheduler.log:

 

Feb 13 10:37:32 perfsonar-hartford journal: runner ERROR    84864837: Failed to post run for result: Database connection pool exhausted. Unable to get connection after 60 attempts.

Feb 13 10:37:32 perfsonar-hartford journal: runner ERROR    84828777: Failed to post run for result: Database connection pool exhausted. Unable to get connection after 60 attempts.

 

You have two things happening, which I suspect are both related to having more workload than the machine can handle.

 

The runner problem came up last month, and I discussed some of the under-the-hood implications here:  https://lists.internet2.edu/sympa/arc/perfsonar-user/2019-01/msg00013.html.

 

I can’t say what’s happening with the scheduler other than “it’s very busy,” which isn’t particularly helpful.  The scheduler takes its to-do list from the tasks in the database and isn’t prone to doing excess work, so this might be as simple as one or more tasks being configured with a repeat interval short enough that all of the work is legitimate.  If this system is part of a mesh, have there been any changes to its configuration, such as an artificially low repeat interval for some of the tasks or a switch from MeshConfig to pSConfig format?  (Andy Lake is working on a bug in the latter that might be related.)
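As a rough illustration of the "short repeat interval" check, here is a minimal Python sketch that flags tasks whose repeat interval falls below a threshold. It is not part of pScheduler. The task layout (a "schedule" object holding an ISO 8601 "repeat" duration, and a "test" object holding a "type"), the 60-second threshold, and the function names are all assumptions made for illustration.

```python
import re

# Day/hour/minute/second subset of ISO 8601 durations, e.g. "PT30S" or "P1DT2H".
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def duration_seconds(text):
    """Convert a simple ISO 8601 duration string to whole seconds."""
    match = _DURATION.match(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def short_repeat_tasks(tasks, threshold_seconds=60):
    """Return (test type, repeat seconds) for tasks repeating faster than the threshold.

    The dictionary layout read here is an assumed shape, not a documented one.
    """
    flagged = []
    for task in tasks:
        repeat = task.get("schedule", {}).get("repeat")
        if repeat is None:
            continue  # task does not repeat
        seconds = duration_seconds(repeat)
        if seconds < threshold_seconds:
            flagged.append((task.get("test", {}).get("type", "unknown"), seconds))
    return flagged
```

Feeding it the task list from a suspect node would show at a glance whether any task is scheduled often enough to explain a busy scheduler.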

 

Could I ask you to do a couple of things to give me some insight into the problem?  As root, run “pscheduler debug on scheduler”, wait 30 seconds and then run “pscheduler debug off”.  Then grep the string ‘ scheduler ’ (with a space on either side) out of /var/log/pscheduler/pscheduler.log and send me the results off-list.  If the file is more than 5-10 MB, just send the last few thousand lines.  If the machine is reachable from the outside, please also send me its FQDN and I’ll take a look at what it’s up to through the API.
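The collection steps above can be sketched in Python as follows. The two pscheduler commands and the log path are taken from this message. The function names, the output file name, and the 3000-line cap (standing in for "the last few thousand lines") are assumptions for illustration. Nothing runs on import, and collect() needs to be called as root.

```python
import subprocess
import time
from collections import deque

LOG_PATH = "/var/log/pscheduler/pscheduler.log"  # path given in the message


def scheduler_lines(lines, needle=" scheduler ", keep_last=3000):
    """Return up to keep_last lines containing needle, in original order."""
    return list(deque((line for line in lines if needle in line), maxlen=keep_last))


def capture_debug_window(seconds=30):
    """Turn scheduler debugging on, wait, then turn debugging off again."""
    subprocess.run(["pscheduler", "debug", "on", "scheduler"], check=True)
    try:
        time.sleep(seconds)
    finally:
        # Always switch debugging back off, even if the wait is interrupted.
        subprocess.run(["pscheduler", "debug", "off"], check=True)


def collect(out_path="scheduler-debug.txt"):
    """Capture a debug window and write the matching log lines to out_path."""
    capture_debug_window()
    with open(LOG_PATH, errors="replace") as log:
        selected = scheduler_lines(log)
    with open(out_path, "w") as out:
        out.writelines(selected)
    return len(selected)
```

Matching on the needle with a space on each side keeps lines logged by the scheduler while skipping lines that only mention "pscheduler" or the log file name.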

 

--Mark

 

 



