perfsonar-user - RE: [perfsonar-user] Perfsonar node showing 100% CPU by scheduler
Subject: perfSONAR User Q&A and Other Discussion
- From: "Pennington, Mike" <>
- To: Mark Feit
- Subject: RE: [perfsonar-user] Perfsonar node showing 100% CPU by scheduler
- Date: Wed, 13 Feb 2019 17:35:43 +0000
From: Mark Feit
Pennington, Mike writes:
Also saw this in the pscheduler.log:
Feb 13 10:37:32 perfsonar-hartford journal: runner ERROR 84864837: Failed to post run for result: Database connection pool exhausted. Unable to get connection after 60 attempts.
Feb 13 10:37:32 perfsonar-hartford journal: runner ERROR 84828777: Failed to post run for result: Database connection pool exhausted. Unable to get connection after 60 attempts.
You have two things happening, which I suspect are both related to having more workload than the machine can handle.
The runner problem came up last month, and I discussed some of the under-the-hood implications here: https://lists.internet2.edu/sympa/arc/perfsonar-user/2019-01/msg00013.html.
I can’t say what’s happening with the scheduler other than “it’s very busy,” which isn’t particularly helpful. The scheduler takes its to-do list from the tasks in the database and isn’t prone to doing excess work, so this might be as simple as one or more tasks being configured with a repeat interval short enough to keep it that busy, in which case the work is all legitimate. If this system is part of a mesh, have there been any changes to its configuration, such as an artificially low repeat interval for some of the tasks or a switch from MeshConfig to pSConfig format? (Andy Lake is working on a bug in the latter that might be related.)
Could I ask you to do a couple of things to give me some insight into the problem?

1. As root, run “pscheduler debug on scheduler”.
2. Wait 30 seconds, then run “pscheduler debug off”.
3. Grep the string ‘ scheduler ’ (with a space on either side) out of /var/log/pscheduler/pscheduler.log and send me the results off-list. If the output is more than 5-10 MB, just send the last few thousand lines.

If the machine is reachable from the outside, please also send me its FQDN and I’ll take a look at what it’s up to through the API.
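For anyone who would rather script the filtering step than run grep by hand, here is a minimal sketch in Python. It only covers the "grep, then keep the last few thousand lines" part; the debug on/off commands still have to be run as root beforehand. The function name `scheduler_lines` and the 5000-line default are assumptions made for illustration, not part of pScheduler.

```python
from collections import deque
from typing import Iterable, List


def scheduler_lines(
    lines: Iterable[str],
    needle: str = " scheduler ",  # spaces on both sides, as in the grep request
    max_lines: int = 5000,        # assumed stand-in for "last few thousand lines"
) -> List[str]:
    """Return the lines containing `needle`, keeping only the last `max_lines`.

    A bounded deque drops the oldest matches as new ones arrive, so a large
    log file is never held in memory all at once.
    """
    matches = (line for line in lines if needle in line)
    return list(deque(matches, maxlen=max_lines))
```

To use it, open /var/log/pscheduler/pscheduler.log (for example with `open(path, errors="replace")`), pass the file object to `scheduler_lines`, and write the returned lines to a file to send off-list. The substring match needs the surrounding spaces, so a token such as "pscheduler-scheduler" is not picked up by accident.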
--Mark