perfsonar-user - RE: [perfsonar-user] Perfsonar node showing 100% CPU by scheduler

Subject: perfSONAR User Q&A and Other Discussion

List archive

RE: [perfsonar-user] Perfsonar node showing 100% CPU by scheduler

From: "Pennington, Mike" <>
To: Mark Feit <>, "" <>
Subject: RE: [perfsonar-user] Perfsonar node showing 100% CPU by scheduler
Date: Wed, 13 Feb 2019 17:35:43 +0000

I personally haven’t made any changes to this thing in ages, but it is part of a couple of meshes: The Quilt and another one. Let me work on that other stuff and send you the info off-list. Thanks!

Mike Pennington

CEN | Network Engineer

Hartford CT | 06105-3702

p 860 622 4566

Member Conference - May 10th, 2019 - Register: https://t.co/laqXY47EZl

From: Mark Feit [mailto:]
Sent: Wednesday, February 13, 2019 12:17 PM
To: Pennington, Mike <>;
Subject: Re: [perfsonar-user] Perfsonar node showing 100% CPU by scheduler

Pennington, Mike writes:

Also saw this in the pscheduler.log:

Feb 13 10:37:32 perfsonar-hartford journal: runner ERROR 84864837: Failed to post run for result: Database connection pool exhausted. Unable to get connection after 60 attempts.

Feb 13 10:37:32 perfsonar-hartford journal: runner ERROR 84828777: Failed to post run for result: Database connection pool exhausted. Unable to get connection after 60 attempts.

You have two things happening, which I suspect are both related to having more workload than the machine can handle.

The runner problem came up last month, and I discussed some of the under-the-hood implications here: https://lists.internet2.edu/sympa/arc/perfsonar-user/2019-01/msg00013.html.

I can’t say what’s happening with the scheduler other than “it’s very busy,” which isn’t particularly helpful. The scheduler takes its to-do list from the tasks in the database and isn’t prone to doing excess work, so this might be something as simple as one or more tasks being configured with a repeat interval short enough that all of the work is legitimate. If this system is part of a mesh, have there been any changes to its configuration, such as an artificially low repeat interval for some of the tasks or a switch from MeshConfig to pSConfig format? (Andy Lake is working on a bug in the latter that might be related.)
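To see how a short repeat interval alone can keep a scheduler legitimately busy, the arithmetic is simple: a task repeating every N seconds has to be planned 3600 / N times per hour. The sketch below is a rough back-of-the-envelope aid, not pScheduler code. It assumes repeat intervals are written as ISO 8601 durations such as "PT5M" and handles only the days/hours/minutes/seconds subset; the function names and task labels are hypothetical.

```python
import re
from typing import Dict

# Hypothetical helper: parses a subset of ISO 8601 durations
# (days, hours, minutes, seconds), e.g. "PT5M" or "P1DT2H".
_DURATION = re.compile(
    r"^P(?:(?P<d>\d+)D)?(?:T(?:(?P<h>\d+)H)?(?:(?P<m>\d+)M)?(?:(?P<s>\d+)S)?)?$"
)


def duration_seconds(iso: str) -> int:
    """Convert a simple ISO 8601 duration to a positive number of seconds."""
    match = _DURATION.match(iso)
    if match is None:
        raise ValueError(f"unsupported duration: {iso!r}")
    parts = {k: int(v) if v else 0 for k, v in match.groupdict().items()}
    seconds = parts["d"] * 86400 + parts["h"] * 3600 + parts["m"] * 60 + parts["s"]
    if seconds <= 0:
        # Empty forms such as "P" or "PT" would mean "repeat constantly".
        raise ValueError(f"duration must be positive: {iso!r}")
    return seconds


def runs_per_hour(repeats: Dict[str, str]) -> Dict[str, float]:
    """Map each (hypothetical) task label to how many runs it needs per hour."""
    return {name: 3600 / duration_seconds(iso) for name, iso in repeats.items()}


def total_runs_per_hour(repeats: Dict[str, str]) -> float:
    """Total runs per hour across all tasks: a crude proxy for scheduling load."""
    return sum(runs_per_hour(repeats).values())
```

For example, a hypothetical task repeating every four hours ("PT4H") contributes 0.25 runs per hour, while one repeating every 30 seconds ("PT30S") contributes 120. A single artificially low interval can therefore dominate the workload.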

If I could ask you to do a couple of things to give me some insight into the problem: As root, run “pscheduler debug on scheduler”, wait 30 seconds, and then run “pscheduler debug off”. Then grep the string ‘ scheduler ’ (with spaces on either side) out of /var/log/pscheduler/pscheduler.log and send me the results off-list. If the file is more than 5-10 MB, just send the last few thousand lines. If the machine is reachable from the outside, please also send me its FQDN and I’ll take a look at what it’s up to through the API.
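The filtering step of that request can be sketched in a few lines of Python. This is a minimal sketch, not a perfSONAR tool: it only replaces the grep and trimming step (the debug on/off commands are still run as described), it assumes “the file” refers to the grep output, and the 5 MB threshold and 3,000-line tail are assumed values picked from the ranges given. The function name is hypothetical.

```python
from typing import List

LOG_PATH = "/var/log/pscheduler/pscheduler.log"  # log file named in the message
PATTERN = " scheduler "  # spaces on either side, as requested
SIZE_LIMIT = 5 * 1024 * 1024  # assumption: low end of "more than 5-10 MB"
TAIL_LINES = 3000  # assumption: "the last few thousand lines"


def scheduler_lines(path: str = LOG_PATH,
                    size_limit: int = SIZE_LIMIT,
                    tail_lines: int = TAIL_LINES) -> List[str]:
    """Return log lines containing PATTERN, keeping only the last
    tail_lines matches when the matched text exceeds size_limit bytes."""
    matches: List[str] = []
    # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan.
    with open(path, "r", encoding="utf-8", errors="replace") as log:
        for line in log:
            if PATTERN in line:
                matches.append(line.rstrip("\n"))
    # Approximate output size: matched text plus one newline per line.
    size = sum(len(m.encode("utf-8")) + 1 for m in matches)
    if size > size_limit:
        matches = matches[-tail_lines:]
    return matches
```

After “pscheduler debug off”, something like print("\n".join(scheduler_lines())) run by a user who can read the log, redirected to a file, gives the excerpt to send off-list.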

--Mark

[perfsonar-user] Perfsonar node showing 100% CPU by scheduler, Pennington, Mike, 02/13/2019
- RE: [perfsonar-user] Perfsonar node showing 100% CPU by scheduler, Pennington, Mike, 02/13/2019
- Re: [perfsonar-user] Perfsonar node showing 100% CPU by scheduler, Mark Feit, 02/13/2019
  - RE: [perfsonar-user] Perfsonar node showing 100% CPU by scheduler, Pennington, Mike, 02/13/2019
    - Re: [perfsonar-user] Perfsonar node showing 100% CPU by scheduler, William Abbott, 02/13/2019
      - Re: [perfsonar-user] Perfsonar node showing 100% CPU by scheduler, Mark Feit, 02/13/2019
        - Re: [perfsonar-user] Perfsonar node showing 100% CPU by scheduler, William Abbott, 02/14/2019
