perfsonar-user - Re: [perfsonar-user] perfSONAR nodes are down due to Memory utilization issue

Subject: perfSONAR User Q&A and Other Discussion

  • From: Antoine Delvaux <>
  • To: Muhammad Tayyab <>
  • Cc: "" <>
  • Subject: Re: [perfsonar-user] perfSONAR nodes are down due to Memory utilization issue
  • Date: Wed, 15 May 2019 10:43:20 +0000

Hi Tayyab,

> On 15 May 2019, at 10:05, Muhammad Tayyab <> wrote:
>
> [root@ps-pul-h4309b ~]# psconfig pscheduler-stats
> Agent Last Run Start Time: 2019/05/15 12:02:03
> Agent Last Run End Time: 2019/05/15 12:22:49
> Agent Last Run Process ID (PID): 6935
> Agent Last Run Log GUID: 1B571A4E-76F0-11E9-9DC0-AD1399FCCD0D
> Total tasks managed by agent: 72
> From include files: 72
> /etc/perfsonar/psconfig/pscheduler.d/toolkit-webui.json: 72

So this shows that no remote pSConfig file is loaded, but that you still have
locally defined tasks running. The toolkit-webui.json file contains all the
tests defined through the local webUI. If, on the other hand, the webUI at
https://your.node.tld/toolkit/auth/admin/tests.cgi shows you an empty list,
then there is a problem.

From your email on the ASREN list, it seems that the content of this file
comes from a previously defined mesh, probably from a perfSONAR 4.0
installation that was later upgraded. This part of the file indicates it:

    "_meta" : {
        "psconfig-translation" : {
            "source-format" : "mesh-config-tasks-conf",
            "time-translated" : "2019-03-21T08:47:15+00:00"
        }
    }

It might be that this interferes and somehow confuses perfSONAR.
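
If you want to check other hosts for the same marker, here is a minimal Python sketch. It assumes the file is plain JSON with the "_meta" block at the top level, as in the excerpt above. The function name translation_source is mine and is not part of any perfSONAR tool.

```python
import json


def translation_source(config_text):
    """Return the recorded "source-format" of a translated file, or None.

    Assumes "_meta" sits at the top level of the JSON document.
    """
    config = json.loads(config_text)
    translation = config.get("_meta", {}).get("psconfig-translation", {})
    return translation.get("source-format")


# Sample built from the excerpt quoted in this email.
sample = """{
    "_meta": {
        "psconfig-translation": {
            "source-format": "mesh-config-tasks-conf",
            "time-translated": "2019-03-21T08:47:15+00:00"
        }
    }
}"""

print(translation_source(sample))
```

On a real host you would pass in the content of /etc/perfsonar/psconfig/pscheduler.d/toolkit-webui.json. A return value of None only means the marker is absent; it says nothing else about the file.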

To forcibly remove this task list, you can delete the
/etc/perfsonar/psconfig/pscheduler.d/toolkit-webui.json file and wait at most
60 seconds. Then look at /var/log/perfsonar/psconfig-pscheduler-agent.log and
wait until you first get a message stating "Running agent..." and then, after
a while and some other messages, a message stating "Agent completed running".
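
If you would rather not watch the log by eye, the check can be sketched in Python as below. It relies only on the two message texts quoted above; the rest of each log line (timestamps, GUIDs and so on) is ignored, and the sample lines are invented for illustration and are not real agent output.

```python
def agent_run_completed(log_lines):
    """Return True once "Running agent..." is followed by "Agent completed running".

    Pass in only the lines written after the file was deleted, otherwise an
    older run would also match.
    """
    started = False
    for line in log_lines:
        if "Running agent..." in line:
            started = True
        elif started and "Agent completed running" in line:
            return True
    return False


# Hypothetical log lines; only the two quoted messages matter here.
finished = [
    "INFO Running agent...",
    "INFO some other message",
    "INFO Agent completed running",
]
still_running = ["INFO Running agent...", "INFO some other message"]

print(agent_run_completed(finished), agent_run_completed(still_running))
```

To use it on the real log, read /var/log/perfsonar/psconfig-pscheduler-agent.log and keep only the lines appended since the deletion.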

It can take some time (multiple minutes, even up to 20 or 30 minutes if the
host is heavily loaded like yours) to complete. Once the message is printed,
run `psconfig pscheduler-stats` again to see if the list of tasks managed by
the agent is now empty.
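
To read the task count out of that output in a script, something like the following Python sketch could work. It assumes the "Total tasks managed by agent:" line keeps the format shown in your output; the function name total_tasks is mine.

```python
import re


def total_tasks(stats_output):
    """Return the number after "Total tasks managed by agent:", or None if absent."""
    match = re.search(
        r"^Total tasks managed by agent:\s*(\d+)\s*$",
        stats_output,
        re.MULTILINE,
    )
    return int(match.group(1)) if match else None


# Sample taken from the output quoted at the top of this email.
sample_output = """Agent Last Run Start Time: 2019/05/15 12:02:03
Agent Last Run End Time: 2019/05/15 12:22:49
Total tasks managed by agent: 72
From include files: 72
"""

print(total_tasks(sample_output))
```

A result of 0 would mean the agent no longer manages any tasks. None means the line was not found, so look at the raw output in that case.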

The load and the number of running tests should then hopefully start to
decrease. If that doesn't happen, maybe because of the load and an
unresponsive pScheduler API, wait 24 hours, after which all tasks coming from
the webUI should expire by themselves.

Let us know how it goes,

Regards,

Antoine.



