perfsonar-user - [perfsonar-user] Re: meshconfig-agent-tasks not scheduling tasks regularly
Subject: perfSONAR User Q&A and Other Discussion
- From: Casey Russell <>
- To: "" <>
- Subject: [perfsonar-user] Re: meshconfig-agent-tasks not scheduling tasks regularly
- Date: Fri, 13 Oct 2017 12:20:45 -0500
One additional piece of information: searching through old threads, I came across the "pscheduler validate-limits" command. On one of my larger hosts, that command appears to succeed pretty much all the time, but on my smaller hosts it fails more often than not:
[crussell@ps-washburn-bw ~]$ pscheduler validate-limits
Failed to validate limit: Process took too long to run.
[crussell@ps-washburn-bw ~]$ pscheduler validate-limits
Limit configuration is valid.
[crussell@ps-washburn-bw ~]$ pscheduler validate-limits
Failed to validate limit: Process took too long to run.
[crussell@ps-washburn-bw ~]$ pscheduler validate-limits
Failed to validate limit: Process took too long to run.
[crussell@ps-washburn-bw ~]$ pscheduler validate-limits
Limit configuration is valid.
You can also see that this is a contributor to at least some of these tests not being posted (although I haven't yet captured the reason it fails on the larger host).
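Because the failures are intermittent, it may help to collect numbers rather than judge from a handful of runs. The following is a minimal Python sketch, not part of perfSONAR. The helper names (classify, tally, run_repeatedly) are hypothetical, and the sketch assumes only that "pscheduler validate-limits" prints one of the two messages shown above. It runs the command several times, classifies each result, and records the wall-clock time of each run, which could show whether the failures line up with slow runs on the smaller hosts.

```python
import subprocess
import time
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

# Messages as they appear in the transcript above; anything else is "other".
VALID_MSG = "Limit configuration is valid."
FAILED_PREFIX = "Failed to validate limit:"


def classify(output: str) -> str:
    """Map the text printed by one validate-limits run to a category."""
    text = output.strip()
    if text.startswith(VALID_MSG):
        return "valid"
    if text.startswith(FAILED_PREFIX):
        return "failed"
    return "other"


def tally(outputs: Iterable[str]) -> Counter:
    """Count how many runs fell into each category."""
    return Counter(classify(o) for o in outputs)


def run_repeatedly(cmd: Sequence[str], runs: int = 10,
                   pause: float = 1.0) -> List[Tuple[str, float]]:
    """Run cmd `runs` times and return (category, elapsed_seconds) per run."""
    results = []
    for i in range(runs):
        start = time.monotonic()
        # stderr is merged into stdout so the message is captured either way
        proc = subprocess.run(list(cmd), stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              universal_newlines=True)
        elapsed = time.monotonic() - start
        results.append((classify(proc.stdout), elapsed))
        if i + 1 < runs:
            time.sleep(pause)  # space the runs out a little
    return results
```

On an affected host, calling run_repeatedly(["pscheduler", "validate-limits"], runs=20) and then counting the categories would give a failure rate and a slowest-run time that can be compared against a larger host running the same limits file.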
I have a fairly long CIDR-LIST in that limits file (the file is identical on the two hosts). Does anyone know whether that limit processing is more likely to be memory-intensive or processor-intensive? I'd have to look it up again, but I think I also saw a reference somewhere to a method for referencing an external list of CIDRs. Does anyone know if that's less intensive than a long CIDR-LIST statement in the limits file?
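On the CPU-versus-memory question, the sketch below only illustrates the general cost of CIDR matching; it is not pScheduler's limit-processing code, which isn't shown in this thread. It uses Python's standard ipaddress module and assumes a flat list of CIDR strings; the helper names are hypothetical. Checking an address against an unsorted list is a linear scan, so in this sketch the work per check grows with the number of entries, while the network objects themselves take relatively little memory. Collapsing overlapping or adjacent blocks with ipaddress.collapse_addresses is one way to shorten a long list before it goes into a limits file.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_cidrs(cidrs: Iterable[str]) -> List[Network]:
    """Parse CIDR strings; strict=False tolerates host bits, e.g. 10.1.2.3/24."""
    return [ipaddress.ip_network(c, strict=False) for c in cidrs]


def collapse(networks: List[Network]) -> List[Network]:
    """Merge overlapping/adjacent blocks, one address family at a time."""
    # collapse_addresses raises TypeError on mixed IPv4/IPv6 input
    v4 = [n for n in networks if n.version == 4]
    v6 = [n for n in networks if n.version == 6]
    return (list(ipaddress.collapse_addresses(v4))
            + list(ipaddress.collapse_addresses(v6)))


def matches(address: str, networks: List[Network]) -> bool:
    """Linear scan: cost is proportional to len(networks) per check."""
    addr = ipaddress.ip_address(address)
    # `in` is simply False when the address and network families differ
    return any(addr in net for net in networks)
```

A more reliable way to answer the question for the real hosts is to watch CPU and memory use (for example with top) while "pscheduler validate-limits" runs on one of the smaller machines.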
On Fri, Oct 13, 2017 at 9:55 AM, Casey Russell <> wrote:
Group,

I mentioned this some time back, when I thought it was a problem with my 4 lower-powered hosts running out of CPU, but I've been chasing it ever since and it's hitting my larger hosts as well. Ever since I upgraded to 4.0 several months ago, my hosts have regularly stopped scheduling tests from the mesh. My dashboard today shows a mess of hosts that failed to schedule tests last night; some of them are on their second (or later) consecutive day.

I can't tell whether this is a problem with the mesh config file or with the hosts, although since it's spread everywhere (even to a newly installed CentOS 7 host), I'm leaning toward a problem in the mesh config file.

I'm not sure what to give you that will help, so below you'll find some diagnostic commands from a host that was affected this morning. It is running only bandwidth tests; none of the latency tests were scheduled.

Any ideas or help are appreciated.

Since the latency tests were never scheduled, I don't have anything from the API to show you. The mesh config file is at:

[root@ps-ksu-bw crussell]# pscheduler schedule
2017-10-13T09:47:54-05:00 - 2017-10-13T09:48:23-05:00 (Pending)
throughput --duration PT20S --source ps-fhsu-bw.perfsonar.kanren.net --ip-version 4 --dest ps-ksu-bw.perfsonar.kanren.net --parallel 1 (Run with tool 'iperf3')

2017-10-13T09:49:33-05:00 - 2017-10-13T09:49:52-05:00 (Pending)
throughput --bandwidth 920000000 --duration PT10S --source ps-esu-bw.perfsonar.kanren.net --ip-version 4 --dest ps-ksu-bw.perfsonar.kanren.net --parallel 1 --udp (Run with tool 'iperf3')

2017-10-13T09:52:08-05:00 - 2017-10-13T09:52:27-05:00 (Pending)
throughput --bandwidth 920000000 --duration PT10S --source ps-bryant-bw.perfsonar.kanren.net --ip-version 6 --dest ps-ksu-bw.perfsonar.kanren.net --parallel 1 --udp (Run with tool 'iperf3')

2017-10-13T09:58:44-05:00 - 2017-10-13T09:59:03-05:00 (Pending)
throughput --bandwidth 920000000 --duration PT10S --source ps-bryant-bw.perfsonar.kanren.net --ip-version 4 --dest ps-ksu-bw.perfsonar.kanren.net --parallel 1 --udp (Run with tool 'iperf3')

2017-10-13T10:07:36-05:00 - 2017-10-13T10:08:05-05:00 (Pending)
throughput --duration PT20S --source ps-ku-bw.perfsonar.kanren.net --ip-version 6 --dest ps-ksu-bw.perfsonar.kanren.net --parallel 1 (Run with tool 'iperf3')

2017-10-13T10:08:38-05:00 - 2017-10-13T10:08:57-05:00 (Pending)
throughput --bandwidth 920000000 --duration PT10S --source ps-ku-bw.perfsonar.kanren.net --ip-version 6 --dest ps-ksu-bw.perfsonar.kanren.net --parallel 1 --udp (Run with tool 'iperf3')

2017-10-13T10:10:18-05:00 - 2017-10-13T10:10:47-05:00 (Pending)
throughput --duration PT20S --source ps-esu-bw.perfsonar.kanren.net --ip-version 6 --dest ps-ksu-bw.perfsonar.kanren.net --parallel 1 (Run with tool 'iperf3')

2017-10-13T10:10:49-05:00 - 2017-10-13T10:11:08-05:00 (Pending)
throughput --bandwidth 920000000 --duration PT10S --source ps-esu-bw.perfsonar.kanren.net --ip-version 6 --dest ps-ksu-bw.perfsonar.kanren.net --parallel 1 --udp (Run with tool 'iperf3')

2017-10-13T10:16:39-05:00 - 2017-10-13T10:17:08-05:00 (Pending)
throughput --duration PT20S --source ps-fhsu-bw.perfsonar.kanren.net --ip-version 6 --dest ps-ksu-bw.perfsonar.kanren.net --parallel 1 (Run with tool 'iperf3')

2017-10-13T10:36:46-05:00 - 2017-10-13T10:37:15-05:00 (Pending)
throughput --duration PT20S --source ps-esu-bw.perfsonar.kanren.net --ip-version 4 --dest ps-ksu-bw.perfsonar.kanren.net --parallel 1 (Run with tool 'iperf3')

[root@ps-ksu-bw crussell]# service pscheduler-runner status
runner (pid 13073) is running...
[root@ps-ksu-bw crussell]# service pscheduler-ticker status
ticker (pid 13071) is running...
[root@ps-ksu-bw crussell]# service pscheduler-archiver status
archiver (pid 13078) is running...
[root@ps-ksu-bw crussell]# service pscheduler-server status
pscheduler-server: unrecognized service
[root@ps-ksu-bw crussell]# service pscheduler-scheduler status
scheduler (pid 13090) is running...
[root@ps-ksu-bw crussell]# ps -ax | grep pscheduler
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
 3448 pts/0    S+     0:00 grep pscheduler
 8236 ?        Ss     0:17 postgres: pscheduler pscheduler 127.0.0.1(41520) idle
13071 ?        Sl     0:42 /usr/bin/python /usr/libexec/pscheduler/daemons/ticker --daemon --pid-file /var/run/pscheduler-ticker.pid --dsn @/etc/pscheduler/database/database-dsn
13073 ?        Sl    21:20 /usr/bin/python /usr/libexec/pscheduler/daemons/runner --daemon --pid-file /var/run/pscheduler-runner.pid --dsn @/etc/pscheduler/database/database-dsn
13075 ?        Ss     1:20 postgres: pscheduler pscheduler 127.0.0.1(48114) idle
13076 ?        Ss     9:40 postgres: pscheduler pscheduler 127.0.0.1(48116) idle
13078 ?        S     67:00 /usr/bin/python /usr/libexec/pscheduler/daemons/archiver --daemon --pid-file /var/run/pscheduler-archiver.pid --dsn @/etc/pscheduler/database/database-dsn
13079 ?        Ss   360:11 postgres: pscheduler pscheduler 127.0.0.1(48118) idle
13081 ?        Ss     8:31 postgres: pscheduler pscheduler 127.0.0.1(48122) idle
13083 ?        Ss     0:00 postgres: pscheduler pscheduler 127.0.0.1(48126) idle
13090 ?        Sl    65:19 /usr/bin/python /usr/libexec/pscheduler/daemons/scheduler --daemon --pid-file /var/run/pscheduler-scheduler.pid --dsn @/etc/pscheduler/database/database-dsn
13108 ?        Ss   115:36 postgres: pscheduler pscheduler 127.0.0.1(48132) idle
13114 ?        Ss     0:00 postgres: pscheduler pscheduler 127.0.0.1(48136) idle
28737 ?        Ss     0:01 postgres: pscheduler pscheduler 127.0.0.1(55217) idle
[root@ps-ksu-bw crussell]#
[root@ps-ksu-bw crussell]# service perfsonar-meshconfig-agent
usage: /etc/init.d/perfsonar-meshconfig-agent (start|stop|restart|help)
start - start perfSONAR MeshConfig Agent
stop - stop perfSONAR MeshConfig Agent
restart - restart perfSONAR MeshConfig Agent if running by sending a SIGHUP or start if not running
status - Indicates if the service is running
help - this screen
[root@ps-ksu-bw crussell]# service perfsonar-meshconfig-agent restart
/etc/init.d/perfsonar-meshconfig-agent stop: perfSONAR MeshConfig Agent stopped
waiting...
/usr/lib/perfsonar/bin/perfsonar_meshconfig_agent --config=/etc/perfsonar/meshconfig-agent.conf --pidfile=/var/run/perfsonar-meshconfig-agent.pid --logger=/etc/perfsonar/meshconfig-agent-logger.conf --user=perfsonar --group=perfsonar
/etc/init.d/perfsonar-meshconfig-agent start: perfSONAR MeshConfig Agent started
[root@ps-ksu-bw crussell]# tail -n 50 /var/log/perfsonar/meshconfig-agent.log
2017/10/12 20:10:55 (8826) INFO> perfsonar_meshconfig_agent:438 main:: - Added 3 new tasks, and deleted 0 old tasks
2017/10/12 21:10:10 (8826) INFO> perfsonar_meshconfig_agent:438 main:: - Added 1 new tasks, and deleted 0 old tasks
2017/10/13 03:10:37 (8826) INFO> perfsonar_meshconfig_agent:438 main:: - Added 2 new tasks, and deleted 0 old tasks
2017/10/13 04:10:40 (8826) WARN> perfsonar_meshconfig_agent:430 main:: - Problem determining which pscheduler to submit test to for deletion, skipping test throughput/iperf3(ps-ksu-bw.perfsonar.kanren.net->ps-fhsu-bw.perfsonar.kanren.net): 500 Internal Server Error: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"><html><head><title>500 Internal Server Error</title></head><body><h1>Internal Server Error</h1><p>The server encountered an internal error or misconfiguration and was unable to complete your request.</p><p>Please contact the server administrator at root@localhost to inform them of the time this error occurred, and the actions you performed just before this error.</p><p>More information about this error may be available in the server error log.</p></body></html>
2017/10/13 07:11:39 (8826) INFO> perfsonar_meshconfig_agent:438 main:: - Added 5 new tasks, and deleted 0 old tasks
2017/10/13 09:20:23 (8826) INFO> perfsonar_meshconfig_agent:438 main:: - Added 97 new tasks, and deleted 0 old tasks
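The agent log above is easier to scan for gaps and bursts when it is reduced to one record per event. Below is a minimal Python sketch, assuming the line format shown in the tail output (timestamp, PID in parentheses, a level followed by ">", then the message) and the "Added N new tasks, and deleted M old tasks" wording. The names Event, summarize and bursts, and the burst threshold, are hypothetical. A large burst, such as the 97 tasks added at 09:20, may indicate that many previously scheduled tasks had lapsed and were re-created, though that is an interpretation rather than something the log states.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple

# e.g. "2017/10/12 20:10:55 (8826) INFO> <message>"
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \((\d+)\) (\w+)> (.*)$")
COUNTS_RE = re.compile(r"Added (\d+) new tasks, and deleted (\d+) old tasks")


class Event(NamedTuple):
    when: datetime
    kind: str      # "added", "warn" or "error"
    added: int
    deleted: int


def summarize(lines: Iterable[str]) -> List[Event]:
    """Extract task-count lines and warnings/errors, in log order."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # skip lines that don't start with a timestamp
        when = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
        level, message = m.group(3), m.group(4)
        counts = COUNTS_RE.search(message)
        if counts:
            events.append(Event(when, "added",
                                int(counts.group(1)), int(counts.group(2))))
        elif level in ("WARN", "ERROR"):
            events.append(Event(when, level.lower(), 0, 0))
    return events


def bursts(events: List[Event], threshold: int) -> List[Event]:
    """Return task-count events that added at least `threshold` tasks."""
    return [e for e in events if e.kind == "added" and e.added >= threshold]
```

For example, opening /var/log/perfsonar/meshconfig-agent.log and passing the file object to summarize returns the events in order; bursts(events, 50) then picks out the large re-additions, and the timestamps between them show how long tests were missing.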