perfsonar-user - [perfsonar-user] Re: meshconfig-agent-tasks not scheduling tasks regularly
Subject: perfSONAR User Q&A and Other Discussion
- From: Casey Russell <>
- To: "" <>
- Subject: [perfsonar-user] Re: meshconfig-agent-tasks not scheduling tasks regularly
- Date: Fri, 13 Oct 2017 11:54:09 -0500
For anyone who takes a look at this: I've restarted the 3 worst hosts so that the dashboard will clear up. You'll see data beginning to show up in our dashboard again, with a large gap. The 3 hosts that were the worst up until this morning were ps-ksu, ps-psu, and ps-washburn.
Dashboard is at: http://ps-dashboard.kanren.net/maddash-webui
On Fri, Oct 13, 2017 at 9:55 AM, Casey Russell <> wrote:
Group,

I mentioned it some time back, when I thought it was a problem with my 4 lower powered hosts running out of CPU, but I've been chasing it ever since and it's hitting my larger hosts as well. Ever since I upgraded to 4.0 several months ago, I've had an issue where, regularly, my hosts stop scheduling tests from the mesh. My dashboard today shows a mess of hosts that failed to schedule tests last night; some of them are on their second (or more) continuous day.

I can't figure out if this is a problem with the mesh config file or on the hosts (although since it's spread everywhere, even a newly installed CentOS7 host) I'm leaning toward some problem in the mesh config file.

I'm not sure what to give you that will help, so below you'll find some diagnostic commands from an affected host this morning that is only running bandwidth tests, none of the latency tests scheduled.

Any ideas or help is appreciated.

Since the latency tests were never scheduled, I don't have anything from the API to show you. The mesh config file is at:

[root@ps-ksu-bw crussell]# pscheduler schedule
2017-10-13T09:47:54-05:00 - 2017-10-13T09:48:23-05:00 (Pending)
throughput --duration PT20S --source ps-fhsu-bw.perfsonar.kanren.net --ip-version 4 --dest ps-ksu-bw.perfsonar.kanren.net --parallel 1 (Run with tool 'iperf3')

2017-10-13T09:49:33-05:00 - 2017-10-13T09:49:52-05:00 (Pending)
throughput --bandwidth 920000000 --duration PT10S --source ps-esu-bw.perfsonar.kanren.net --ip-version 4 --dest ps-ksu-bw.perfsonar.kanren.net --parallel 1 --udp (Run with tool 'iperf3')

2017-10-13T09:52:08-05:00 - 2017-10-13T09:52:27-05:00 (Pending)
throughput --bandwidth 920000000 --duration PT10S --source ps-bryant-bw.perfsonar.kanren.net --ip-version 6 --dest ps-ksu-bw.perfsonar.kanren.net --parallel 1 --udp (Run with tool 'iperf3')

2017-10-13T09:58:44-05:00 - 2017-10-13T09:59:03-05:00 (Pending)
throughput --bandwidth 920000000 --duration PT10S --source ps-bryant-bw.perfsonar.kanren.net --ip-version 4 --dest ps-ksu-bw.perfsonar.kanren.net --parallel 1 --udp (Run with tool 'iperf3')

2017-10-13T10:07:36-05:00 - 2017-10-13T10:08:05-05:00 (Pending)
throughput --duration PT20S --source ps-ku-bw.perfsonar.kanren.net --ip-version 6 --dest ps-ksu-bw.perfsonar.kanren.net --parallel 1 (Run with tool 'iperf3')

2017-10-13T10:08:38-05:00 - 2017-10-13T10:08:57-05:00 (Pending)
throughput --bandwidth 920000000 --duration PT10S --source ps-ku-bw.perfsonar.kanren.net --ip-version 6 --dest ps-ksu-bw.perfsonar.kanren.net --parallel 1 --udp (Run with tool 'iperf3')

2017-10-13T10:10:18-05:00 - 2017-10-13T10:10:47-05:00 (Pending)
throughput --duration PT20S --source ps-esu-bw.perfsonar.kanren.net --ip-version 6 --dest ps-ksu-bw.perfsonar.kanren.net --parallel 1 (Run with tool 'iperf3')

2017-10-13T10:10:49-05:00 - 2017-10-13T10:11:08-05:00 (Pending)
throughput --bandwidth 920000000 --duration PT10S --source ps-esu-bw.perfsonar.kanren.net --ip-version 6 --dest ps-ksu-bw.perfsonar.kanren.net --parallel 1 --udp (Run with tool 'iperf3')

2017-10-13T10:16:39-05:00 - 2017-10-13T10:17:08-05:00 (Pending)
throughput --duration PT20S --source ps-fhsu-bw.perfsonar.kanren.net --ip-version 6 --dest ps-ksu-bw.perfsonar.kanren.net --parallel 1 (Run with tool 'iperf3')

2017-10-13T10:36:46-05:00 - 2017-10-13T10:37:15-05:00 (Pending)
throughput --duration PT20S --source ps-esu-bw.perfsonar.kanren.net --ip-version 4 --dest ps-ksu-bw.perfsonar.kanren.net --parallel 1 (Run with tool 'iperf3')

[root@ps-ksu-bw crussell]# service pscheduler-runner status
runner (pid 13073) is running...
[root@ps-ksu-bw crussell]# service pscheduler-ticker status
ticker (pid 13071) is running...
[root@ps-ksu-bw crussell]# service pscheduler-archiver status
archiver (pid 13078) is running...
[root@ps-ksu-bw crussell]# service pscheduler-server status
pscheduler-server: unrecognized service
[root@ps-ksu-bw crussell]# service pscheduler-scheduler status
scheduler (pid 13090) is running...

[root@ps-ksu-bw crussell]# ps -ax | grep pscheduler
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
 3448 pts/0 S+   0:00 grep pscheduler
 8236 ?     Ss   0:17 postgres: pscheduler pscheduler 127.0.0.1(41520) idle
13071 ?     Sl   0:42 /usr/bin/python /usr/libexec/pscheduler/daemons/ticker --daemon --pid-file /var/run/pscheduler-ticker.pid --dsn @/etc/pscheduler/database/database-dsn
13073 ?     Sl  21:20 /usr/bin/python /usr/libexec/pscheduler/daemons/runner --daemon --pid-file /var/run/pscheduler-runner.pid --dsn @/etc/pscheduler/database/database-dsn
13075 ?     Ss   1:20 postgres: pscheduler pscheduler 127.0.0.1(48114) idle
13076 ?     Ss   9:40 postgres: pscheduler pscheduler 127.0.0.1(48116) idle
13078 ?     S   67:00 /usr/bin/python /usr/libexec/pscheduler/daemons/archiver --daemon --pid-file /var/run/pscheduler-archiver.pid --dsn @/etc/pscheduler/database/database-dsn
13079 ?     Ss 360:11 postgres: pscheduler pscheduler 127.0.0.1(48118) idle
13081 ?     Ss   8:31 postgres: pscheduler pscheduler 127.0.0.1(48122) idle
13083 ?     Ss   0:00 postgres: pscheduler pscheduler 127.0.0.1(48126) idle
13090 ?     Sl  65:19 /usr/bin/python /usr/libexec/pscheduler/daemons/scheduler --daemon --pid-file /var/run/pscheduler-scheduler.pid --dsn @/etc/pscheduler/database/database-dsn
13108 ?     Ss 115:36 postgres: pscheduler pscheduler 127.0.0.1(48132) idle
13114 ?     Ss   0:00 postgres: pscheduler pscheduler 127.0.0.1(48136) idle
28737 ?     Ss   0:01 postgres: pscheduler pscheduler 127.0.0.1(55217) idle
[root@ps-ksu-bw crussell]#
[root@ps-ksu-bw crussell]# service perfsonar-meshconfig-agent
usage: /etc/init.d/perfsonar-meshconfig-agent (start|stop|restart|help)
start - start perfSONAR MeshConfig Agent
stop - stop perfSONAR MeshConfig Agent
restart - restart perfSONAR MeshConfig Agent if running by sending a SIGHUP or start if not running
status - Indicates if the service is running
help - this screen

[root@ps-ksu-bw crussell]# service perfsonar-meshconfig-agent restart
/etc/init.d/perfsonar-meshconfig-agent stop: perfSONAR MeshConfig Agent stopped waiting...
/usr/lib/perfsonar/bin/perfsonar_meshconfig_agent --config=/etc/perfsonar/meshconfig-agent.conf --pidfile=/var/run/perfsonar-meshconfig-agent.pid --logger=/etc/perfsonar/meshconfig-agent-logger.conf --user=perfsonar --group=perfsonar
/etc/init.d/perfsonar-meshconfig-agent start: perfSONAR MeshConfig Agent started

[root@ps-ksu-bw crussell]# tail -n 50 /var/log/perfsonar/meshconfig-agent.log
2017/10/12 20:10:55 (8826) INFO> perfsonar_meshconfig_agent:438 main:: - Added 3 new tasks, and deleted 0 old tasks
2017/10/12 21:10:10 (8826) INFO> perfsonar_meshconfig_agent:438 main:: - Added 1 new tasks, and deleted 0 old tasks
2017/10/13 03:10:37 (8826) INFO> perfsonar_meshconfig_agent:438 main:: - Added 2 new tasks, and deleted 0 old tasks
2017/10/13 04:10:40 (8826) WARN> perfsonar_meshconfig_agent:430 main:: - Problem determining which pscheduler to submit test to for deletion, skipping test throughput/iperf3(ps-ksu-bw.perfsonar.kanren.net->ps-fhsu-bw.perfsonar.kanren.net): 500 Internal Server Error: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"><html><head><title>500 Internal Server Error</title></head><body><h1>Internal Server Error</h1><p>The server encountered an internal error or misconfiguration and was unable to complete your request.</p><p>Please contact the server administrator at root@localhost to inform them of the time this error occurred, and the actions you performed just before this error.</p><p>More information about this error may be available in the server error log.</p></body></html>
2017/10/13 07:11:39 (8826) INFO> perfsonar_meshconfig_agent:438 main:: - Added 5 new tasks, and deleted 0 old tasks
2017/10/13 09:20:23 (8826) INFO> perfsonar_meshconfig_agent:438 main:: - Added 97 new tasks, and deleted 0 old tasks
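
For anyone comparing logs across several affected hosts, here is a minimal sketch of how the meshconfig-agent.log lines above could be summarized. It is not part of perfSONAR. The names `summarize`, `SUMMARY_RE`, `WARN_RE`, and `SAMPLE` are made up for illustration, and the regular expressions assume the log format matches the excerpt quoted above. The WARN line in `SAMPLE` is shortened to its first part.

```python
import re

# Assumed format, taken from the log excerpt above:
# "<date> <time> (<pid>) INFO> ... - Added N new tasks, and deleted M old tasks"
SUMMARY_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \(\d+\) INFO> .*"
    r"Added (?P<added>\d+) new tasks, and deleted (?P<deleted>\d+) old tasks"
)
# Any WARN line; only its timestamp is kept.
WARN_RE = re.compile(r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \(\d+\) WARN> ")

# Lines copied from the excerpt above (the WARN line is truncated here).
SAMPLE = [
    "2017/10/12 20:10:55 (8826) INFO> perfsonar_meshconfig_agent:438 main:: - Added 3 new tasks, and deleted 0 old tasks",
    "2017/10/12 21:10:10 (8826) INFO> perfsonar_meshconfig_agent:438 main:: - Added 1 new tasks, and deleted 0 old tasks",
    "2017/10/13 03:10:37 (8826) INFO> perfsonar_meshconfig_agent:438 main:: - Added 2 new tasks, and deleted 0 old tasks",
    "2017/10/13 04:10:40 (8826) WARN> perfsonar_meshconfig_agent:430 main:: - Problem determining which pscheduler to submit test to for deletion",
    "2017/10/13 07:11:39 (8826) INFO> perfsonar_meshconfig_agent:438 main:: - Added 5 new tasks, and deleted 0 old tasks",
    "2017/10/13 09:20:23 (8826) INFO> perfsonar_meshconfig_agent:438 main:: - Added 97 new tasks, and deleted 0 old tasks",
]


def summarize(lines):
    """Return (runs, warnings): runs is a list of (timestamp, added, deleted)
    tuples, warnings is a list of timestamps of WARN lines."""
    runs = []
    warnings = []
    for line in lines:
        m = SUMMARY_RE.match(line)
        if m:
            runs.append((m.group("ts"), int(m.group("added")), int(m.group("deleted"))))
            continue
        m = WARN_RE.match(line)
        if m:
            warnings.append(m.group("ts"))
    return runs, warnings


if __name__ == "__main__":
    runs, warnings = summarize(SAMPLE)
    for ts, added, deleted in runs:
        print(f"{ts}  added={added}  deleted={deleted}")
    print("WARN lines at:", ", ".join(warnings) or "none")
```

To run it against a real host, pass an open file instead of `SAMPLE`, for example `summarize(open("/var/log/perfsonar/meshconfig-agent.log"))`. On the excerpt above it makes the jump from single-digit additions to 97 new tasks after the restart easy to spot.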
- [perfsonar-user] meshconfig-agent-tasks not scheduling tasks regularly, Casey Russell, 10/13/2017
- [perfsonar-user] Re: meshconfig-agent-tasks not scheduling tasks regularly, Casey Russell, 10/13/2017
- [perfsonar-user] Re: meshconfig-agent-tasks not scheduling tasks regularly, Casey Russell, 10/13/2017
- Re: [perfsonar-user] Re: meshconfig-agent-tasks not scheduling tasks regularly, Andrew Lake, 10/13/2017
- Re: [perfsonar-user] Re: meshconfig-agent-tasks not scheduling tasks regularly, Larry Blunk, 10/13/2017
- Re: [perfsonar-user] Re: meshconfig-agent-tasks not scheduling tasks regularly, Mark Feit, 10/17/2017
- Re: [perfsonar-user] Re: meshconfig-agent-tasks not scheduling tasks regularly, Casey Russell, 10/17/2017
- Re: [perfsonar-user] Re: meshconfig-agent-tasks not scheduling tasks regularly, Casey Russell, 10/19/2017
- Re: [perfsonar-user] Re: meshconfig-agent-tasks not scheduling tasks regularly, Casey Russell, 10/19/2017
- Re: [perfsonar-user] Re: meshconfig-agent-tasks not scheduling tasks regularly, Mark Feit, 10/20/2017
- Re: [perfsonar-user] Re: meshconfig-agent-tasks not scheduling tasks regularly, Casey Russell, 10/20/2017
- Re: [perfsonar-user] Re: meshconfig-agent-tasks not scheduling tasks regularly, Casey Russell, 10/19/2017
- Re: [perfsonar-user] Re: meshconfig-agent-tasks not scheduling tasks regularly, Casey Russell, 10/17/2017
- Re: [perfsonar-user] Re: meshconfig-agent-tasks not scheduling tasks regularly, Mark Feit, 10/17/2017
- Re: [perfsonar-user] Re: meshconfig-agent-tasks not scheduling tasks regularly, Larry Blunk, 10/13/2017
- Re: [perfsonar-user] Re: meshconfig-agent-tasks not scheduling tasks regularly, Andrew Lake, 10/13/2017
- RE: [perfsonar-user] meshconfig-agent-tasks not scheduling tasks regularly, Garnizov, Ivan (RRZE), 10/16/2017