perfsonar-user - Re: [perfsonar-user] Re: meshconfig-agent-tasks not scheduling tasks regularly
Subject: perfSONAR User Q&A and Other Discussion
- From: Casey Russell <>
- To: Mark Feit <>
- Cc: Larry Blunk <>, "" <>
- Subject: Re: [perfsonar-user] Re: meshconfig-agent-tasks not scheduling tasks regularly
- Date: Thu, 19 Oct 2017 13:04:37 -0500
Mark and Larry,

One of my hosts (ps-ku-bw) has failed to schedule tasks today. This is one of my larger hosts, and the MaxClients problem might actually have been the trigger that began the avalanche. I've left the host broken in case Mark or one of the other developers wants information from it while it's in this failed state.

At 9:47am yesterday, the httpd error log showed the following:

[root@ps-ku-bw crussell]# tail -f /var/log/httpd/error_log
[Wed Oct 18 06:22:07 2017] [warn] [client 139.162.108.53] incomplete redirection target of '/toolkit/' for URI '/' modified to 'http://164.113.32.57/toolkit/'
[Wed Oct 18 06:44:26 2017] [warn] [client 141.212.122.81] incomplete redirection target of '/toolkit/' for URI '/' modified to 'http://164.113.32.57/toolkit/'
[Wed Oct 18 08:41:56 2017] [warn] [client 54.174.92.112] incomplete redirection target of '/toolkit/' for URI '/' modified to 'http://ps-ku-bw.perfsonar.kanren.net/toolkit/'
[Wed Oct 18 08:44:26 2017] [warn] [client 107.170.201.175] incomplete redirection target of '/toolkit/' for URI '/' modified to 'http://164.113.32.145/toolkit/'
[Wed Oct 18 08:46:51 2017] [warn] [client 107.170.201.175] incomplete redirection target of '/toolkit/' for URI '/' modified to 'http://164.113.32.57/toolkit/'
[Wed Oct 18 09:11:21 2017] [error] [client 66.249.66.139] File does not exist: /var/www/html/robots.txt
[Wed Oct 18 09:28:02 2017] [error] [client 46.229.164.99] File does not exist: /var/www/html/robots.txt
[Wed Oct 18 09:32:52 2017] [warn] [client 155.94.88.58] incomplete redirection target of '/toolkit/' for URI '/' modified to 'http://ps-ku-bw.perfsonar.kanren.net/toolkit/'
[Wed Oct 18 09:41:38 2017] [warn] [client 155.94.88.58] incomplete redirection target of '/toolkit/' for URI '/' modified to 'http://ps-ku-bw.perfsonar.kanren.net/toolkit/'
[Wed Oct 18 09:47:29 2017] [error] server reached MaxClients setting, consider raising the MaxClients setting

Since then, nothing has been logged in the httpd access log:

[root@ps-ku-bw crussell]# tail -f /var/log/httpd/access_log
::1 - - [18/Oct/2017:09:47:02 -0500] "PUT /esmond/perfsonar/archive/9bc084ac0a8349ec9b2e94488ca62716/ HTTP/1.1" 201 2 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.10.3.el6.x86_64"
::1 - - [18/Oct/2017:09:47:03 -0500] "PUT /esmond/perfsonar/archive/ccc9b3f6b47b4ec5b63337c182dd2f97/ HTTP/1.1" 201 2 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.10.3.el6.x86_64"
::1 - - [18/Oct/2017:09:47:04 -0500] "PUT /esmond/perfsonar/archive/347c05205385475d988d6e663501096e/ HTTP/1.1" 201 2 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.10.3.el6.x86_64"
::1 - - [18/Oct/2017:09:47:04 -0500] "PUT /esmond/perfsonar/archive/347c05205385475d988d6e663501096e/ HTTP/1.1" 409 101 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.10.3.el6.x86_64"
::1 - - [18/Oct/2017:09:47:05 -0500] "PUT /esmond/perfsonar/archive/317550e6dae940bcb028c715220ec36c/ HTTP/1.1" 201 2 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.10.3.el6.x86_64"
::1 - - [18/Oct/2017:09:47:07 -0500] "PUT /esmond/perfsonar/archive/54534a09177a472c8c2880c89322100e/ HTTP/1.1" 201 2 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.10.3.el6.x86_64"
::1 - - [18/Oct/2017:09:47:08 -0500] "PUT /esmond/perfsonar/archive/693d654508ba4f209728da0de249fda6/ HTTP/1.1" 201 2 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.10.3.el6.x86_64"
::1 - - [18/Oct/2017:09:47:16 -0500] "PUT /esmond/perfsonar/archive/aad5e250b92044a9be0582a1890acafb/ HTTP/1.1" 201 2 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.10.3.el6.x86_64"
::1 - - [18/Oct/2017:09:47:31 -0500] "PUT /esmond/perfsonar/archive/0e594a3f088a422b9c2f253954e6be5a/ HTTP/1.1" 201 2 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.10.3.el6.x86_64"
::1 - - [18/Oct/2017:09:47:36 -0500] "PUT /esmond/perfsonar/archive/2024f2829908405e9353db034ec54c2d/ HTTP/1.1" 201 2 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.10.3.el6.x86_64"

The pScheduler log shows that the tests that WERE scheduled are running and able to log (I believe) to the central archive, but not locally.

Oct 19 12:57:19 ps-ku-bw archiver WARNING 17050500: Failed to archive https://localhost/pscheduler/tasks/e67b5ce2-dcfc-47e0-9989-c94b4fa52481/runs/61e2a10e-2915-4cff-9bee-de5a16756baa to esmond: 400: Invalid JSON returned
Oct 19 12:57:19 ps-ku-bw runner INFO 8622312: Posted result to https://ps-ku-lt.perfsonar.kanren.net/pscheduler/tasks/1eec6a77-dceb-4d8b-9e97-d63bc3dee0c2/runs/b8d33697-348c-45b4-a94c-2792522f419e
Oct 19 12:57:20 ps-ku-bw runner INFO 8620513: Posted result to https://ps-ku-lt.perfsonar.kanren.net/pscheduler/tasks/04686c00-09e5-4e98-84e3-371b6bded7bf/runs/25b87839-c52d-47a8-88ba-484fa5e85c43
Oct 19 12:57:20 ps-ku-bw archiver WARNING 17050432: Failed to archive https://localhost/pscheduler/tasks/51f1cb7d-bd30-472f-bd51-5eb0b0ae2350/runs/75ce65dd-0606-49fc-8dcb-23f381429815 to esmond: Archiver permanently abandoned registering test after 2 attempt(s): 400: Invalid JSON returned
Oct 19 12:57:20 ps-ku-bw archiver WARNING 17050432: Gave up archiving https://localhost/pscheduler/tasks/51f1cb7d-bd30-472f-bd51-5eb0b0ae2350/runs/75ce65dd-0606-49fc-8dcb-23f381429815 to esmond
Oct 19 12:57:21 ps-ku-bw archiver WARNING 17050502: Failed to archive https://localhost/pscheduler/tasks/0bea838f-9368-4250-a893-3d61a4bdc7cd/runs/6d4417a8-3f18-470d-860c-9b6bf23e9dc0 to esmond: 400: Invalid JSON returned
Oct 19 12:57:21 ps-ku-bw runner INFO 8622349: Posted result to https://ps-ku-lt.perfsonar.kanren.net/pscheduler/tasks/5ddf82a5-9349-4f7a-8cfd-2e5b49177418/runs/4a93702b-71a2-4c64-99e1-e0864004551b

I say "I believe" they're logging to the central MA, but not locally, because, as you may note, the API is unavailable if you try querying it for one of those URLs to see what happened (problem with the HTTPD daemon?).

[root@ps-ku-bw crussell]# service httpd status
httpd (pid 16970) is running...
[root@ps-ku-bw crussell]# service pscheduler-scheduler status
scheduler (pid 17017) is running...
[root@ps-ku-bw crussell]# service pscheduler-archiver status
archiver (pid 17004) is running...
[root@ps-ku-bw crussell]# service pscheduler-ticker status
ticker (pid 16999) is running...
[root@ps-ku-bw crussell]# service cassandra status
cassandra (pid 1810) is running...

I'll leave the host alone for a few hours in case anyone wants me to gather more info.
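For anyone wanting to check other hosts for the same pattern, the comparison made above (when did Apache report MaxClients, and when was the last request logged?) could be scripted. What follows is only a minimal Python sketch, not part of perfSONAR. It assumes the Apache 2.2 log formats shown in this message, an English locale for the day and month names, and that both logs use the same local time zone; the function names `maxclients_events` and `last_access_time` are made up for illustration.

```python
import re
from datetime import datetime

# Apache 2.2 error_log prefix, e.g. "[Wed Oct 18 09:47:29 2017] [error] ..."
ERROR_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] (?P<msg>.*)$")
# access_log timestamp, e.g. "[18/Oct/2017:09:47:36 -0500]" (offset is ignored)
ACCESS_TS = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")


def maxclients_events(lines):
    """Return the timestamp of every 'reached MaxClients' error line."""
    events = []
    for line in lines:
        m = ERROR_LINE.match(line.strip())
        if not m or m.group("level") != "error":
            continue
        if "reached MaxClients" in m.group("msg"):
            events.append(datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y"))
    return events


def last_access_time(lines):
    """Return the timestamp of the last parseable access_log line, or None."""
    last = None
    for line in lines:
        m = ACCESS_TS.search(line)
        if m:
            last = datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S")
    return last


# Possible usage on a host (paths as used in this thread):
#   with open("/var/log/httpd/error_log") as f:
#       print(maxclients_events(f))
#   with open("/var/log/httpd/access_log") as f:
#       print(last_access_time(f))
```

If the last access time sits close to a MaxClients event and never advances, that would be consistent with the hang described here, although it does not by itself show what is holding the worker slots.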
On Tue, Oct 17, 2017 at 2:44 PM, Casey Russell <> wrote:

Mark and Larry,

I have seen this occasionally on my lower powered hosts (and maybe on others, although I've been watching these lower powered hosts much more closely, so I'm much more likely to have noticed it there).

I don't have a host where that error is active, but here you can see where one of my hosts encountered the error yesterday (this was just before I installed the 4.0.2 beta on it and rebooted it).

[root@ps-washburn-bw crussell]# cat /var/log/httpd/error_log | grep Max
[Mon Oct 16 16:50:42 2017] [error] server reached MaxClients setting, consider raising the MaxClients setting
[root@ps-washburn-bw crussell]#

You can see that (today at least) these hosts do have a lot of connections open (sparing you the detailed output, although it's available if you want it), although that in and of itself is not necessarily a problem.

[root@ps-washburn-bw crussell]# netstat -tan | wc -l
1238

(that's 1238 active TCP connections)

(Another of my lower powered hosts)

[crussell@ps-esu-bw ~]$ netstat -tan | wc -l
1903

Out of curiosity, I checked to see how many of those were hitting Apache on TCP port 80:

[root@ps-washburn-bw crussell]# netstat -an | grep ':80' | wc -l
78
[crussell@ps-esu-bw ~]$ netstat -tan | grep ':80' | wc -l
66

It doesn't seem too out of whack, but today may be entirely non-representative of what it looks like when the "MaxClients" problem was occurring. Again, I installed the 4.0.2 beta on both of these hosts yesterday and the problem hasn't recurred since, so today's netstat results may not reflect what it looks like on an affected host.
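One caveat on the counts above: `grep ':80'` is a substring match, so it also counts ports such as 8080 and outgoing connections to a remote port 80, and `wc -l` on raw `netstat -tan` output includes the two header lines and any listening sockets. A stricter tally could look like the Python sketch below. It assumes the usual Linux net-tools column layout (protocol, queues, local address, foreign address, state); `summarize_port` is a hypothetical helper name, not an existing tool.

```python
from collections import Counter


def summarize_port(netstat_text, port=80):
    """Count TCP sockets whose LOCAL port is exactly `port`.

    Returns (states, peers): a Counter of socket states and a Counter of
    remote addresses for the non-listening sockets.
    """
    states, peers = Counter(), Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip the header lines and anything that is not a tcp/tcp6 row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        # rsplit keeps IPv6 addresses such as ":::80" intact.
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        states[state] += 1
        if state != "LISTEN":
            peers[remote.rsplit(":", 1)[0]] += 1
    return states, peers


# Possible usage on a host (Python 3.7+):
#   import subprocess
#   out = subprocess.check_output(["netstat", "-tan"], text=True)
#   states, peers = summarize_port(out, 80)
#   print(states.most_common(), peers.most_common(10))
```

Running it once for port 80 and once for 443 would cover both the plain and TLS listeners, if the host serves both.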
On Tue, Oct 17, 2017 at 12:32 PM, Mark Feit <> wrote:

Larry Blunk writes:
Has anyone experienced Apache hitting the MaxClients limit and hanging? We've had this happen on several boxes
since upgrading to 4.0.1. We've had to restart Apache to get them functioning again. We've upped the
MaxClients limit on them, but it still occurs even after doubling the setting to 512. These are high-performance
boxes, so it doesn't seem like it should be a CPU issue.
There shouldn’t be a lot of connections to the HTTP server during normal operations; MeshConfig and task setup from remote nodes are the only things that should be connecting. The internal parts of pScheduler poke the database directly.
If you encounter that again, I’d be interested to see what the process table and netstat say about what’s connected and from where and if there are old processes that connect and aren’t dying.
--Mark
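For the process-table half of that request, one way to spot Apache workers that are not dying is to list them with their elapsed run time. The Python sketch below parses the output of `ps -eo pid,etime,args`, where `etime` is printed as `[[dd-]hh:]mm:ss`. The helper names and the one-hour threshold are assumptions chosen for illustration; note that the parent httpd process will always appear old, so it is the long-lived children that are of interest.

```python
import os


def etime_to_seconds(etime):
    """Convert ps 'etime' ([[dd-]hh:]mm:ss) to a number of seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:          # pad missing hours with zero
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def old_processes(ps_text, name="httpd", min_age=3600):
    """Return [(pid, age_seconds)] for processes named `name` older than min_age."""
    found = []
    for line in ps_text.splitlines():
        fields = line.split(None, 2)
        # Skip the header row and malformed lines.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        pid, etime, args = fields
        if os.path.basename(args.split()[0]) != name:
            continue
        age = etime_to_seconds(etime)
        if age >= min_age:
            found.append((int(pid), age))
    return found


# Possible usage on a host (Python 3.7+):
#   import subprocess
#   out = subprocess.check_output(["ps", "-eo", "pid,etime,args"], text=True)
#   for pid, age in old_processes(out, "httpd", 3600):
#       print(pid, age)
```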