perfsonar-user - [perfsonar-user] OWAMP tests not scheduling reliably in mesh
Subject: perfSONAR User Q&A and Other Discussion
- From: Casey Russell
- Subject: [perfsonar-user] OWAMP tests not scheduling reliably in mesh
- Date: Wed, 21 Mar 2018 14:42:19 -0500
Group,
We've got a large mesh config, and for months now the OWAMP tests have not been scheduling reliably. What I mean is that when the mesh config agent runs on the hosts tonight, somewhere around 30-40% of the latency tests in the mesh will fail to schedule in one direction. The same test in the other direction between those hosts will probably schedule fine. 24 hours later, when the agent runs again, most of those will reschedule just fine, but a new 30-40% will fail to schedule.
I've been meaning to dig into this for some time, but other priorities have dragged me this way and that. So I just began looking into it again today.
In my meshconfig-agent.log file, I see one of these four errors when these failures occur.
2018/03/21 04:21:34 (22276) WARN> perfsonar_meshconfig_agent:430 main:: - Problem adding test throughput(ps-fhsu-bw.perfsonar.kanren.net->ps-ksu-bw.perfsonar.kanren.net), continuing with rest of config: 500 INTERNAL SERVER ERROR: Error while tasking ps-ksu-bw.perfsonar.kanren.net: Unable to post task to ps-ksu-bw.perfsonar.kanren.net: Task already exists. All participants must be on separate systems.
2018/03/21 04:21:35 (22276) WARN> perfsonar_meshconfig_agent:430 main:: - Problem adding test latencybg(ps-fhsu-lt.perfsonar.kanren.net->ps-esu-lt.perfsonar.kanren.net), continuing with rest of config: 500 Internal Server Error: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator at
root@localhost to inform them of the time this error occurred,
and the actions you performed just before this error.</p>
<p>More information about this error may be available
in the server error log.</p>
</body></html>
2018/03/18 23:16:27 (30529) WARN> perfsonar_meshconfig_agent:430 main:: - Problem adding test throughput(ps-ku-bw.perfsonar.kanren.net->ps-bryant-bw.perfsonar.kanren.net), continuing with rest of config: 500 INTERNAL SERVER ERROR: Error while tasking ps-bryant-bw.perfsonar.kanren.net: Unable to post task to ps-bryant-bw.perfsonar.kanren.net: Task already exists. All participants must be on separate systems.
2018/03/19 23:38:17 (10740) WARN> perfsonar_meshconfig_agent:430 main:: - Problem adding test throughput(ps-ku-bw.perfsonar.kanren.net->ps-bryant-bw.perfsonar.kanren.net), continuing with rest of config: 500 Internal Server Error: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
The errors all seem very different, except that they all have the "430 main" in common and all involve the generic "500 Internal Server Error", which seems to indicate problems pulling from or posting to the web server or API during scheduling. Since the tests never get scheduled, the API isn't helping me much in determining what's going on. Any suggestions on where to look next, or what's going on here?
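To get a clearer picture of how the failures split between the two kinds of message above, something like the following could tally the "Problem adding test" warnings by test type and error class. This is only a rough sketch: the regular expression is inferred from the four log lines quoted here, and the names `PATTERN`, `classify`, and `summarize`, along with the class labels, are made up for illustration and are not part of any perfSONAR tool.

```python
import re
import sys
from collections import Counter

# Assumed line shape, inferred from the warnings quoted above:
#   ... Problem adding test TYPE(SRC->DST), continuing with rest of config: ERROR
PATTERN = re.compile(
    r"Problem adding test (?P<type>\w+)"
    r"\((?P<src>[^\s()]+?)->(?P<dst>[^\s()]+)\), "
    r"continuing with rest of config: (?P<error>.*)"
)


def classify(error):
    """Bucket an error string into a coarse, illustrative class."""
    if "Task already exists" in error:
        return "task-already-exists"
    if error.startswith("500"):
        return "http-500-other"
    return "other"


def summarize(lines):
    """Count failures per (test type, error class) pair."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group("type"), classify(match.group("error")))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log path is supplied by the caller; no default location is assumed.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for (test_type, error_class), n in sorted(summarize(handle).items()):
            print(f"{test_type:12} {error_class:22} {n}")
```

Run against the agent log from several nights, this would show whether the latencybg failures are mostly the bare HTML 500 page or the "Task already exists" message, which may help narrow down which server-side log to read next.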
- [perfsonar-user] OWAMP tests not scheduling reliably in mesh, Casey Russell, 03/21/2018
- Re: [perfsonar-user] OWAMP tests not scheduling reliably in mesh, Mark Feit, 03/22/2018
- Re: [perfsonar-user] OWAMP tests not scheduling reliably in mesh, Casey Russell, 03/23/2018
- Re: [perfsonar-user] OWAMP tests not scheduling reliably in mesh, Mark Feit, 03/23/2018
- Re: [perfsonar-user] OWAMP tests not scheduling reliably in mesh, Dale W. Carder, 03/23/2018
- Re: [perfsonar-user] OWAMP tests not scheduling reliably in mesh, Mark Feit, 03/23/2018
- Re: [perfsonar-user] OWAMP tests not scheduling reliably in mesh, Dale W. Carder, 03/23/2018
- Re: [perfsonar-user] OWAMP tests not scheduling reliably in mesh, Mark Feit, 03/23/2018
- Re: [perfsonar-user] OWAMP tests not scheduling reliably in mesh, Casey Russell, 03/23/2018
- Re: [perfsonar-user] OWAMP tests not scheduling reliably in mesh, Mark Feit, 03/22/2018