perfsonar-user - RE: [perfsonar-user] OWAMP tests not scheduling reliably in mesh
Subject: perfSONAR User Q&A and Other Discussion
List archive
- From: "Garnizov, Ivan (RRZE)" <>
- To: Casey Russell <>, "" <>
- Subject: RE: [perfsonar-user] OWAMP tests not scheduling reliably in mesh
- Date: Mon, 16 Apr 2018 07:49:20 +0000
- Accept-language: en-GB, de-DE, en-US
- Ironport-phdr: 9a23:eDDiWBTQZbp448iweXU6RtNE7Npsv+yvbD5Q0YIujvd0So/mwa67ZRaFt8tkgFKBZ4jH8fUM07OQ7/i7HzRYqb+681k6OKRWUBEEjchE1ycBO+WiTXPBEfjxciYhF95DXlI2t1uyMExSBdqsLwaK+i764jEdAAjwOhRoLerpBIHSk9631+ev8JHPfglEnjWwba98IRmssQndqtQdjJd/JKo21hbHuGZDdf5MxWNvK1KTnhL86dm18ZV+7SleuO8v+tBZX6nicKs2UbJXDDI9M2Ao/8LrrgXMTRGO5nQHTGoblAdDDhXf4xH7WpfxtTb6tvZ41SKHM8D6Uaw4VDK/5KpwVhTmlDkIOCI48GHPi8x/kqRboA66pxdix4LYeZyZOOZicq/Ye94RWGhPUdtLVyFZAo2ycZYBD/YPM+hboYnypVoOogexCgS3Huzj1jtFi2Xq0aEm0eksFxzN0gw6H9IJtXTZtNv5OqYIXuCvyqnIzDTCb/RI2Tvl9YPGcQ4hruuKXbltdcbf1VUhGADKgFmVp4DuIjSY1uUTs2eB6epsT+2vhHQ9qwF3oziux8AsipfSi4Ib0F/E6D91z5srKtChTkNwfNCqEJxVty6ANot2RNsvQ2VpuCY8yb0Jp4O0fCkMyJg92xHfbPmHf5CU4hPtVeaRPSt4i2x/eL6lmxmy9k2gxvXzVsm1zFZKrjdFncLWun8R0BzT79CLSvhn8UenwzaAyQbT5vtCIU8qiarXM5AhzaQtlpYJrEvMACn7k1j1gq+Obkgo5PSk5/j6brjjp5KROYx5hwTkPagyhsCzHeE1PhUSU2SF+emx0bju8VD2TblSk/E7nLTVvIrHKckYpaO1GQxY3ps55xqhADqr1MwUkWcbIF5ZZR6KionkNl/NLf39F/izn0ignCtuyvzbIrHtHJXAI33Gnbrvf7tw7VNXxhApwtBF/Z1UD6kMIPLtVU/1s9zVFho5MgOtz+b9CdRxz54SVXiVAq+DMaLdr0WE5v8pI+mNfYMapiz9K/w/6P7ogn82h1kdcbO00pQJcHy4H/JmLFuFYXf0nNsNDXsGshc7QeHrhlCPUjteaGyuU64g+j02DZ+qAZvdSo2thbGM2Tq3E5JOamBDEF+MEHPod4ueW/cLbSKfOs5hkiIeWrinUYMh2guhtBX8y7pmL+vU/TMYuInt1Ndv+u3fjww9+iJsA8SHz26NV310nn8PRzIuxKBwv1Z9ylmY0ah/hfxXD8ZT6+pUUgchK5Hc1fd3C9TzWgLaYteJU0iqTs+nATE3UtI+3cUOY0BjFNW+kBzPxTSlDKIIl+/DOJth2avH0mm5Hdtm0HvC0OF1hEM7WdBCMWmOhad57QXVQYjEjxPd3+yyeL4SxynL/X3G0HGDpmlZVhJ9S6PIQSpZa0fL55yt/k7YQaSpD709dxZaxNSqK61WZ8fvgEkcAvrvJYKNTXi2njL6Kh+SwLKea5SuM0QD1SOVSHI+oS1Stz7SNBY3AGGuqnjYDS5pC3rhal6q/eQo+yDzdVM90wzfNx4p7LGy4BNAwKXEE/4=
Good morning Casey, Thank you for your reminder/inquiry. Please note a similar or the same issue is already reported and being reviewed:
https://github.com/perfsonar/mesh-config/issues/113. We are working on finding a solution to get the full automation of measurement scheduling for a mesh. As a workaround you can manually start the missing tasks, but you have to be careful to avoid overlapping in time in the tests. Meaning your manual test should
end with the start of the next scheduled run in 24h of the `latencybg` task. If not you’ll end up with the inconvenience of simultaneously running latency measurements between the affected nodes.
Regards, Ivan Garnizov Jubiläumsjahr 2018 - IT in Bewegung Das RRZE - der IT-Dienstleister der FAU From: [mailto:]
On Behalf Of Casey Russell Group, I'm still chasing this problem scheduling latency tests reliably in my mesh. I'm going to come at it from a slightly different angle today. On the idea that the log messages I included last time may have been leading us astray.
Let's simplify... I have two nodes in my larger mesh, PSU, and FHSU. There should be a constant latency test (IPv4) running from PSU to FHSU. It appears that each day at approx 4:20am and again at approx 10:12am pscheduler schedules this test. [crussell@ps-psu-bw ~]$ grep latencybg /var/log/pscheduler/pscheduler.log | grep fhsu | grep "version 4" Apr 8 04:20:30 ps-psu-bw runner INFO 15244212: With powstream: latencybg --data-ports 8760-9960 --dest
ps-fhsu-lt.perfsonar.kanren.net --packet-padding 0 --source
ps-psu-lt.perfsonar.kanren.net --ip-version 4 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 9 04:20:30 ps-psu-bw runner INFO 15273040: With powstream: latencybg --data-ports 8760-9960 --dest
ps-fhsu-lt.perfsonar.kanren.net --packet-padding 0 --source
ps-psu-lt.perfsonar.kanren.net --ip-version 4 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 9 10:14:02 ps-psu-bw runner INFO 15289470: With powstream: latencybg --data-ports 8760-9960 --dest
ps-psu-lt.perfsonar.kanren.net --packet-padding 0 --flip --source
ps-fhsu-lt.perfsonar.kanren.net --ip-version 4 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 10 04:20:30 ps-psu-bw runner INFO 15312685: With powstream: latencybg --data-ports 8760-9960 --dest
ps-fhsu-lt.perfsonar.kanren.net --packet-padding 0 --source
ps-psu-lt.perfsonar.kanren.net --ip-version 4 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 10 10:12:05 ps-psu-bw runner INFO 15324555: With powstream: latencybg --data-ports 8760-9960 --dest
ps-psu-lt.perfsonar.kanren.net --packet-padding 0 --flip --source
ps-fhsu-lt.perfsonar.kanren.net --ip-version 4 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 11 04:20:30 ps-psu-bw runner INFO 15358604: With powstream: latencybg --data-ports 8760-9960 --dest
ps-fhsu-lt.perfsonar.kanren.net --packet-padding 0 --source
ps-psu-lt.perfsonar.kanren.net --ip-version 4 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 11 10:12:05 ps-psu-bw runner INFO 15369873: With powstream: latencybg --data-ports 8760-9960 --dest
ps-psu-lt.perfsonar.kanren.net --packet-padding 0 --flip --source
ps-fhsu-lt.perfsonar.kanren.net --ip-version 4 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 12 04:20:30 ps-psu-bw runner INFO 15406256: With powstream: latencybg --data-ports 8760-9960 --dest
ps-fhsu-lt.perfsonar.kanren.net --packet-padding 0 --source
ps-psu-lt.perfsonar.kanren.net --ip-version 4 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 12 10:12:05 ps-psu-bw runner INFO 15418015: With powstream: latencybg --data-ports 8760-9960 --dest
ps-psu-lt.perfsonar.kanren.net --packet-padding 0 --flip --source
ps-fhsu-lt.perfsonar.kanren.net --ip-version 4 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 13 10:12:05 ps-psu-bw runner INFO 15463862: With powstream: latencybg --data-ports 8760-9960 --dest
ps-psu-lt.perfsonar.kanren.net --packet-padding 0 --flip --source
ps-fhsu-lt.perfsonar.kanren.net --ip-version 4 --packet-interval 0.01 --duration PT86400S --packet-count 6000 However you'll notice this morning (Apr 13) the test didn't schedule at 4:20am. Let's see what pscheduler was doing around that time. [crussell@ps-psu-bw ~]$ grep 'Apr 13 04:20:30' /var/log/pscheduler/pscheduler.log Apr 13 04:20:30 ps-psu-bw runner INFO 15452182: Running
https://ps-psu-lt.perfsonar.kanren.net/pscheduler/tasks/fd145517-1e77-43e0-a1c1-ec5fe0636d2e/runs/b8ad0d54-f9e9-4525-9cd5-15c2497387e8 Apr 13 04:20:30 ps-psu-bw runner INFO 15452182: With powstream: latencybg --data-ports 8760-9960 --dest
ps-esu-lt.perfsonar.kanren.net --packet-padding 0 --source
ps-psu-lt.perfsonar.kanren.net --ip-version 4 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 13 04:20:30 ps-psu-bw runner INFO 15451312: Running
https://ps-psu-lt.perfsonar.kanren.net/pscheduler/tasks/08a7eeda-a45c-4c5f-b178-40f0f6924fde/runs/f7c9b272-39db-4f38-a3d6-97b53a91d5c5 Apr 13 04:20:30 ps-psu-bw runner INFO 15451312: With powstream: latencybg --data-ports 8760-9960 --dest
ps-esu-lt.perfsonar.kanren.net --packet-padding 0 --source
ps-psu-lt.perfsonar.kanren.net --ip-version 6 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 13 04:20:30 ps-psu-bw runner INFO 15451311: Running
https://ps-psu-lt.perfsonar.kanren.net/pscheduler/tasks/c740987c-3ce7-4c3c-9fb3-4ae0c6227391/runs/c5f22df3-a025-4410-a559-238bc19aa3bc Apr 13 04:20:30 ps-psu-bw runner INFO 15451311: With powstream: latencybg --data-ports 8760-9960 --dest
ps-bryant-lt.perfsonar.kanren.net --packet-padding 0 --source
ps-psu-lt.perfsonar.kanren.net --ip-version 6 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 13 04:20:30 ps-psu-bw runner INFO 15452185: Running
https://ps-psu-lt.perfsonar.kanren.net/pscheduler/tasks/910bdd00-36c9-4fa3-b478-fdd3d332d948/runs/1e950ba7-dec6-4885-9ef1-1ec4228aec3d Apr 13 04:20:30 ps-psu-bw runner INFO 15452185: With powstream: latencybg --data-ports 8760-9960 --dest
ps-washburn-lt.perfsonar.kanren.net --packet-padding 0 --source
ps-psu-lt.perfsonar.kanren.net --ip-version 4 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 13 04:20:30 ps-psu-bw runner INFO 15451309: Running
https://ps-psu-lt.perfsonar.kanren.net/pscheduler/tasks/4f58e8ef-83ac-4451-8813-56a35d4bd52c/runs/d71d2fb9-ee4d-4585-a8b4-f64850c46d4a Apr 13 04:20:30 ps-psu-bw runner INFO 15451309: With powstream: latencybg --data-ports 8760-9960 --dest
ps-washburn-lt.perfsonar.kanren.net --packet-padding 0 --source
ps-psu-lt.perfsonar.kanren.net --ip-version 6 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 13 04:20:30 ps-psu-bw runner INFO 15452178: Running
https://ps-psu-lt.perfsonar.kanren.net/pscheduler/tasks/48e6d3e3-1166-4c20-ae0a-212b3a0a7593/runs/022c9037-4137-4d5d-94f0-5c0710d2222c Apr 13 04:20:30 ps-psu-bw runner INFO 15452178: With powstream: latencybg --data-ports 8760-9960 --dest
ps-bryant-lt.perfsonar.kanren.net --packet-padding 0 --source
ps-psu-lt.perfsonar.kanren.net --ip-version 4 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 13 04:20:30 ps-psu-bw runner INFO 15451239: Running
https://ps-psu-lt.perfsonar.kanren.net/pscheduler/tasks/4493c101-138e-4a24-b8f4-761d621b8d07/runs/fdbaefe3-d964-4898-831a-843a3e8ee2aa Apr 13 04:20:30 ps-psu-bw runner INFO 15451239: With powstream: latencybg --data-ports 8760-9960 --dest
ps-wsu-lt.perfsonar.kanren.net --packet-padding 0 --source
ps-psu-lt.perfsonar.kanren.net --ip-version 6 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 13 04:20:30 ps-psu-bw runner INFO 15451310: Running
https://ps-psu-lt.perfsonar.kanren.net/pscheduler/tasks/5bd916e3-1fe7-4725-b7ac-6f1539713202/runs/47874085-3d3c-4e94-812b-56b7d3c6dc1b Apr 13 04:20:30 ps-psu-bw runner INFO 15451310: With powstream: latencybg --data-ports 8760-9960 --dest
ps-fhsu-lt.perfsonar.kanren.net --packet-padding 0 --source
ps-psu-lt.perfsonar.kanren.net --ip-version 6 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 13 04:20:30 ps-psu-bw runner INFO 15406254: Run succeeded. Apr 13 04:20:30 ps-psu-bw runner INFO 15406253: Run succeeded. Apr 13 04:20:30 ps-psu-bw runner INFO 15406257: Run succeeded. Apr 13 04:20:30 ps-psu-bw runner INFO 15406252: Run succeeded. Apr 13 04:20:30 ps-psu-bw runner INFO 15406255: Run succeeded. Apr 13 04:20:30 ps-psu-bw runner INFO 15405617: Run succeeded. Apr 13 04:20:30 ps-psu-bw runner INFO 15406256: Run succeeded. Apr 13 04:20:30 ps-psu-bw runner INFO 15406260: Run succeeded. Apr 13 04:20:30 ps-psu-bw runner INFO 15405618: Run succeeded. Apr 13 04:20:30 ps-psu-bw runner INFO 15405619: Run succeeded. Apr 13 04:20:30 ps-psu-bw runner INFO 15405622: Run succeeded. Apr 13 04:20:30 ps-psu-bw runner WARNING 15451239: Starting 0:00:00.502363 later than scheduled Apr 13 04:20:30 ps-psu-bw runner WARNING 15451309: Starting 0:00:00.507794 later than scheduled Apr 13 04:20:30 ps-psu-bw runner WARNING 15451310: Starting 0:00:00.515842 later than scheduled It scheduled a number of other tests, it just quietly declined scheduling that one. So let's see what the SSL_ERROR_LOG says. [crussell@ps-psu-bw ~]$ sudo grep 'Apr 13 04:2' /var/log/httpd/ssl_error_log [Fri Apr 13 04:20:39 2018] [warn] [client 2001:49d0:23c0:1008::2] incomplete redirection target of '/toolkit/' for URI '/' modified to 'https://ps-psu-bw.perfsonar.kanren.net/toolkit/' [Fri Apr 13 04:20:39 2018] [warn] [client 2001:49d0:23c0:1008::2] incomplete redirection target of '/toolkit/' for URI '/' modified to 'https://ps-psu-bw.perfsonar.kanren.net/toolkit/' [Fri Apr 13 04:20:39 2018] [warn] [client 2001:49d0:23c0:1008::2] incomplete redirection target of '/toolkit/' for URI '/' modified to 'https://ps-psu-bw.perfsonar.kanren.net/toolkit/' [Fri Apr 13 04:20:39 2018] [warn] [client 2001:49d0:23c0:1008::2] incomplete redirection target of '/toolkit/' for URI '/' modified to 'https://ps-psu-bw.perfsonar.kanren.net/toolkit/' [Fri Apr 13 04:20:39 2018] [warn] [client 2001:49d0:23c0:1008::2] incomplete redirection target of '/toolkit/' for URI '/' modified to 'https://ps-psu-bw.perfsonar.kanren.net/toolkit/' [Fri Apr 13 04:20:39 2018] [warn] [client 2001:49d0:23c0:1008::2] incomplete redirection target of '/toolkit/' for URI '/' modified to 'https://ps-psu-bw.perfsonar.kanren.net/toolkit/' [Fri Apr 13 04:20:39 2018] [warn] [client 2001:49d0:23c0:1008::2] incomplete redirection target of '/toolkit/' for URI '/' modified to 'https://ps-psu-bw.perfsonar.kanren.net/toolkit/' [Fri Apr 13 04:20:39 2018] [warn] [client 2001:49d0:23c0:1008::2] incomplete redirection target of '/toolkit/' for URI '/' modified to 'https://ps-psu-bw.perfsonar.kanren.net/toolkit/' [Fri Apr 13 04:20:39 2018] [warn] [client 2001:49d0:23c0:1003::2] incomplete redirection target of '/toolkit/' for URI '/' modified to 'https://ps-psu-lt.perfsonar.kanren.net/toolkit/' [Fri Apr 13 04:20:39 2018] [warn] [client 2001:49d0:23c0:1003::2] incomplete redirection target of '/toolkit/' for URI '/' modified to 'https://ps-psu-lt.perfsonar.kanren.net/toolkit/' [Fri Apr 13 04:20:39 2018] [warn] [client 2001:49d0:23c0:1003::2] incomplete redirection target of '/toolkit/' for URI '/' modified to 'https://ps-psu-lt.perfsonar.kanren.net/toolkit/' [Fri Apr 13 04:20:39 2018] [warn] [client 2001:49d0:23c0:1003::2] incomplete redirection target of '/toolkit/' for URI '/' modified to 'https://ps-psu-lt.perfsonar.kanren.net/toolkit/' [Fri Apr 13 04:20:39 2018] [warn] [client 2001:49d0:23c0:1003::2] incomplete redirection target of '/toolkit/' for URI '/' modified to 'https://ps-psu-lt.perfsonar.kanren.net/toolkit/' [Fri Apr 13 04:20:39 2018] [warn] [client 2001:49d0:23c0:1008::2] incomplete redirection target of '/toolkit/' for URI '/' modified to 'https://ps-psu-bw.perfsonar.kanren.net/toolkit/' [Fri Apr 13 04:20:39 2018] [warn] [client 2001:49d0:23c0:1008::2] incomplete redirection target of '/toolkit/' for URI '/' modified to 'https://ps-psu-bw.perfsonar.kanren.net/toolkit/' [Fri Apr 13 04:20:39 2018] [warn] [client 2001:49d0:23c0:1008::2] incomplete redirection target of '/toolkit/' for URI '/' modified to 'https://ps-psu-bw.perfsonar.kanren.net/toolkit/' Nothing at all in the first half of that minute. How about the bulk messages getting dumped to /var/log/messages? [crussell@ps-psu-bw ~]$ sudo grep 'Apr 13 04:20:30' /var/log/messages Apr 13 04:20:30 ps-psu-bw runner INFO 15452182: Running
https://ps-psu-lt.perfsonar.kanren.net/pscheduler/tasks/fd145517-1e77-43e0-a1c1-ec5fe0636d2e/runs/b8ad0d54-f9e9-4525-9cd5-15c2497387e8 Apr 13 04:20:30 ps-psu-bw runner INFO 15452182: With powstream: latencybg --data-ports 8760-9960 --dest
ps-esu-lt.perfsonar.kanren.net --packet-padding 0 --source
ps-psu-lt.perfsonar.kanren.net --ip-version 4 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 13 04:20:30 ps-psu-bw runner INFO 15451312: Running
https://ps-psu-lt.perfsonar.kanren.net/pscheduler/tasks/08a7eeda-a45c-4c5f-b178-40f0f6924fde/runs/f7c9b272-39db-4f38-a3d6-97b53a91d5c5 Apr 13 04:20:30 ps-psu-bw runner INFO 15451312: With powstream: latencybg --data-ports 8760-9960 --dest
ps-esu-lt.perfsonar.kanren.net --packet-padding 0 --source
ps-psu-lt.perfsonar.kanren.net --ip-version 6 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 13 04:20:30 ps-psu-bw runner INFO 15451311: Running
https://ps-psu-lt.perfsonar.kanren.net/pscheduler/tasks/c740987c-3ce7-4c3c-9fb3-4ae0c6227391/runs/c5f22df3-a025-4410-a559-238bc19aa3bc Apr 13 04:20:30 ps-psu-bw runner INFO 15451311: With powstream: latencybg --data-ports 8760-9960 --dest
ps-bryant-lt.perfsonar.kanren.net --packet-padding 0 --source
ps-psu-lt.perfsonar.kanren.net --ip-version 6 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 13 04:20:30 ps-psu-bw runner INFO 15452185: Running
https://ps-psu-lt.perfsonar.kanren.net/pscheduler/tasks/910bdd00-36c9-4fa3-b478-fdd3d332d948/runs/1e950ba7-dec6-4885-9ef1-1ec4228aec3d Apr 13 04:20:30 ps-psu-bw runner INFO 15452185: With powstream: latencybg --data-ports 8760-9960 --dest
ps-washburn-lt.perfsonar.kanren.net --packet-padding 0 --source
ps-psu-lt.perfsonar.kanren.net --ip-version 4 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 13 04:20:30 ps-psu-bw runner INFO 15451309: Running
https://ps-psu-lt.perfsonar.kanren.net/pscheduler/tasks/4f58e8ef-83ac-4451-8813-56a35d4bd52c/runs/d71d2fb9-ee4d-4585-a8b4-f64850c46d4a Apr 13 04:20:30 ps-psu-bw runner INFO 15451309: With powstream: latencybg --data-ports 8760-9960 --dest
ps-washburn-lt.perfsonar.kanren.net --packet-padding 0 --source
ps-psu-lt.perfsonar.kanren.net --ip-version 6 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 13 04:20:30 ps-psu-bw runner INFO 15452178: Running
https://ps-psu-lt.perfsonar.kanren.net/pscheduler/tasks/48e6d3e3-1166-4c20-ae0a-212b3a0a7593/runs/022c9037-4137-4d5d-94f0-5c0710d2222c Apr 13 04:20:30 ps-psu-bw runner INFO 15452178: With powstream: latencybg --data-ports 8760-9960 --dest
ps-bryant-lt.perfsonar.kanren.net --packet-padding 0 --source
ps-psu-lt.perfsonar.kanren.net --ip-version 4 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 13 04:20:30 ps-psu-bw runner INFO 15451239: Running
https://ps-psu-lt.perfsonar.kanren.net/pscheduler/tasks/4493c101-138e-4a24-b8f4-761d621b8d07/runs/fdbaefe3-d964-4898-831a-843a3e8ee2aa Apr 13 04:20:30 ps-psu-bw runner INFO 15451239: With powstream: latencybg --data-ports 8760-9960 --dest
ps-wsu-lt.perfsonar.kanren.net --packet-padding 0 --source
ps-psu-lt.perfsonar.kanren.net --ip-version 6 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 13 04:20:30 ps-psu-bw runner INFO 15451310: Running
https://ps-psu-lt.perfsonar.kanren.net/pscheduler/tasks/5bd916e3-1fe7-4725-b7ac-6f1539713202/runs/47874085-3d3c-4e94-812b-56b7d3c6dc1b Apr 13 04:20:30 ps-psu-bw runner INFO 15451310: With powstream: latencybg --data-ports 8760-9960 --dest
ps-fhsu-lt.perfsonar.kanren.net --packet-padding 0 --source
ps-psu-lt.perfsonar.kanren.net --ip-version 6 --packet-interval 0.01 --duration PT86400S --packet-count 6000 Apr 13 04:20:30 ps-psu-bw powstream[27861]: _OWPWriteStopSessions: called in wrong state. Apr 13 04:20:30 ps-psu-bw powstream[27853]: _OWPWriteStopSessions: called in wrong state. Apr 13 04:20:30 ps-psu-bw powstream[27851]: _OWPWriteStopSessions: called in wrong state. Apr 13 04:20:30 ps-psu-bw powstream[28787]: _OWPWriteStopSessions: called in wrong state. Apr 13 04:20:30 ps-psu-bw powstream[9442]: _OWPWriteStopSessions: called in wrong state. Apr 13 04:20:30 ps-psu-bw powstream[27861]: SIGTERM/SIGINT Caught: Exiting. Apr 13 04:20:30 ps-psu-bw powstream[27859]: _OWPWriteStopSessions: called in wrong state. Apr 13 04:20:30 ps-psu-bw powstream[27859]: SIGTERM/SIGINT Caught: Exiting. Apr 13 04:20:30 ps-psu-bw powstream[27857]: _OWPWriteStopSessions: called in wrong state. Apr 13 04:20:30 ps-psu-bw powstream[27853]: SIGTERM/SIGINT Caught: Exiting. Apr 13 04:20:30 ps-psu-bw powstream[27851]: SIGTERM/SIGINT Caught: Exiting. Apr 13 04:20:30 ps-psu-bw powstream[27860]: _OWPWriteStopSessions: called in wrong state. Apr 13 04:20:30 ps-psu-bw powstream[27860]: SIGTERM/SIGINT Caught: Exiting. Apr 13 04:20:30 ps-psu-bw powstream[27844]: _OWPWriteStopSessions: called in wrong state. Apr 13 04:20:30 ps-psu-bw powstream[28787]: SIGTERM/SIGINT Caught: Exiting. Apr 13 04:20:30 ps-psu-bw powstream[9442]: SIGTERM/SIGINT Caught: Exiting. Apr 13 04:20:30 ps-psu-bw powstream[30298]: _OWPWriteStopSessions: called in wrong state. Apr 13 04:20:30 ps-psu-bw powstream[30298]: SIGTERM/SIGINT Caught: Exiting. Apr 13 04:20:30 ps-psu-bw powstream[27857]: SIGTERM/SIGINT Caught: Exiting. Apr 13 04:20:30 ps-psu-bw runner INFO 15406254: Run succeeded. Apr 13 04:20:30 ps-psu-bw runner INFO 15406253: Run succeeded. Apr 13 04:20:30 ps-psu-bw runner INFO 15406257: Run succeeded. Apr 13 04:20:30 ps-psu-bw runner INFO 15406252: Run succeeded. Apr 13 04:20:30 ps-psu-bw runner INFO 15406255: Run succeeded. Apr 13 04:20:30 ps-psu-bw runner INFO 15405617: Run succeeded. Apr 13 04:20:30 ps-psu-bw runner INFO 15406256: Run succeeded. Apr 13 04:20:30 ps-psu-bw runner INFO 15406260: Run succeeded. Apr 13 04:20:30 ps-psu-bw powstream[27844]: SIGTERM/SIGINT Caught: Exiting. Apr 13 04:20:30 ps-psu-bw runner INFO 15405618: Run succeeded. Apr 13 04:20:30 ps-psu-bw runner INFO 15405619: Run succeeded. Apr 13 04:20:30 ps-psu-bw runner INFO 15405622: Run succeeded. Apr 13 04:20:30 ps-psu-bw powstream[27856]: _OWPWriteStopSessions: called in wrong state. Apr 13 04:20:30 ps-psu-bw powstream[27856]: SIGTERM/SIGINT Caught: Exiting. Apr 13 04:20:30 ps-psu-bw runner WARNING 15451239: Starting 0:00:00.502363 later than scheduled Apr 13 04:20:30 ps-psu-bw runner WARNING 15451309: Starting 0:00:00.507794 later than scheduled Apr 13 04:20:30 ps-psu-bw runner WARNING 15451310: Starting 0:00:00.515842 later than scheduled Here we see again the tests that ARE getting scheduled, but I'm not seeing anything about the one that FAILED to schedule. Does anyone have any thoughts on where I should look next? Are the powstream errors here related? should I be
concerned about them?
On Thu, Mar 22, 2018 at 6:53 PM, Mark Feit <> wrote: Casey Russell writes:
We've got a large mesh config, and for some time now (months) the owamp tests have not been scheduling reliably. What I mean by that is tonight when the mesh config agent runs on them, somewhere around 30-40% of the latency tests in the mesh will fail
to schedule (one way). The same test, in the other direction between those hosts will probably schedule fine. 24 hours later, when it runs again, most of those will re-schedule just fine, but a new 30-40% fail to schedule.
…
2018/03/21 04:21:34 (22276) WARN> perfsonar_meshconfig_agent:430 main:: - Problem adding test throughput(ps-fhsu-bw.perfsonar.kanren.net->ps-ksu-bw.perfsonar.kanren.net), continuing with rest
of config: 500 INTERNAL SERVER ERROR: Error while tasking
ps-ksu-bw.perfsonar.kanren.net: Unable to post task to
ps-ksu-bw.perfsonar.kanren.net: Task already exists. All participants must be on separate systems.
2018/03/18 23:16:27 (30529) WARN> perfsonar_meshconfig_agent:430 main:: - Problem adding test throughput(ps-ku-bw.perfsonar.kanren.net->ps-bryant-bw.perfsonar.kanren.net), continuing with
rest of config: 500 INTERNAL SERVER ERROR: Error while tasking
ps-bryant-bw.perfsonar.kanren.net: Unable to post task to
ps-bryant-bw.perfsonar.kanren.net: Task already exists. All participants must be on separate systems. That’s a known error, but not expected under these circumstances. Let me think aloud for a minute: Meshconfig submits the task to the first participant (“A”), which assigns it an identifier. A then submits the task to the second participant (“B”) under the same identifier.
The usual case is that B doesn’t have a task with that identifier and everything goes to plan. If the task already exists on B, it will be rejected by B and, in turn, A with the error you see. There are two things that can cause this to happen: One is tasks
from different systems having the same identifier. The identifiers are version 4 (random) UUIDs. The other is when the task has parameters that put both participants on the same system (e.g., pscheduler task throughput --dest localhost). That makes A and
B the same machine, and when A tries to task itself (as if it were B), it complains that the task is a duplicate. I suspect that the latter is what’s going on here, because happenstance collisions in a 128-bit space should be exceedingly rare. To the best of my knowledge, we’re not seeing
this anywhere else, even at sites running large meshes. This could be a case of something really weird happening network- or DNS-wise, but it’s also possible that Meshconfig is doing something silly like trying to task the wrong system. Andy’s our resident
guru on that subject and us out until next week, but I may have a peek at the Meshconfig sources to see if I can spot anything obvious.
2018/03/21 04:21:35 (22276) WARN> perfsonar_meshconfig_agent:430 main:: - Problem adding test latencybg(ps-fhsu-lt.perfsonar.kanren.net->ps-esu-lt.perfsonar.kanren.net), continuing with rest
of config: 500 Internal Server Error: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
… For this one, you’ll probably find an error with the same timestamp (plus or minus a bit) in the Apache logs. Can I take it that you’re not having trouble with the tasks that do get scheduled? --Mark |
- Re: [perfsonar-user] OWAMP tests not scheduling reliably in mesh, Casey Russell, 04/13/2018
- RE: [perfsonar-user] OWAMP tests not scheduling reliably in mesh, Garnizov, Ivan (RRZE), 04/16/2018
Archive powered by MHonArc 2.6.19.