perfsonar-user - Re: [perfsonar-user] OWAMP tests not scheduling reliably in mesh
Subject: perfSONAR User Q&A and Other Discussion
- From: Mark Feit
- To: Casey Russell
- Subject: Re: [perfsonar-user] OWAMP tests not scheduling reliably in mesh
- Date: Fri, 23 Mar 2018 17:35:48 +0000
Casey Russell writes:

> It may be worth noting that all of our hosts are dual-NIC/dual-stack hosts, where there is a single server with a latency NIC and a bandwidth NIC. They also run IPv4 and IPv6 on those interfaces. So it's entirely possible there's something about the way we've built the mesh that causes the problem with the clash of the UUID.

Possibly. As I said last night, the only way for this to happen other than a random number collision is for one end to connect back to itself thinking it’s the other end. That has to come from either the mesh having two different hosts that are interfaces on the same system or DNS weirdness. If it were the former, the errors would be consistent, and we wouldn’t be seeing errors between ps-ku-bw and ps-bryant-bw. That leaves inconsistent host name resolution which, while a long shot, wouldn’t be the strangest thing I’ve seen.

Is there any chance that one of the servers in your DNS infrastructure has incorrect data and is getting picked to answer queries about 30% of the time? The TTL on your records looks to be 24 hours, so I could see it coming up with a different set of wrong answers each time the mesh cycles around and a new round of lookups happens.

Would you mind sending me your mesh configuration off-list?

> A single host might look like this:
>
> DNS NAME 1 (latency) NIC 1
> ps-ku-lt.perfsonar.kanren.net.  85034  IN  A     164.113.32.57
> ps-ku-lt.perfsonar.kanren.net.  85122  IN  AAAA  2001:49d0:23c0:7::57
>
> DNS NAME 2 (bandwidth) NIC 2
> ps-ku-bw.perfsonar.kanren.net.  300    IN  A     164.113.32.145
> ps-ku-bw.perfsonar.kanren.net.  300    IN  AAAA  2001:49d0:23c0:2::18

That shouldn’t cause any problems. All of pScheduler’s interaction is based around URLs, which don’t have a way to specify protocol. The way getaddrinfo() is configured on dual-stack systems will make the pSchedulers talk amongst themselves with IPv4 even if the tests specify IPv6.

The usual practice in a dual-stack world is to have separately-named hosts (e.g., foo with an A record and foo6 with AAAA) to force one protocol or the other. Dual-record hosts (foo with A and AAAA records) were intended for seamless transitioning between single-stack IPv4 and IPv6, so applications can still ask for the same name and the active stack will ask only for one record or the other. Again, that shouldn’t have any bearing on what you’re seeing, but I thought it was worth mentioning.

--Mark
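One rough way to test the inconsistent-resolution theory from a mesh host is to look the same name up repeatedly and count how many distinct answer sets come back. The sketch below is only an illustration and is not part of perfSONAR or pScheduler; the function names (resolve_once, tally_answers, is_consistent) and the attempt count are hypothetical. It also assumes lookups actually reach the DNS servers: a local cache (nscd, systemd-resolved, or a caching resolver honouring the TTL) could hide a bad server, in which case querying each authoritative server directly, for example with dig @server, would be more telling.

```python
import socket
from collections import Counter


def resolve_once(name, resolver=socket.getaddrinfo):
    """Return the sorted tuple of distinct addresses from one lookup of `name`."""
    infos = resolver(name, None, proto=socket.IPPROTO_TCP)
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return tuple(sorted({info[4][0] for info in infos}))


def tally_answers(name, attempts=20, resolver=socket.getaddrinfo):
    """Look `name` up `attempts` times and count each distinct answer set."""
    counts = Counter()
    for _ in range(attempts):
        try:
            counts[resolve_once(name, resolver)] += 1
        except socket.gaierror:
            counts[()] += 1  # a failed lookup is recorded as an empty answer
    return counts


def is_consistent(counts):
    """True when every lookup produced the same answer set."""
    return len(counts) <= 1


if __name__ == "__main__":
    # Host name taken from the thread; substitute any mesh member.
    tally = tally_answers("ps-ku-bw.perfsonar.kanren.net")
    for answer, seen in tally.items():
        print(seen, answer)
    print("consistent" if is_consistent(tally) else "INCONSISTENT")
```

If one server really were handing out bad data some of the time, the tally would be expected to show more than one answer set. A single answer set does not rule the theory out, for the caching reason given above.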
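To see what getaddrinfo() hands back for a dual-record name on a particular host, something like the following sketch may help. It groups the results by address family, reports which address a client that tries results in order would attempt first, and shows how passing an explicit family forces one protocol, much as a separately-named foo6 host would. The function names are hypothetical, port 443 is an assumption (pScheduler's API is normally reached over HTTPS), and the ordering returned depends on local resolver configuration, so no particular output is promised.

```python
import socket

FAMILY_LABELS = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def addresses_by_family(host, port=443, resolver=socket.getaddrinfo):
    """Group the addresses returned for `host` by family, preserving order."""
    grouped = {"IPv4": [], "IPv6": []}
    for family, _type, _proto, _canon, sockaddr in resolver(
        host, port, type=socket.SOCK_STREAM
    ):
        label = FAMILY_LABELS.get(family)
        if label and sockaddr[0] not in grouped[label]:
            grouped[label].append(sockaddr[0])
    return grouped


def first_choice(host, port=443, resolver=socket.getaddrinfo):
    """Return the address a client trying results in order would use first."""
    infos = resolver(host, port, type=socket.SOCK_STREAM)
    return infos[0][4][0]


def forced_family(host, family, port=443, resolver=socket.getaddrinfo):
    """Return only the addresses of one family, e.g. socket.AF_INET6."""
    infos = resolver(host, port, family=family, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


if __name__ == "__main__":
    # Host name taken from the thread; substitute any dual-record name.
    name = "ps-ku-bw.perfsonar.kanren.net"
    print(addresses_by_family(name))
    print("tried first:", first_choice(name))
    print("IPv6 only:", forced_family(name, socket.AF_INET6))
```

Running this on each end of a failing pair and comparing the "tried first" address against the family the test specifies should show whether the control traffic and the measurement traffic are using different protocols.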
- [perfsonar-user] OWAMP tests not scheduling reliably in mesh, Casey Russell, 03/21/2018
- Re: [perfsonar-user] OWAMP tests not scheduling reliably in mesh, Mark Feit, 03/22/2018
- Re: [perfsonar-user] OWAMP tests not scheduling reliably in mesh, Casey Russell, 03/23/2018
- Re: [perfsonar-user] OWAMP tests not scheduling reliably in mesh, Mark Feit, 03/23/2018
- Re: [perfsonar-user] OWAMP tests not scheduling reliably in mesh, Dale W. Carder, 03/23/2018
- Re: [perfsonar-user] OWAMP tests not scheduling reliably in mesh, Mark Feit, 03/23/2018
- Re: [perfsonar-user] OWAMP tests not scheduling reliably in mesh, Dale W. Carder, 03/23/2018
- Re: [perfsonar-user] OWAMP tests not scheduling reliably in mesh, Mark Feit, 03/23/2018
- Re: [perfsonar-user] OWAMP tests not scheduling reliably in mesh, Casey Russell, 03/23/2018
- Re: [perfsonar-user] OWAMP tests not scheduling reliably in mesh, Mark Feit, 03/22/2018