
perfsonar-user - Re: [perfsonar-user] OWAMP tests not scheduling reliably in mesh

Subject: perfSONAR User Q&A and Other Discussion

  • From: Mark Feit <>
  • To: Casey Russell <>
  • Cc: "" <>
  • Subject: Re: [perfsonar-user] OWAMP tests not scheduling reliably in mesh
  • Date: Fri, 23 Mar 2018 17:35:48 +0000

Casey Russell writes:

 

     It may be worth noting that all of our hosts are dual-NIC, dual-stack hosts: each is a single server with a latency NIC and a bandwidth NIC, and both interfaces run IPv4 and IPv6. So it's entirely possible that something about the way we've built the mesh is causing the UUID clash.

 

Possibly.  As I said last night, the only way for this to happen, other than a random-number collision, is for one end to connect back to itself thinking it’s the other end.  That has to come either from the mesh listing two different hosts that are really interfaces on the same system, or from DNS weirdness.  If it were the former, the errors would be consistent, and we wouldn’t be seeing errors between ps-ku-bw and ps-bryant-bw.
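One way to test the "connecting back to itself" theory is to resolve both ends of a failing pair on the host that runs the test and look for addresses they share; any overlap means the "remote" name is landing on the local end. This is a minimal sketch using only the Python standard library; the function names and the idea of running it per failing pair are illustrative assumptions, not part of perfSONAR or pScheduler.

```python
import socket
import sys


def resolve_all(host: str) -> set[str]:
    """Return every IPv4/IPv6 address the local resolver gives for host."""
    infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return {info[4][0] for info in infos}


def shared_addresses(a: set[str], b: set[str]) -> set[str]:
    """Addresses resolved for both endpoints; non-empty suggests they are the same system."""
    return a & b


def check_pair(source: str, dest: str) -> set[str]:
    """Resolve both ends of a (hypothetical) mesh pair and report any overlap."""
    return shared_addresses(resolve_all(source), resolve_all(dest))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_pair.py <source-host> <dest-host>
    print(check_pair(sys.argv[1], sys.argv[2]) or "no shared addresses")
```

Running it repeatedly on the affected hosts, rather than once, matters here: if the problem is intermittent resolution, a single clean result proves little.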

 

That leaves inconsistent host-name resolution which, while a long shot, wouldn’t be the strangest thing I’ve seen.  Is there any chance that one of the servers in your DNS infrastructure has incorrect data and is getting picked to answer queries about 30% of the time?  The TTL on your records looks to be 24 hours, so I could see it coming up with a different set of wrong answers each time the mesh cycles around and a new round of lookups happens.
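A quick way to look for a name server with bad data is to ask each server directly and compare the answers, instead of going through a caching resolver that may hide the disagreement. The sketch below assumes the third-party dnspython package (2.x) for the per-server queries; the list of servers to check is left to you, and the function names are hypothetical. The comparison helper uses only the standard library.

```python
from collections import Counter


def query_each_server(name: str, servers: list[str], rdtype: str = "A") -> dict[str, frozenset[str]]:
    """Ask each name server directly for name and collect its answer set.

    Assumption: dnspython >= 2.0 is installed. It is imported here so that
    find_outliers() below can be used without it.
    """
    import dns.exception
    import dns.resolver

    answers: dict[str, frozenset[str]] = {}
    for server in servers:
        resolver = dns.resolver.Resolver(configure=False)  # ignore /etc/resolv.conf
        resolver.nameservers = [server]                     # query only this server
        try:
            result = resolver.resolve(name, rdtype)
            answers[server] = frozenset(rr.to_text() for rr in result)
        except dns.exception.DNSException as exc:           # NXDOMAIN, timeout, etc.
            answers[server] = frozenset({f"error: {type(exc).__name__}"})
    return answers


def find_outliers(answers: dict[str, frozenset[str]]) -> dict[str, frozenset[str]]:
    """Return the servers whose answer differs from the most common answer."""
    if not answers:
        return {}
    majority, _count = Counter(answers.values()).most_common(1)[0]
    return {server: ans for server, ans in answers.items() if ans != majority}
```

Checking both A and AAAA (`rdtype="AAAA"`) for every host in the mesh, against every authoritative and recursive server the test hosts might use, would cover the "different set of wrong answers each cycle" case.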

 

Would you mind sending me your mesh configuration off-list?

 

 

A single host might look like this:

DNS NAME 1 (latency) NIC 1

ps-ku-lt.perfsonar.kanren.net. 85034 IN A       164.113.32.57

ps-ku-lt.perfsonar.kanren.net. 85122 IN AAAA    2001:49d0:23c0:7::57

 

DNS NAME 2 (bandwidth) NIC 2

ps-ku-bw.perfsonar.kanren.net. 300 IN   A       164.113.32.145

ps-ku-bw.perfsonar.kanren.net. 300 IN   AAAA    2001:49d0:23c0:2::18

 

That shouldn’t cause any problems.  All of pScheduler’s interaction is based around URLs, which have no way to specify the IP protocol version.  The way getaddrinfo() is configured on dual-stack systems will make the pSchedulers talk amongst themselves over IPv4 even if the tests specify IPv6.  The usual practice in a dual-stack world is to have separately-named hosts (e.g., foo with an A record and foo6 with an AAAA record) to force one protocol or the other.  Dual-record hosts (foo with both A and AAAA records) were intended for seamless transitioning between single-stack IPv4 and IPv6, so applications can keep asking for the same name and the active stack will ask only for one record or the other.  Again, that shouldn’t have any bearing on what you’re seeing, but I thought it was worth mentioning.
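To see how getaddrinfo() treats a dual-record name on a given host, and how a caller can pin the address family, here is a small standard-library sketch. The helper name is hypothetical; the order of results under `AF_UNSPEC` depends on local policy (for example /etc/gai.conf on glibc systems), so no particular order is assumed.

```python
import socket


def addresses_for(host: str, family: int = socket.AF_UNSPEC) -> list[str]:
    """Return addresses in the order getaddrinfo() hands them back.

    With AF_UNSPEC, a name with both A and AAAA records yields both families,
    ordered by local resolver policy. Passing AF_INET or AF_INET6 restricts
    the lookup to one family, which is what a separately-named foo/foo6 pair
    achieves at the DNS level.
    """
    infos = socket.getaddrinfo(host, None, family=family, type=socket.SOCK_STREAM)
    ordered: list[str] = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        if sockaddr[0] not in ordered:  # drop duplicates, keep resolver order
            ordered.append(sockaddr[0])
    return ordered
```

Given the records quoted above, `addresses_for("ps-ku-bw.perfsonar.kanren.net", socket.AF_INET6)` would be expected to return only the AAAA address, while the default call returns both families in whatever order the host's policy dictates.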

 

--Mark

 



