
netsec-sig - [Security-WG] Re: [NTAC] DNS Serving Stale to the rescue?

Subject: Internet2 Network Security SIG


  • From: Joseph Metzger <>
  • To: Akbar Kara <>
  • Cc: Steven Wallace <>, Bill Owens <>, "" <>, NTAC <>, Kim Milford <>, "" <>
  • Subject: [Security-WG] Re: [NTAC] DNS Serving Stale to the rescue?
  • Date: Fri, 3 Nov 2017 10:58:39 -0500

The high level goal of serving stale data is to hide a failure in one component from
the rest of the system. It turns an obvious hard failure into a soft, hidden failure for
the applications and endpoints. This makes sense from a service continuity perspective.
But I wonder about the implicit & explicit assumptions being broken, and the
implications of breaking those assumptions. 

Has anybody done a detailed risk analysis of the premise that serving stale DNS data
is better than serving no data? Does it depend on who the data is being served to
(i.e., an accounts payable system vs. a medical records system vs. student dorms) or what
DNS data is being served (A records vs. SRV vs. others)?

Overall, I think this will be a good idea in some environments. But I expect it
introduces unacceptable changes to the risk profiles of others. How do we figure
out which is which?

--Joe



On Fri, Nov 3, 2017 at 9:48 AM, Akbar Kara <> wrote:
Steve,

Could we not ask TRCPS to carry routes to the root server infrastructure? Or is the assumption that the campus has lost TRCPS too?

Alternatively, it would be interesting to run Quagga on an AWS VM (one that has a path to commodity Internet) and have the Quagga VM NAT packets that originate from your campus DNS servers and are destined for external DNS servers. Maybe something will break... One could do this test for the cost of a latte 😀

/ak

+1 214-392-2717 | LEARN NOC: +1 866-647-8728 |


On Nov 3, 2017, at 8:56 AM, Steven Wallace <> wrote:

In some of my use cases, it ensures the resolver continues to have access to the authoritative name server.

For example, caching the TLD entry that points to Canvas’s name server (which happens to be hosted in AWS) ensures my resolver is able to refresh its cache for Canvas’s domain.

steve
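To make the delegation point concrete, here is a minimal Python sketch, assuming a hypothetical delegation cache that maps a zone name to its cached NS host names and the time that NS set expires. With serve-stale, a resolver may keep using an expired NS set for up to `max_stale` seconds, so a lookup under a cached zone can go straight to that zone's authoritative servers without first reaching the root or TLD servers. The function name, cache layout, and example names are illustrative only, not BIND internals.

```python
from typing import Optional

# Hypothetical cache layout: zone -> (authoritative NS host names, expiry time in seconds).
DelegationCache = dict[str, tuple[list[str], float]]


def closest_delegation(qname: str, cache: DelegationCache,
                       now: float, max_stale: float) -> Optional[tuple[str, list[str]]]:
    """Return (zone, nameservers) for the longest cached suffix of qname whose
    NS set is still fresh, or expired by no more than max_stale seconds."""
    labels = qname.rstrip(".").lower().split(".")
    # Walk from the full name toward the TLD: www.example.com, example.com, com.
    for i in range(len(labels)):
        zone = ".".join(labels[i:])
        hit = cache.get(zone)
        if hit is None:
            continue
        servers, expires_at = hit
        if now <= expires_at + max_stale:
            return zone, servers
    # No usable delegation: resolution would have to start again at the root servers.
    return None
```

With `max_stale=0` this behaves like an ordinary cache that forgets expired delegations; a nonzero window is what keeps the zone reachable while the root and TLD servers are not.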


On Nov 2, 2017, at 9:44 PM, Bill Owens <> wrote:

It sounds as though this would solve many problems in your isolated campus scenario, but I wonder about the side effects on providers who load balance by returning different DNS results. It’s not uncommon to see 60-second TTLs in records from cloud providers, sometimes even shorter. I think it is unlikely that the server behind whatever A record BIND decided to stick with would simply go away, but it is possible that server would be overwhelmed by the continuous load from your campus. It might be worth a discussion with your critical cloud providers, if they’re willing to discuss that ‘secret sauce’.

Bill.
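Bill's load concern can be illustrated with a toy model, assuming (as a simplification) that the provider hands out a single A record per response and rotates it round-robin every TTL window. When the resolver refreshes normally, the campus's queries move to a new backend each window; when it serves stale, every window reuses the one cached address. The function name is hypothetical, the addresses come from the 192.0.2.0/24 documentation range, and real providers use far more elaborate steering than round-robin.

```python
from collections import Counter


def backend_load(addresses: list[str], windows: int,
                 queries_per_window: int, stale: bool) -> Counter:
    """Toy model: total queries each backend receives over `windows` TTL periods."""
    load = Counter({addr: 0 for addr in addresses})
    for w in range(windows):
        if stale:
            # Upstream unreachable: the resolver keeps serving the one answer it cached.
            answer = addresses[0]
        else:
            # Normal refresh: the provider's (assumed) round-robin picks the next backend.
            answer = addresses[w % len(addresses)]
        load[answer] += queries_per_window
    return load


backends = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
# One hour of 60-second TTL windows, 1,000 campus queries per window.
print(backend_load(backends, 60, 1000, stale=False))
print(backend_load(backends, 60, 1000, stale=True))
```

In the first case each backend receives 20,000 queries over the hour; in the second, 192.0.2.1 receives all 60,000 and the other two receive none.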

On Nov 2, 2017, at 11:00 AM, Steven Wallace <> wrote:

It’s going to get better. BIND 9.12 (currently in beta, GA due out end of year?) supports “serve stale” (see: https://tools.ietf.org/html/draft-tale-dnsop-serve-stale-02). The draft, “Serving Stale Data to Improve DNS Resiliency”, does what you’d expect: if the resolver can’t update a cached entry, it responds with its current cached entry. BTW, this would have mitigated the October Dyn outage, which left many in the community without access to Box, PayPal, etc., despite the fact that we had working network paths to these service providers.
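The serve-stale behaviour described in the draft can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not BIND's implementation: `fetch` is a hypothetical stand-in for an upstream query that returns `(answer, ttl)` or raises `OSError` when the authoritative servers are unreachable, and the `max_stale` and `stale_answer_ttl` parameters play roughly the role of BIND's `max-stale-ttl` and `stale-answer-ttl` options. The default values are chosen for illustration only.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CacheEntry:
    value: str
    expires_at: float  # absolute time at which the record's TTL runs out


class StaleServingCache:
    """Toy resolver cache: refresh expired entries upstream, but if the refresh
    fails, answer with the expired (stale) entry for up to max_stale seconds."""

    def __init__(self, fetch: Callable[[str], tuple[str, int]],
                 max_stale: float = 86400.0, stale_answer_ttl: int = 30,
                 clock: Callable[[], float] = time.time):
        self.fetch = fetch                        # hypothetical upstream query: (value, ttl) or OSError
        self.max_stale = max_stale                # how long past expiry an entry may still be served
        self.stale_answer_ttl = stale_answer_ttl  # TTL attached to stale answers
        self.clock = clock
        self.entries: dict[str, CacheEntry] = {}

    def lookup(self, name: str) -> tuple[Optional[str], int]:
        now = self.clock()
        entry = self.entries.get(name)
        if entry is not None and now < entry.expires_at:
            return entry.value, int(entry.expires_at - now)  # fresh cache hit
        try:
            value, ttl = self.fetch(name)                      # try to refresh upstream
        except OSError:
            if entry is not None and now - entry.expires_at <= self.max_stale:
                return entry.value, self.stale_answer_ttl      # serve stale
            return None, 0                                     # nothing usable (SERVFAIL)
        self.entries[name] = CacheEntry(value, now + ttl)
        return value, ttl
```

A fresh hit is answered from cache and an expired entry triggers a refresh. Only when that refresh fails, and the entry has been expired for no longer than `max_stale`, does the stale answer go out, carrying a short TTL so clients come back soon and pick up a real answer once upstream recovers.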




