
wg-multicast - Re: Serious network problems this AM - multicast related?

Subject: All things related to multicast


  • From: David Farmer <>
  • To:
  • Cc: Bill Owens <>, , David Farmer <>
  • Subject: Re: Serious network problems this AM - multicast related?
  • Date: Fri, 04 Jan 2013 08:32:50 -0600
  • Organization: University of Minnesota

On 1/4/13 07:12 , William F. Maton Sotomayor wrote:
On Fri, 4 Jan 2013, Bill Owens wrote:

On Fri, Jan 04, 2013 at 07:17:44AM -0500, William F. Maton Sotomayor wrote:

We've seen multiple outages between gigapops in Canada towards CANARIE,
apparently due to a routing table leak that triggered some folks'
maximum-prefix settings on BGP peerings. But maybe that's a coincidence.

No, I don't think it was a coincidence - based on that lead I looked
at the BGP prefix graphs maintained by one of our member campuses and
I see huge jumps on both their NYSERNet connection and the direct NLR
peering they maintain. I think someone leaked about 328k routes, and
in fact it looks as though it happened twice (or a session reset and
re-established).

Indeed:

206.130.255.13 4 6509 9008064 788188 0 0 0 04:47:00 Idle (PfxCt)

Knowing CANARIE routes camp at around 14000 +- 1000, 20k seemed alright:

neighbor 206.130.255.13 maximum-prefix 20000

It's extremely disruptive to do this, but it minimises repeated
oscillations, which can cause the router CPU to thrash about and are
more destructive to all the other BGP-speaking routers.
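
The arithmetic behind a limit like the 20k above can be sketched as follows. This is only an illustration: the function name `suggest_max_prefix` is hypothetical and the 30% headroom figure is an assumption, not something stated in this thread. It takes the usual route count plus its normal swing, pads it, and rounds up to a tidy number.

```python
def suggest_max_prefix(baseline, variation, headroom_pct=30, round_to=1000):
    """Suggest a max-prefix limit from a peer's normal route count.

    baseline     -- typical number of prefixes received from the peer
    variation    -- normal swing above the baseline
    headroom_pct -- extra margin in percent (30 is an assumed default)
    round_to     -- round the result up to a multiple of this
    """
    worst_case = baseline + variation
    # Ceiling division keeps everything in integer arithmetic.
    padded = -(-worst_case * (100 + headroom_pct) // 100)
    return -(-padded // round_to) * round_to


# CANARIE figures quoted above: about 14000 routes, give or take 1000.
limit = suggest_max_prefix(14000, 1000)
print(limit)

# A leak of roughly 328k routes is far above the limit and would trip it.
print(328000 > limit)
```

With the figures quoted above this lands on 20000, so the leak of about 328k routes trips the limit with a wide margin while normal churn stays well under it.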

We don't have max-prefix set on our large external connections like I2
and NLR, but obviously we're going to have to do that. . .

I used to (since our gigapop is set up like an IX) use one router to do a
hard shut and another just to warn. Now I have both doing a hard shut,
as the oscillations can get too rapid to be able to log in to the router
and catch the origin.
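
When there is no time to log in and look around, saved summary output can be scanned for peers that tripped the limit. The sketch below is a rough illustration only: it assumes the ten-column Cisco-style layout of the summary line quoted earlier (Neighbor, V, AS, MsgRcvd, MsgSent, TblVer, InQ, OutQ, Up/Down, State/PfxRcd), and the function names are hypothetical.

```python
def parse_summary_line(line):
    """Parse one neighbor line of BGP summary output into a dict.

    Assumes the Cisco-style column order: Neighbor, V, AS, MsgRcvd,
    MsgSent, TblVer, InQ, OutQ, Up/Down, State/PfxRcd.
    """
    fields = line.split()
    if len(fields) < 10:
        raise ValueError("expected at least 10 columns: %r" % line)
    # The last column is either a prefix count or a state such as
    # "Idle (PfxCt)", which contains a space, so rejoin the remainder.
    state = " ".join(fields[9:])
    return {
        "neighbor": fields[0],
        "asn": int(fields[2]),
        "up_down": fields[8],
        "state": state,
        "prefixes_received": int(state) if state.isdigit() else None,
    }


def tripped_max_prefix(entry):
    """True if the session was shut because the prefix limit was exceeded."""
    return "PfxCt" in entry["state"]


line = "206.130.255.13 4 6509 9008064 788188 0 0 0 04:47:00 Idle (PfxCt)"
entry = parse_summary_line(line)
print(entry["neighbor"], entry["asn"], tripped_max_prefix(entry))
```

Running this over periodically captured summaries from each router would show which peering hit the limit and roughly when, without needing an interactive session during the event.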

Interestingly enough, our MSDP peerings to CANARIE were unaffected, so
at least mcast sources were still being learned that way, not that it
would have helped. :-)

wfms

NLR's leak hit us badly. We have R&E in our global route table and the Commodity Internet in a VRF. NLR leaking 300K+ routes to us and into our R&E table blew our TCAMs out. We weren't set up for 700K+ IPv4 routes; we are now, though. We implemented a planned reallocation of TCAM this morning in the heat of battle.

We've set a max prefix on NLR of 20K routes and will be doing the same for I2 and all our other R&E peers in a future maintenance window.

Bill, the first leak was at about 08:20 UTC. When were the other leak(s) you saw? I'm trying to correlate stuff.

Thanks.

--
================================================
David Farmer Email:

Office of Information Technology
University of Minnesota
2218 University Ave SE Phone: 1-612-626-0815
Minneapolis, MN 55414-3029 Cell: 1-612-812-9952
================================================


