wg-multicast - Problems with SDR Session Advertisements
Subject: All things related to multicast
- From: Matthew Davy <>
- To:
- Subject: Problems with SDR Session Advertisements
- Date: Wed, 2 Feb 2000 22:26:40 +0000
In the weekly Abilene operations last week, we discussed a problem with SDR
session advertisements that I've been troubleshooting for a couple of weeks
now. This started out as an investigation by IU into why we weren't seeing the
number of SDR sessions we thought we should be. We're pretty confident now
that the problem is more widespread and therefore want to open the discussion
to a larger group.
What follows is a review of how SDR announcements should be propagated and a
detailed description of what I think is causing problems.
When a source sends its first SAP packet, the DR sends a register to the RP.
This causes the RP to create (s,g) state and mark it with the "A" flag, which
makes the (s,g) eligible to be advertised via MSDP. Assuming nothing is
filtering its advertisement, the RP encapsulates the SAP packet in an MSDP SA
packet and passes it to all its MSDP peers. This SA with encapsulated data
gets passed across the global MSDP mesh.
As each RP receives the MSDP SA, it sends an (s,g) join towards the source to
get connected to the SPT. It creates an (s,g) entry and populates the oif list
with the oif list from its (*,g). It takes the SAP packet from the MSDP SA
packet and sends it out the interfaces in the (s,g) oif list. It also passes
the MSDP SA to its MSDP peers as appropriate. (Not necessarily in that order.)
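To make the per-RP steps above concrete, here is a rough Python sketch of what a receiving RP does with one SA. It is only a toy model, not router code: the class and field names are made up for illustration, the source address is a documentation address, and "as appropriate" is reduced to "don't send the SA back to the peer it came from" (real MSDP peer-RPF rules are not modelled, so the example topology has to be loop-free). 224.2.127.254 is the well-known SAP announcement group.

```python
from dataclasses import dataclass, field

SAP_GROUP = "224.2.127.254"   # well-known SAP announcement group
SOURCE = "192.0.2.10"         # hypothetical source (documentation address)


@dataclass
class ReceivingRP:
    """Toy model of an RP handling an MSDP SA with encapsulated data."""
    name: str
    star_g_oifs: dict = field(default_factory=dict)  # group -> oifs of (*,g)
    sg_oifs: dict = field(default_factory=dict)      # (s,g) -> oif list
    peers: list = field(default_factory=list)        # MSDP peers
    log: list = field(default_factory=list)          # what this RP did

    def receive_sa(self, source, group, sap_payload, from_peer=None):
        # 1. Send an (s,g) join towards the source to get onto the SPT.
        self.log.append(("join", source, group))
        # 2. Create the (s,g) entry; its oif list is copied from (*,g).
        oifs = list(self.star_g_oifs.get(group, []))
        self.sg_oifs[(source, group)] = oifs
        # 3. Decapsulate the SAP packet and send it out each (s,g) oif.
        for oif in oifs:
            self.log.append(("forward", oif, sap_payload))
        # 4. Pass the SA on, simplified to "every peer except the sender".
        for peer in self.peers:
            if peer is not from_peer:
                self.log.append(("sa", peer.name))
                peer.receive_sa(source, group, sap_payload, from_peer=self)


# Two RPs peering with each other; only rp_a has local receivers.
rp_a = ReceivingRP("rp_a", star_g_oifs={SAP_GROUP: ["eth0", "eth1"]})
rp_b = ReceivingRP("rp_b")
rp_a.peers.append(rp_b)
rp_b.peers.append(rp_a)

rp_a.receive_sa(SOURCE, SAP_GROUP, b"sap-announcement")
for entry in rp_a.log + rp_b.log:
    print(entry)
```

Note that rp_b still joins and creates (s,g) state even though it has nobody to forward to, which is why the state shows up on every RP in the mesh.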
For the first SAP packet from a given source, this process appears to be
working correctly.
For a source that's advertising a small number of sessions, the interval
between SAP packets is about 5-7 minutes depending on the total number of
sessions. The (s,g) state timeout is 210 seconds. So, in most cases, the
(s,g) state will time out on all routers before the next SAP packet is sent.
If this is the case, additional SAP packets from a source should follow the
same steps above and everything should work fine.
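The arithmetic behind that is simple, but a small Python sketch spells it out. It assumes the (s,g) timer is restarted only by a data packet from the source and that nothing else refreshes the state; the function name is made up for illustration.

```python
SG_STATE_TIMEOUT_S = 210  # (s,g) state timeout quoted above


def idle_gap_seconds(sap_interval_s, timeout_s=SG_STATE_TIMEOUT_S):
    """Seconds with no (s,g) state between two SAP packets.

    Assumes the timer restarts only when a data packet arrives.
    Returns 0 if the state survives until the next packet.
    """
    return max(0, sap_interval_s - timeout_s)


for minutes in (5, 7):
    interval = minutes * 60
    gap = idle_gap_seconds(interval)
    print(f"{minutes} min interval: state absent for {gap} s before next packet")
```

With a 5 minute interval the state is gone for 90 seconds before the next SAP packet, and with a 7 minute interval for 210 seconds, so each new packet should find no (s,g) state anywhere.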
However, there appear to be one or more routers out there that continue to send
(s,g) joins long after the (s,g) state should have timed out. In some cases
I've seen these joins up to 4 days after a source has stopped sending packets.
After the mroute state times out on all the routers and before the source sends
the next SAP packet, one or more routers send an (s,g) join for some unknown
reason. This causes all routers between the routers sending the joins and the
source to create (s,g) state. This sets up a branch of the SPT to each of
the routers sending the joins, but to no one else.
When the source sends its next SAP packet, the DR already has an (s,g) entry
based on the (s,g) join it received and therefore it does not send a register
to the RP. Note: Cisco has a new version of IOS (12.0(8.6)S1) to correct this
(CSCdp68820). Without a register packet, the RP doesn't know to set the "A"
flag, so it doesn't send an MSDP SA. Without MSDP SAs, no one will know about
the source or receive encapsulated SAP data. So as long as these routers
continue to send the (s,g) joins, no one else will receive SAP packets from
that source.
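The failure sequence can be sketched as a small Python model of the DR and RP. This is an illustration only, not how IOS is implemented: the classes are invented, and the register_even_with_state switch is a hypothetical stand-in for whatever the corrected code actually does.

```python
class RP:
    """Toy RP: a register sets the "A" flag and triggers an MSDP SA."""

    def __init__(self):
        self.a_flag = set()   # (s,g) entries eligible for MSDP advertisement
        self.sa_sent = []     # SAs originated, with the encapsulated payload

    def receive_register(self, source, group, payload):
        self.a_flag.add((source, group))
        self.sa_sent.append((source, group, payload))


class DR:
    """Toy DR: registers only when a packet arrives with no (s,g) state."""

    def __init__(self, rp, register_even_with_state=False):
        self.rp = rp
        self.sg_state = set()
        # Hypothetical switch standing in for the corrected behaviour.
        self.register_even_with_state = register_even_with_state

    def receive_join(self, source, group):
        # A join from downstream creates (s,g) state on the DR.
        self.sg_state.add((source, group))

    def expire_state(self, source, group):
        self.sg_state.discard((source, group))

    def source_packet(self, source, group, payload):
        had_state = (source, group) in self.sg_state
        self.sg_state.add((source, group))
        if not had_state or self.register_even_with_state:
            self.rp.receive_register(source, group, payload)
            return True    # register sent
        return False       # no register, so no SA


def run_scenario(register_even_with_state):
    s, g = "192.0.2.10", "224.2.127.254"   # hypothetical source, SAP group
    rp = RP()
    dr = DR(rp, register_even_with_state)
    first = dr.source_packet(s, g, b"sap-1")    # first SAP packet
    dr.expire_state(s, g)                       # state times out
    dr.receive_join(s, g)                       # spurious (s,g) join arrives
    second = dr.source_packet(s, g, b"sap-2")   # next SAP packet
    return first, second, len(rp.sa_sent)


print("old behaviour:", run_scenario(False))
print("with the fix: ", run_scenario(True))
```

In the first run the second SAP packet produces no register and no SA, which is the behaviour described above; in the second run both packets reach the RP.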
So how do we fix this?
One fix is for everyone to deploy the new code (12.0(8.6)S1) on their DR
routers, which prevents these joins from stopping register and MSDP SA packets.
This should be done anyway, but may take a while.
Another fix which may be quicker is to track down the routers sending the
spurious (s,g) joins and stop them. We've identified the entry points into
Abilene and are working with these peers to track down the source of the
problem.
Thanks!
- Matt
-----------------------------------------------------------------------------
Matthew Davy
812-855-7728
Indiana University Information Technology Abilene Network Operations
-----------------------------------------------------------------------------