wg-multicast - Juniper MSDP flaps with distant source

Subject: All things related to multicast

From: John Kristoff <>
To:
Subject: Juniper MSDP flaps with distant source
Date: Wed, 1 Jun 2005 17:34:46 -0500

Recently we ran into a problem where some of our multicast sources were
inaccessible to receivers that rely on seeing MSDP source-active (SA)
messages for our content. If you or your colleagues tune in to our C-SPAN
broadcasts, you may have noticed. The symptom was that our MSDP
source-active messages appeared to be flapping. The net effect was that
no one could get content from our senders. We have fixed the problem by
working around a bug identified in the implementation on the box acting
as the RP for these groups. I'm surprised no one has run into this
before, so I'm documenting it here partly as a sanity check with others
who may have configurations similar to ours and partly for posterity's
sake.

We had (and still have) a Juniper box at our border that was the RP for
global groups (e.g. 224/8 and 233/8). It is also the MSDP peer towards
the multicast-enabled Internet via MREN. Internally we have some Cisco
6509s and sources attached off of them a couple layers deep. In the
simplest diagram, things looked like this:

       to MREN
          |
     .----+----.
     | MSDP/RP |
     '----+----'
        /   \
       /     \
    .----. .----.
    |6509+-+6509|
    '+---' '---+'
     |         |
     |         |
    .+---. .---+.
    |6509+-+6509|
    '+---' '---+'
      \       /
       \     /
      .-+---+-.
      |  swt  |
      '---+---'
          |
       source

swt is just an edge switch, dual connected to 6509s, which do HSRP for
the subnet/VLAN the source is attached to.

All indications from our sources, DRs, RP and MSDP configuration were
that everything was OK and that no recent changes were causing MSDP to
go up and down. It still appears that is the case, but only after we
configured the top 2 6509s above to be global group anycast RPs did the
problem go away. The mystery is how things ever worked if nothing
changed.

The problem seems to be that JUNOS improperly checks with PIM to see
whether the source is directly attached and, if it is not, tells MSDP to
get rid of the entry. This is a bogus check that should never be done.
When the source is not directly attached, the check obviously fails and
the MSDP entries go away.

One immediate fix was to have other routers be the RP, so we implemented
anycast RP for the global groups and set up MSDP peering between all the
new anycast RPs. This works around the problem by having the Juniper
get source info only from other MSDP peers.
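
As a rough illustration of the behavior and the workaround described above,
here is a toy Python model. It is not JUNOS code. The class names, the
audit() pass, and the "register"/"msdp" labels are all hypothetical, and the
assumption that entries learned from MSDP peers escape the check is inferred
from the fact that the workaround helped. The addresses come from
documentation ranges.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceActive:
    source: str       # sender's unicast address
    group: str        # multicast group address
    learned_via: str  # "register" (local PIM register) or "msdp" (from a peer)


@dataclass
class ToyRP:
    """Hypothetical RP that also speaks MSDP; a sketch, not JUNOS internals."""
    connected_sources: set = field(default_factory=set)
    errant_check: bool = True
    sa_cache: set = field(default_factory=set)

    def learn(self, sa: SourceActive) -> None:
        self.sa_cache.add(sa)

    def audit(self) -> None:
        # Assumed form of the errant check: a locally originated entry
        # survives only if its source is directly attached to this box.
        if not self.errant_check:
            return
        self.sa_cache = {
            sa for sa in self.sa_cache
            if sa.learned_via == "msdp" or sa.source in self.connected_sources
        }


def demo():
    rp = ToyRP()  # no directly attached sources, as in the diagram
    # Before the workaround: the RP learns the source from a PIM register.
    rp.learn(SourceActive("192.0.2.10", "233.252.0.1", "register"))
    rp.audit()
    via_register = len(rp.sa_cache)
    # After the workaround: an anycast RP advertises the source over MSDP.
    rp.learn(SourceActive("192.0.2.10", "233.252.0.1", "msdp"))
    rp.audit()
    via_msdp = len(rp.sa_cache)
    return via_register, via_msdp
```

Calling demo() returns the number of cached entries after each audit. In
this model the register-learned entry is dropped and the peer-learned entry
is kept, which matches the flapping we saw and the effect of the workaround.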

The removal of this errant PIM check has been applied in a daily
regression build and should appear in the next publicly available code
releases. PR/59939 is assigned to this problem. Talk to your rep or JTAC
for more details.

A related bug is that when you do a:

juniper> show pim rps extensive

you may see some entries whose timeout is always zero. Apparently
this is cosmetic and will be fixed separately.

We also experienced rpd core dumps with the 'show multicast usage' and
'mtrace' commands in JUNOS during the process of troubleshooting.

I don't have any further specific details about these 3 other bugs at
the moment, but Juniper does have core dumps and I presume will eventually
provide a fix for all those issues as well.

Orthogonal to this problem, but on a related note, Juniper's PIM register
refresh is hard-coded at 60 seconds and Cisco's is apparently 120 seconds.
JUNOS times out register state after 300 seconds. I'm not sure what
Cisco's timeout value is. It might be nice to have some knobs to change
those values, but neither vendor seems to allow you to change them
currently.
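
To put those numbers side by side, the short Python sketch below divides
the JUNOS register timeout by each vendor's refresh interval. It assumes
the 300-second JUNOS timeout also applies to register state refreshed by a
Cisco DR, as in a mixed setup like ours. The function and constant names
are made up for the example.

```python
# Values quoted in the text above.
JUNOS_REGISTER_TIMEOUT_S = 300
REGISTER_REFRESH_S = {"Juniper": 60, "Cisco": 120}


def refresh_margin(refresh_s, timeout_s=JUNOS_REGISTER_TIMEOUT_S):
    """Return how many refresh intervals fit in one timeout period."""
    return timeout_s / refresh_s


def margins():
    """Map each vendor's refresh interval to its margin against the timeout."""
    return {vendor: refresh_margin(s) for vendor, s in REGISTER_REFRESH_S.items()}
```

margins() gives 5.0 for the Juniper interval and 2.5 for the Cisco interval,
so a Cisco DR registering to a JUNOS RP has roughly half the slack for
delayed or lost refreshes.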

Finally, this was not meant as a slam against any vendor and hopefully
it doesn't read that way. Rather, this was an attempt to document
operational experience for the benefit of all in the name of
interoperability.

John
