
wg-multicast - Re: Notes from BOF

  • From: "Jonathan S. Thyer JSTHYER" <>
  • To: John Kristoff <>
  • Cc:
  • Subject: Re: Notes from BOF
  • Date: Tue, 7 Feb 2006 17:35:11 -0500


John,

>High (90+%) cpu on 6509s with sup2's when a multicast app is
>sending with a TTL=1.

I have observed this issue on our campus network.  Do you guys know if Cisco has a bug id on this?

-------
Joff Thyer
Senior Networks Engineer/Architect
IT Networks Department
211C Forney Building, UNCG
PO BOX 26170, Greensboro NC 27402-6170
Phone: (336) 256-TECH



From: John Kristoff <>
Date: 02/07/2006 11:03 AM
To:
Cc:
Subject: Re: Notes from BOF

On Mon, Feb 06, 2006 at 04:29:17PM -0500, Alan Crosswell wrote:
> (John Kristoff will be sending his notes separately.)

As I mentioned to Alan a couple of days ago, I didn't get a
chance to present anything formally and ended up scribbling a
few notes at the start of the BoF.  I'll include those details
here, plus a couple of others I didn't mention.

I referred to a paper and a tool.  The paper is "Failure to thrive:
QoS and the culture of operational networking", which you can find
via the ACM RIPQoS workshop.  I like referring to this paper
because the pain it describes around keeping multicast operations
stable feels very familiar.  I spoke with the author after the BoF
and he indicated that multicast in their environment is much more
stable now than it was when the paper was written.  For me,
however, the situation persists.  The reason, we believe, is in
large part the frequency of code upgrades and the introduction of
"new" knobs relating to multicast protocols.  My most recent
environment frequently went through code upgrades and picked up
new knobs, particularly the "hardening" knobs meant to mitigate
unnecessary multicast state and flooding.  As I understand it,
these types of changes at LBNL have been few and far between in
recent memory.

The tool I referred to is a very crude Perl script that tries to
summarize some rudimentary multicast state and counters on a
router.  It spits out per-interface counts for IGMP joins, IGMP
leaves, and multicast octets in and out, as well as whether MSDP
is enabled and how many SA cache entries there are.  The idea was
to get a quick snapshot of some key numbers to help spot obviously
anomalous multicast load/state.  mcastsum can be found here:

 <http://aharp.ittns.northwestern.edu/software/>
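
For flavor, here's a rough Python sketch of the same idea.  This
is not the actual mcastsum: it assumes IOS-style "show ip igmp
interface" output and a paramiko SSH login, and the hostname,
username, and regexes are placeholders to adapt for your platform.

  #!/usr/bin/env python
  # Rough sketch only, not the real mcastsum.  Assumes IOS-style
  # "show ip igmp interface" output; host/user are placeholders.
  import re
  import paramiko

  client = paramiko.SSHClient()
  client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
  client.connect("router.example.edu", username="ops")

  _, stdout, _ = client.exec_command("show ip igmp interface")
  iface = None
  for line in stdout.read().decode().splitlines():
      m = re.match(r"^(\S+) is \S+", line)   # interface heading
      if m:
          iface = m.group(1)
      m = re.search(r"IGMP activity: (\d+) joins, (\d+) leaves", line)
      if iface and m:
          print("%s: %s joins, %s leaves" % (iface, m.group(1), m.group(2)))
  client.close()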

The following is a list of issues we've experienced over the past
year or so, some with varying degrees of end-user pain.  Nearly
all took up a non-trivial amount of support effort and time.  And
except for cases involving our NUTV service, our local multicast
user population is, in my estimation, in the single digits.

Note, let me be clear: this is not an attempt to pick on a vendor.
In all the cases involving bugs, the support people I worked with
were very good.  Bugs happen.

JUNOS bug
 PIM logic bug causing sources not directly attached to flap.
 It was unclear when this started happening, but it surfaced
 about a month or two after the last JUNOS upgrade, and we
 believe the problem had not been with us for long before we
 noticed it.  We never figured out why it started happening,
 and it took a while to find.  It took troubleshooting from
 Juniper as well as the vendor of the router where the source
 was attached (Cisco).  We had to bring up additional MSDP
 peers in front of the Juniper to work around the problem.

JUNOS bug
 'show multicast usage' crashed the router; run by JTAC while
 troubleshooting the previous bug.

JUNOS bug
 mtrace command crashed the router; run by JTAC while
 troubleshooting the previous bug.

JUNOS bug
 Source-specific SA limiter was rejecting SAs from sources that
 were not actually exceeding the configured limit.

IOS bug
 filter-sa-request doesn't work.

IOS bug
 Not specifically multicast, but related.  If you use certain
 modules, in our case a wireless LAN module, multicast packets
 destined to the module get processed using the port mirroring
 feature.  These modules use SPAN sessions starting at #1 and
 counting up.  We had session #1 configured ourselves, and when
 we removed the commands, the router completely locked up, as,
 oddly enough, did some of its neighbors.

IOS oddity and bug
 Send a TCP ACK to a multicast address the router is listening
 to and you'll get an RST back, with the source address filled
 in with the group address you sent to.
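
As a hedged repro sketch of this one (the group and port are
placeholders, not from the original report), using scapy:

  #!/usr/bin/env python
  # Send a bare TCP ACK to a group the router has joined and
  # watch for an RST sourced from the group itself.  Run as root.
  from scapy.all import IP, TCP, send, AsyncSniffer

  GROUP = "239.255.1.1"                # placeholder group address
  sniffer = AsyncSniffer(filter="tcp and src host " + GROUP,
                         count=1, timeout=5)
  sniffer.start()
  send(IP(dst=GROUP)/TCP(dport=5000, flags="A"))  # bare ACK to group
  sniffer.join()
  for p in sniffer.results:
      # Per the report, the RST's source is the group address.
      print(p.sprintf("flags=%TCP.flags% src=%IP.src%"))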

High (90+%) CPU on 6509s with Sup2s when a multicast app is
sending with TTL=1.
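
If you want to recreate that traffic pattern in a lab, here is a
minimal sketch of such a sender (group, port, and rate are
placeholders, not from the original report):

  #!/usr/bin/env python
  # UDP multicast sender with TTL=1; the packets die at the first
  # router, which is the case reported to drive Sup2 CPU high.
  import socket
  import time

  GROUP, PORT = "239.192.0.1", 5001    # placeholder group/port
  s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
  s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
  while True:
      s.sendto(b"\x00" * 1024, (GROUP, PORT))
      time.sleep(0.01)                 # pace the stream modestly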

MREN not accepting routes from Abilene, due to a typo in a
route-map in the BGP peering config.

I had some control plane configs wrong, such that an RP and some
PIM interfaces were rejecting valid registers.  In the process I
found another multicast-related filter that was broken.

ip sap listen on some interfaces plus a totally borked control
plane policer config caused OSPF adjacencies to bounce, because
SA floods were starving OSPF traffic in the control plane policer.

A generic UDP multicast ingress rate limit on some subnets caused
some Ghost-like file distribution apps to fail completely.

When there is a layer 2 topology change, our layer 2 devices
flush their group/port state caches, causing brief multicast
outages and flooding during these periods.

And finally, one last non-operational problem... Multicast Beacon
code upgrades released on a Friday that require us to upgrade by
Monday.  :-)

John



