wg-multicast - Re: Notes from BOF

Subject: All things related to multicast

From: Alan Crosswell <>
To: John Kristoff <>
Cc:
Subject: Re: Notes from BOF
Date: Tue, 07 Feb 2006 11:21:09 -0500

John,

Thanks for this. Do you have sample config snippets that you consider
BCP that we could add to the cookbook and/or the workshops?
/a

John Kristoff wrote:
> On Mon, Feb 06, 2006 at 04:29:17PM -0500, Alan Crosswell wrote:
> >> (John Kristoff will be sending his notes separately.)
>
> As I mentioned to Alan a couple days ago I didn't get a chance to
> present anything formally and ended up scribbling a few notes at
> the start of the BoF. I'll include a bunch of details here and a
> couple others I didn't mention.
>
> I referred to a paper and a tool. The paper is "Failure to thrive:
> QoS and the culture of operational networking", which you can find
> from the ACM RIPQoS workshop. The reason I like referring to this
> paper is because of the very familiar feeling of pain described in
> that paper regarding stable multicast operations. I spoke with
> the author after the BoF and he indicated that multicast in their
> environment is much more stable now than it was described in that
> paper. However, for me, the situation persists. We believe this
> is due in large part to the frequency of code upgrades and the
> implementation of "new" knobs that relate to multicast protocols.
> My most recent environment was frequently going through code
> upgrades and the use of new knobs, particularly the "hardening"
> knobs to help mitigate unnecessary multicast state and flooding.
> As I understand it, these types of changes at LBNL have been few
> and far between in recent memory.
>
> The tool I referred to is a very crude Perl script that tries to
> summarize some rudimentary multicast state and counters on a
> router. It spits out per-interface counts for IGMP joins, IGMP
> leaves, and in and out multicast octets, as well as whether MSDP
> is enabled and, if so, how many SA cache entries there are. The
> idea was to get a quick snapshot of some key numbers to help
> quickly spot obviously anomalous multicast load/state. mcastsum
> can be found here:
>
> <http://aharp.ittns.northwestern.edu/software/>
>
> The following is a list of issues we've experienced over the past
> year or so, with varying degrees of end user pain. Generally,
> all took up a non-trivial amount of support effort and time. And
> except for cases involving our NUTV service, in my estimation, our
> local multicast user population is in the single digits.
>
> Note, and let me be clear, this is not an attempt to pick on a vendor.
> In all cases involving bugs, support people I worked with were all
> very good. Bugs happen.
>
> JUNOS bug
> PIM logic bug causing sources not directly attached to flap.
> It was unclear when this started happening, but it surfaced
> about a month or two after the last JUNOS upgrade and we believe
> we didn't have the problem for that long of a period. We never
> figured out why it started happening, and it took a while to find
> this one. It took troubleshooting from Juniper as well as from
> the router vendor where the source was attached (Cisco). We had
> to bring up additional MSDP peers in front of the Juniper to
> work around this problem.
>
> JUNOS bug
> 'show multicast usage' crashed router, done by JTAC while in the
> process of troubleshooting previous bug.
>
> JUNOS bug
> mtrace command crashed router, done by JTAC while in the process
> of troubleshooting previous bug.
>
> JUNOS bug
> Source specific SA limiter was rejecting SAs from sources not
> actually exceeding the configured limit.
>
> IOS bug
> filter-sa-request doesn't work.
>
> IOS bug
> Not specifically multicast, but related. If you use certain
> modules, in our case a wireless LAN module, multicast packets
> to it get processed using the port mirroring feature. These
> modules use SPAN sessions starting at #1 and counting up. We
> had #1 configured and when we removed the commands, the router
> completely locked up, as did, oddly enough, some of its
> neighbors.
>
> IOS oddity and bug
> Send a TCP ACK to a multicast address the router is listening
> to and you'll get a RST back, with the source address filled
> in with the group address you sent to.
>
> High (90+%) CPU on 6509s with Sup2s when a multicast app is
> sending with TTL=1.
>
> MREN not accepting routes from Abilene, due to a typo in a
> route-map in the BGP peering config.
>
> I had some control plane configs wrong so that an RP and some
> PIM interfaces were rejecting valid registers. Found another
> broken multicast-related filter in the process.
>
> ip sap listen on some interfaces and a totally borked control
> plane policer config caused OSPF adjacencies to bounce, because
> SA floods were starving OSPF traffic in the control plane policer.
>
> A generic UDP multicast rate limit on ingress for subnets caused
> some Ghost-like file distribution apps to fail completely.
>
> When there is a layer 2 topology change, our layer 2 devices flush
> their group/port state cache and cause brief multicast outages and
> flooding during these periods.
>
> And finally, one last non-operational problem... Multicast Beacon
> code upgrades released on a Friday that require us to upgrade by
> Monday. :-)
>
> John
>
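
The per-interface snapshot described above for mcastsum could be sketched roughly as follows. This is a minimal Python sketch, not the mcastsum Perl script itself: the IfaceCounters type, the summarize function, and the sample values are all hypothetical, and actually collecting the counters from a router is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IfaceCounters:
    """Hypothetical per-interface multicast counters (already collected)."""
    name: str
    igmp_joins: int
    igmp_leaves: int
    mcast_in_octets: int
    mcast_out_octets: int


def summarize(counters: List[IfaceCounters],
              msdp_enabled: bool,
              sa_cache_entries: int) -> List[str]:
    """Return one summary line per interface, then one line for MSDP."""
    lines = []
    for c in counters:
        lines.append(
            f"{c.name}: joins={c.igmp_joins} leaves={c.igmp_leaves} "
            f"in_octets={c.mcast_in_octets} out_octets={c.mcast_out_octets}"
        )
    # The SA cache count is only meaningful when MSDP is enabled.
    if msdp_enabled:
        lines.append(f"MSDP: enabled, SA cache entries={sa_cache_entries}")
    else:
        lines.append("MSDP: disabled")
    return lines


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = [
        IfaceCounters("Vlan10", 3, 1, 1000, 2000),
        IfaceCounters("Vlan20", 0, 0, 0, 512),
    ]
    for line in summarize(sample, msdp_enabled=True, sa_cache_entries=42):
        print(line)
```

Keeping the output to one line per interface is what makes an unusual join count or octet total easy to spot at a glance.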
