
wg-multicast - Re: Notes from BOF

  • From: "Jonathan S. Thyer JSTHYER" <>
  • To: John Kristoff <>
  • Cc:
  • Subject: Re: Notes from BOF
  • Date: Tue, 7 Feb 2006 17:35:11 -0500


John,

>High (90+%) cpu on 6509s with sup2's when a multicast app is
>sending with a TTL=1.

I have observed this issue on our campus network.  Do you guys know if Cisco has a bug id on this?

-------
Joff Thyer
Senior Networks Engineer/Architect
IT Networks Department
211C Forney Building, UNCG
PO BOX 26170, Greensboro NC 27402-6170
Phone: (336) 256-TECH



From: John Kristoff <>
Date: 02/07/2006 11:03 AM
To:
Cc:
Subject: Re: Notes from BOF

On Mon, Feb 06, 2006 at 04:29:17PM -0500, Alan Crosswell wrote:
> (John Kristoff will be sending his notes separately.)

As I mentioned to Alan a couple of days ago, I didn't get a
chance to present anything formally and ended up scribbling a
few notes at the start of the BoF.  I'll include those details
here, plus a couple of others I didn't mention.

I referred to a paper and a tool.  The paper is "Failure to thrive:
QoS and the culture of operational networking", which you can find
via the ACM RIPQoS workshop.  I like referring to this paper
because the pain it describes around keeping multicast operations
stable feels very familiar.  I spoke with the author after the BoF
and he indicated that multicast in their environment is much more
stable now than it was when the paper was written.  For me,
however, the situation persists.  The reason, we believe, is in
large part the frequency of code upgrades and the introduction of
"new" knobs relating to multicast protocols.  My most recent
environment frequently went through code upgrades and picked up
new knobs, particularly the "hardening" knobs meant to mitigate
unnecessary multicast state and flooding.  As I understand it,
these types of changes at LBNL have been few and far between in
recent memory.

The tool I referred to is a very crude Perl script that tries to
summarize some rudimentary multicast state and counters on a
router.  It spits out per-interface counts for IGMP joins, IGMP
leaves, and multicast octets in and out, as well as whether MSDP
is enabled and how many SA cache entries there are.  The idea was
to get a quick snapshot of some key numbers to help spot obviously
anomalous multicast load/state.  mcastsum can be found here:

 <http://aharp.ittns.northwestern.edu/software/>
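
For flavor, here's a rough Python sketch of the same idea.  This
is not the actual mcastsum: it assumes IOS-style "show ip igmp
interface" output and a paramiko SSH login, and the hostname,
username, and regexes are placeholders to adapt for your platform.

  #!/usr/bin/env python
  # Rough sketch only, not the real mcastsum.  Assumes IOS-style
  # "show ip igmp interface" output; host/user are placeholders.
  import re
  import paramiko

  client = paramiko.SSHClient()
  client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
  client.connect("router.example.edu", username="ops")

  _, stdout, _ = client.exec_command("show ip igmp interface")
  iface = None
  for line in stdout.read().decode().splitlines():
      m = re.match(r"^(\S+) is \S+", line)   # interface heading
      if m:
          iface = m.group(1)
      m = re.search(r"IGMP activity: (\d+) joins, (\d+) leaves", line)
      if iface and m:
          print("%s: %s joins, %s leaves" % (iface, m.group(1), m.group(2)))
  client.close()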

The following is a list of issues we've experienced over the past
year or so, some with varying degrees of end-user pain.  Nearly
all took up a non-trivial amount of support effort and time.  And
except for cases involving our NUTV service, our local multicast
user population is, in my estimation, in the single digits.

Note, let me be clear: this is not an attempt to pick on a vendor.
In all the cases involving bugs, the support people I worked with
were very good.  Bugs happen.

JUNOS bug
 PIM logic bug causing sources not directly attached to flap.
 It was unclear when this started happening, but it surfaced
 about a month or two after the last JUNOS upgrade, and we
 believe the problem had not been with us for long before we
 noticed it.  We never figured out why it started happening,
 and it took a while to find.  It took troubleshooting from
 Juniper as well as the vendor of the router where the source
 was attached (Cisco).  We had to bring up additional MSDP
 peers in front of the Juniper to work around the problem.

JUNOS bug
 'show multicast usage' crashed the router; run by JTAC while
 troubleshooting the previous bug.

JUNOS bug
 mtrace command crashed the router; run by JTAC while
 troubleshooting the previous bug.

JUNOS bug
 Source-specific SA limiter was rejecting SAs from sources that
 were not actually exceeding the configured limit.

IOS bug
 filter-sa-request doesn't work.

IOS bug
 Not specifically multicast, but related.  If you use certain
 modules, in our case a wireless LAN module, multicast packets
 destined to the module get processed using the port mirroring
 feature.  These modules use SPAN sessions starting at #1 and
 counting up.  We had session #1 configured ourselves, and when
 we removed the commands, the router completely locked up, as,
 oddly enough, did some of its neighbors.

IOS oddity and bug
 Send a TCP ACK to a multicast address the router is listening
 to and you'll get an RST back, with the source address filled
 in with the group address you sent to.
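
As a hedged repro sketch of this one (the group and port are
placeholders, not from the original report), using scapy:

  #!/usr/bin/env python
  # Send a bare TCP ACK to a group the router has joined and
  # watch for an RST sourced from the group itself.  Run as root.
  from scapy.all import IP, TCP, send, AsyncSniffer

  GROUP = "239.255.1.1"                # placeholder group address
  sniffer = AsyncSniffer(filter="tcp and src host " + GROUP,
                         count=1, timeout=5)
  sniffer.start()
  send(IP(dst=GROUP)/TCP(dport=5000, flags="A"))  # bare ACK to group
  sniffer.join()
  for p in sniffer.results:
      # Per the report, the RST's source is the group address.
      print(p.sprintf("flags=%TCP.flags% src=%IP.src%"))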

High (90+%) CPU on 6509s with Sup2s when a multicast app is
sending with TTL=1.
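
If you want to recreate that traffic pattern in a lab, here is a
minimal sketch of such a sender (group, port, and rate are
placeholders, not from the original report):

  #!/usr/bin/env python
  # UDP multicast sender with TTL=1; the packets die at the first
  # router, which is the case reported to drive Sup2 CPU high.
  import socket
  import time

  GROUP, PORT = "239.192.0.1", 5001    # placeholder group/port
  s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
  s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
  while True:
      s.sendto(b"\x00" * 1024, (GROUP, PORT))
      time.sleep(0.01)                 # pace the stream modestly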

MREN not accepting routes from Abilene, due to a typo in a
route-map in the BGP peering config.

I had some control plane configs wrong, such that an RP and some
PIM interfaces were rejecting valid registers.  In the process I
found another multicast-related filter that was broken.

ip sap listen on some interfaces plus a totally borked control
plane policer config caused OSPF adjacencies to bounce, because
SA floods were starving OSPF traffic in the control plane policer.

A generic UDP multicast ingress rate limit on some subnets caused
some Ghost-like file distribution apps to fail completely.

When there is a layer 2 topology change, our layer 2 devices
flush their group/port state caches, causing brief multicast
outages and flooding during these periods.

And finally, one last non-operational problem... Multicast Beacon
code upgrades released on a Friday that require us to upgrade by
Monday.  :-)

John



