wg-multicast - Juniper multicast 'pd-*' upstream tunnel interface issue
- From: Michael Hare <>
- To:
- Subject: Juniper multicast 'pd-*' upstream tunnel interface issue
- Date: Mon, 17 Jul 2006 13:32:44 -0500
All-
Per the discussion in the 7/17 multicast BoF: I can send more case notes if anyone is interested; I just picked out the interesting exchanges.
I never confirmed 100% that PR 61118 was causing my problems, but our issues did seem to subside when we upgraded to a 'fixed' version (I believe it was 7.3R2).
-Michael
============-==================-===============
PR: 61118
SYNOPSIS: (S,G) entry stuck in spt-pending state and iif mismatch does not trigger switch to SPT
COMMITTED-IN RELEASE: 7.2R3; 7.3R2; 7.4R1; 7.5R1
--=--
Date Submitted: FEB 09 2006 21:45
Originator:
Hi Michael --
It's interesting that you mentioned that case and its associated PR
(61118: (S,G) entry stuck in spt-pending state and iif mismatch does not
trigger switch to SPT). Last July, one of our multicast developers
mentioned that he had just fixed PR/61118, which had similar symptoms as
your problem, but he thought the problem was only seen after a commit. I
talked to the JTAC engineer who handled that case and he concurs that the
two cases have very similar symptoms.
The good news is that the PR is fixed in releases of Junos later than the
one you said you have upgraded to. Here is the complete list ...
7.2R3
7.3R2
7.4R1
7.5R1
BTW, I have been looking at those tcpdump traces that you sent (I had to
upgrade my version of ethereal to read them). As you mentioned before, in
the non-working file, the router was receiving Register messages without
ever sending a Register-Stop message. But, as you were saying, the packet
trace doesn't indicate whether the router lost the iif-mismatch event,
which is what we are speculating is happening.
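If it would help to quantify that, one rough way to tally the PIM message
types in the two captures is a short script along these lines (a minimal
sketch using the Python dpkt library; the filenames are placeholders, and
it assumes Ethernet-framed captures):

import dpkt

PIM_PROTO = 103                       # IP protocol number for PIM
PIM_TYPES = {1: "Register", 2: "Register-Stop"}

def count_pim_types(path):
    """Count PIM message types seen in a pcap file."""
    counts = {}
    with open(path, "rb") as f:
        for _ts, buf in dpkt.pcap.Reader(f):
            ip = dpkt.ethernet.Ethernet(buf).data
            if not isinstance(ip, dpkt.ip.IP) or ip.p != PIM_PROTO:
                continue
            # PIM common header: high nibble = version, low nibble = type
            pim_type = bytes(ip.data)[0] & 0x0F
            name = PIM_TYPES.get(pim_type, "type %d" % pim_type)
            counts[name] = counts.get(name, 0) + 1
    return counts

# Placeholder filenames -- compare the working and non-working captures:
print(count_pim_types("working.pcap"))
print(count_pim_types("non-working.pcap"))

In the non-working capture I'd expect to see Registers piling up with no
Register-Stops at all, which would line up with what you described.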
The recommendation at this point is to upgrade to one of the releases
mentioned above and see if the problem clears up.
Let me know if you intend to upgrade and if so, what the results are.
Thanks!
-rich
--=--
*** NOTES 02/08/2006 08:58:53 Michael Hare Action Type: update case
Richard-
We may have a breakthrough. Can you look at 2005-0610-0229? A coworker of mine was at
a multicast BoF at the I2 Joint Techs and found someone else who had a similar issue.
These cases look very similar to me.
-Michael
--=--
At 12:16 PM 2/2/2006, Michael Hare wrote:
>Rich-
>
>I'll attach the packet captures to the case. Honestly, it's been long
>enough since I did the capture (12/21/05) that the (S,G) details are
>fuzzy, so we may have to capture again.
>
>The problem is basically the same as the first report. When things go
>haywire, in the 'show multicast route' output for the group in question,
>the 'Upstream interface' changes to something unreasonable.
>
>m7h@r-uwmadison-isp# run show multicast route group 233.0.32.50 detail
>Address family INET
>
>Group: 233.0.32.50
> Source: 138.49.134.244/32
> Upstream interface: pd-1/0/0.32769
> Downstream interface list:
> at-0/0/1.2
> Session description: Static Allocations
> Statistics: 2 kBps, 2 pps, 0 packets
> Next-hop ID: 584
> Upstream protocol: PIM
>
>--
>
>I took a look at the packet captures that we got and didn't really see
>anything too unusual. The packets coming in during a working and a
>failed state seem very similar. My hypothesis for what is broken is
>that, for some reason, it's our Juniper and not the first hop router:
>
>If the Juniper were working correctly and the first hop router were the
>broken party, I'd expect the Juniper to be frantically sending back
>Register-Stop messages since it has the SPT already built, but it
>isn't, and I'm assuming that's because of the broken Upstream interface.
>
>This issue has survived a code upgrade. It's happened with 7.0R2 and
>7.3R1. It's been a little over a month since the last occurrence. The
>set of IPs that cause the problem changes, but it's always been the same
>end user application.
>
>-Michael
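For reference, the Register-Stop expectation above is standard PIM-SM RP
behavior: once the RP has joined the shortest-path tree toward the source
and is receiving the (S,G) traffic natively, further Register messages for
that (S,G) should be answered with a Register-Stop. A purely illustrative
sketch of that decision in Python (not router code; the state fields are
made up):

from dataclasses import dataclass

@dataclass
class SGState:
    spt_joined: bool          # RP has sent an (S,G) join toward the source
    receiving_natively: bool  # (S,G) packets are arriving on the SPT iif

def rp_should_send_register_stop(state):
    # Once (S,G) traffic is flowing natively over the SPT, the RP should
    # answer further Registers with Register-Stop so the first-hop router
    # stops encapsulating.
    return state is not None and state.spt_joined and state.receiving_natively

print(rp_should_send_register_stop(SGState(True, True)))   # healthy RP: True
print(rp_should_send_register_stop(SGState(True, False)))  # stuck spt-pending: False

The PR's "stuck in spt-pending" symptom is essentially the second case: the
router never treats the traffic as arriving natively, so it never sends a
Register-Stop and keeps using the register tunnel (pd-) as the data path.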
---=---
*** NOTES 12/21/2005 03:07:30 Michael Hare Action Type: update case
Richard-
I worked around this problem again today by restarting RPD. I was able to get packet
captures before and after the RPD restart that show working and non-working PIM
activity. I don't see anything obviously wrong in the non-working capture, just a
deluge of PIM-REGISTER messages (which is something we noted in the PIM logs a long
time ago).
Would it be fruitful to upload the packet capture and some text describing the various
events?
---=---
The original case notes:
*** NOTES 04/01/2005 02:06:56 Michael Hare Action Type: Customer Web Note
Juniper-
We are having a strange multicast problem, and I'm not sure if this is a client problem
or a router problem.
Host A is @ La Crosse: 138.49.134.244
Host B is @ La Crosse: 138.49.134.243 (same first hop as Host A)
Host C is @ Stanford: 171.65.29.6
Host D is @ Milwaukee: 205.213.163.106
Group in question: 233.86.152.48
Topology
show multicast route group 233.86.152.48 extensive
Address family INET
Group: 233.86.152.48
Source: 138.49.134.244/32
Upstream interface: at-1/1/0.8
Downstream interface list:
at-0/0/1.2
Session description: Static Allocations
Statistics: 2 kBps, 1 pps, 178 packets
Next-hop ID: 600
Upstream protocol: PIM
Route state: Active
Forwarding state: Forwarding
Cache lifetime/timeout: 360 seconds
Wrong incoming interface notifications:
------------------------------------------------------
The problems start when machine B joins the group as a receiver. Suddenly, traffic
sourced from A no longer reaches hosts C or D. The traffic continues to reach
r-uwlacrosse-hub and also r-uwmadison-isp. It no longer reaches r-uwmilwaukee-isp.
The multicast route state on r-uwmadison-isp changes; more specifically, the upstream
interface does. The packet count also gets reset to 0.
m7h@r-uwmadison-isp> show multicast route group 233.86.152.48 extensive
Address family INET
Group: 233.86.152.48
Source: 138.49.134.244/32
Upstream interface: pd-1/0/0.32769 <-----
Downstream interface list:
at-0/0/1.2
Session description: Static Allocations
Statistics: 4 kBps, 2 pps, 1 packets
Next-hop ID: 600
Upstream protocol: PIM
Route state: Active
Forwarding state: Forwarding
Cache lifetime/timeout: 360 seconds
Wrong incoming interface notifications: 0
As I watch, the packet counter continues to increase on r-uwmadison-isp.
m7h@r-uwmadison-isp> show multicast route group 233.86.152.48 extensive all
Address family INET
Group: 233.86.152.48
Source: 138.49.134.244/32
Upstream interface: pd-1/0/0.32769
Downstream interface list:
at-0/0/1.2
Session description: Static Allocations
Statistics: 1 kBps, 1 pps, 81 packets
Next-hop ID: 600
Upstream protocol: PIM
Route state: Active
Forwarding state: Forwarding
Cache lifetime/timeout: 360 seconds
Wrong incoming interface notifications: 0
The PIM output looks reasonable; it still seems to have the correct upstream interface.
m7h@r-uwmadison-isp> show pim join 233.86.152.48 extensive
Instance: PIM.master Family: INET
Group: 233.86.152.48
Source: *
RP: 140.189.1.254
Flags: sparse,rptree,wildcard
Upstream interface: local
Upstream State: Local RP
Downstream Neighbors:
Interface: at-1/1/0.8
140.189.8.230 State: Join Flags: SRW Timeout: 207
Group: 233.86.152.48
Source: 138.49.134.244
Flags: sparse,spt-pending
Upstream interface: at-1/1/0.8
Upstream State: Local RP, Join to Source
Keepalive timeout: 191
Downstream Neighbors:
Interface: at-0/0/1.2
140.189.8.2 State: Join Flags: S Timeout: 162
Interface: at-1/1/0.8 (pruned)
140.189.8.230 State: Prune Flags: SR Timeout: 207
So, finally, my questions!
1) pd-1/0/0 is our link services pic. This, to me, implies that the packets coming
from machine A were encapsulated (see the Register parsing sketch after these
questions). But why would they suddenly be encapsulated when machine B joined as a
receiver? (I haven't tried to troubleshoot the first hop router yet; it's not a
device I have access to)
2) Even if, for some reason, the packets are coming in encapsulated, since
r-uwmilwaukee-isp is still in the downstream interface list, shouldn't the packets
continue to flow to r-uwmilwaukee-isp? According to the 'show multicast route group
233.86.152.48 extensive all' command on r-uwmilwaukee-isp, the packets are not arriving
because the packet counter does not increase at that router.
3) If machine A and machine B are on the same router interface, I'm thinking that when
machine B sends an IGMP join, none of the other routers should know about this. Is this
assumption correct if everything is in a working state?
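For reference on the encapsulation in 1): a PIM Register is a unicast packet
sent to the RP that carries the original multicast packet inside it, and the
pd- interface (the PIM de-encapsulation interface on the tunnel hardware, as
I understand it) is where the RP unwraps those Registers. So seeing
pd-1/0/0.32769 as the upstream interface means the router is treating the
register tunnel, rather than at-1/1/0, as the active data path. If it's
useful, the inner packet of each Register in a capture can be pulled out
with something like this (a rough sketch using the Python dpkt library; the
filename is a placeholder, and an Ethernet-framed capture is assumed):

import dpkt
import socket

PIM_PROTO = 103    # IP protocol number for PIM
REGISTER = 1       # PIM message type for Register

def inner_packet(pim_payload):
    """Return the multicast IP packet encapsulated in a PIM Register.
    Register layout: 4-byte PIM header, 4-byte flags word, original packet."""
    if pim_payload[0] & 0x0F != REGISTER:
        return None
    return dpkt.ip.IP(pim_payload[8:])

with open("registers.pcap", "rb") as f:      # placeholder filename
    for _ts, buf in dpkt.pcap.Reader(f):
        ip = dpkt.ethernet.Ethernet(buf).data
        if isinstance(ip, dpkt.ip.IP) and ip.p == PIM_PROTO:
            inner = inner_packet(bytes(ip.data))
            if inner is not None:
                print(socket.inet_ntoa(inner.src), "->", socket.inet_ntoa(inner.dst))

If the source and group printed there match the (S,G) above, that would
confirm the first hop router really is register-encapsulating the stream
toward the RP.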
I can send you the topology in a detailed map if you wish.
--
=======================W===
Michael Hare
UW-Madison + WiscNet Network Engineering
Desk: 608-262-5236
24 Hr Noc: 608-263-4188