Issue #22 does not seem to be resolved #37

Closed
mfspeer opened this issue Mar 18, 2015 · 4 comments

mfspeer commented Mar 18, 2015

I downloaded and compiled 2.2.0 and set it up in a triangle topology of three routers, with the top node of the triangle configured as the RP. I start traffic and wait for it to switch to the shortest-path tree (the first-hop and last-hop routers are the left- and right-hand nodes of the triangle, respectively). I then simulate a link-down event between the two nodes and get the same crash and stack trace previously reported for this issue:

#0 0x0805beec in add_jp_entry () 
#1 0x0805739e in age_routes () 
#2 0x0804e670 in timer () 
#3 0x080545d1 in age_callout_queue () 
#4 0x0804ee0f in main () 

mfspeer commented Mar 18, 2015

Here's the stack trace from my crash:

[New process 93495    ]
#0  0x0806d498 in add_jp_entry (pim_nbr=0x808c248, holdtime=210, 
    group=33620448, grp_msklen=32 ' ', source=1685262346, src_msklen=32 ' ', 
    addr_flags=0, join_prune=2 '\002') at pim_proto.c:2138

warning: Source file is more recent than executable.
2138                break;
(gdb) print *pim_nbr
$1 = {next = 0x0, prev = 0x106, address = 134587968, vifi = 0, timer = 0, 
  build_jp_message = 0x1}
Current language:  auto; currently minimal
(gdb) $c
Undefined command: "$c".  Try "help".
(gdb) where
#0  0x0806d498 in add_jp_entry (pim_nbr=0x808c248, holdtime=210, 
    group=33620448, grp_msklen=32 ' ', source=1685262346, src_msklen=32 ' ', 
    addr_flags=0, join_prune=2 '\002') at pim_proto.c:2138
#1  0x080643ba in age_routes () at timer.c:713
#2  0x0805a64d in timer (i=0x0) at main.c:675
#3  0x0806014f in age_callout_queue (elapsed_time=0) at callout.c:94
#4  0x0805a5e7 in main (argc=0, argv=0x8047e50) at main.c:638
(gdb)  

This looks more like memory corruption, but I could be wrong. The prev field of pim_nbr (0x106) seems to be corrupted.
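
The integer arguments in the trace can be decoded to check whether they look sane. The sketch below assumes the crashing host is little-endian x86 and that pimd keeps addresses in network byte order; neither is stated in this thread, and `decode_ipv4` is a hypothetical helper, not part of pimd.

```python
import socket
import struct


def decode_ipv4(value: int) -> str:
    """Decode a gdb-printed 32-bit integer as an IPv4 address.

    Assumption: the integer was read on a little-endian host from memory
    that holds the address in network byte order, so repacking it as
    little-endian recovers the original byte sequence.
    """
    return socket.inet_ntoa(struct.pack("<I", value))


# Values copied from the gdb output above.
print("group  :", decode_ipv4(33620448))
print("source :", decode_ipv4(1685262346))
# The neighbor's address field, shown in hex for comparison with the
# code addresses in the backtrace.
print("pim_nbr->address:", hex(134587968))
```

Under those assumptions, group and source decode to 224.1.1.2 and 10.16.115.100, which look like a plausible multicast group and unicast source, so the arguments themselves seem intact. The neighbor's address field is 0x805a640 in hex, which sits in the same range as the code addresses in the backtrace (0x0805a5e7 in main, 0x0805a64d in timer). That would be consistent with the structure having been overwritten, though it does not prove it.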

idismmxiv (Contributor) commented

Interesting. Did you have switches between the pimd routers, or were they directly connected? How did you cause the link down? Was it the sending-side pimd that crashed, or the receiving side? Did the multicast flow recover through the upper part of the triangle (through the RP) before the crash, and how quickly did the crash occur?


mfspeer commented Mar 21, 2015

It's the receiving side that crashes, running with shortest-path switchover on.

troglobit added this to the 2.2.1 milestone Apr 13, 2015
troglobit added a commit that referenced this issue Apr 20, 2015
Followup to 3aa829c, according to proposal in (the reopened) issue #22.

Signed-off-by: Joachim Nilsson <[email protected]>
troglobit (Owner) commented

Really hope 69a5e34 fixes this bug once and for all!
(If not, please reopen issue #22.)

Thanks for all the help debugging it!
