-
Notifications
You must be signed in to change notification settings - Fork 344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: crmd: Prevent use-after-free when an unexpected remote client takes over #928
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -492,8 +492,11 @@ remote_lrm_op_callback(lrmd_event_data_t * op) | |
if (ra_data->migrate_status == expect_takeover) { | ||
ra_data->migrate_status = takeover_complete; | ||
} else { | ||
crm_err("Unexpected pacemaker_remote client takeover. Disconnecting"); | ||
lrm_state_disconnect(lrm_state); | ||
crm_err("Unexpected pacemaker_remote client takeover for %s. Disconnecting", op->remote_nodename); | ||
/* In this case, lrmd_tls_connection_destroy() will be called under the control of mainloop. */ | ||
/* Do not free lrm_state->conn yet. */ | ||
/* It'll be freed in the following stop action. */ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How do we know there will be a following stop action? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The disconnecting will introduce a monitor failure, which will cause it to be restarted, and actually eventually trigger a fencing. |
||
lrm_state_disconnect_only(lrm_state); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we're not freeing lrm_state here and expecting a stop action on it later, I'm thinking we should set migrate_status to takeover_complete unconditionally. That would prevent the stop from announcing the remote node as down. (If I'm following what's happening here) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure if it makes sense for an unexpected takeover though. It can be produced:
See also:
|
||
} | ||
return; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When is the source removed from the mainloop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The source is removed by lrm_state_disconnect() -> lrmd_api_disconnect() -> lrmd_tls_disconnect() -> lrmd_tls_disconnect() -> mainloop_del_ipc_client() -> mainloop_del_fd().
But after the control was returned to mainloop, crmd dumped core:
#0 0x00007fc8ac999187 in raise () from /lib64/libc.so.6
#1 0x00007fc8ac99a538 in abort () from /lib64/libc.so.6
#2 0x00007fc8ac9d6804 in __libc_message () from /lib64/libc.so.6
#3 0x00007fc8ac9dc06e in malloc_printerr () from /lib64/libc.so.6
#4 0x00007fc8ac9dcd86 in _int_free () from /lib64/libc.so.6
#5 0x00007fc8acd0fee5 in lrmd_tls_connection_destroy (userdata=) at lrmd_client.c:547
#6 0x00007fc8ad16903d in mainloop_gio_destroy (c=0x271f220) at mainloop.c:744
#7 0x00007fc8ac6a9228 in ?? () from /usr/lib64/libglib-2.0.so.0
#8 0x00007fc8ac6ac272 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#9 0x00007fc8ac6ac4b8 in ?? () from /usr/lib64/libglib-2.0.so.0
#10 0x00007fc8ac6ac8ba in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#11 0x000000000040715e in crmd_init () at main.c:154
#12 0x0000000000406f34 in main (argc=1, argv=0x7ffd1c433858) at main.c:121