Nodes in process of shutting down should not respond to discovery pings #27328
Labels
:Distributed Coordination/Discovery-Plugins
Anything related to our integration plugins with EC2, GCP and Azure
>enhancement
Comments
jakommo added the :Distributed Coordination/Discovery-Plugins label on Nov 9, 2017
ywelsch added a commit that referenced this issue on Nov 13, 2017
When the current master node is shutting down, it sends a leave request to the other nodes so that they can eagerly start a fresh master election. Unfortunately, it was still possible for the master node that was shutting down to respond to ping requests, possibly influencing the election decision as it still appeared as an active master in the ping responses. This commit ensures that UnicastZenPing does not respond to ping requests once it's been closed. ZenDiscovery.doStop() continues to ensure that the pinging component is first closed before it triggers a master election. Closes #27328
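The fix described in the commit message can be sketched roughly as follows. This is a minimal, hypothetical Java model, not the actual UnicastZenPing or ZenDiscovery code: the class ClosablePingResponder, its handlePing and stop methods, and the string response format are assumptions made for illustration. It shows the two properties the commit relies on: a closed responder gives no ping response, and the stop path closes pinging before it triggers an election.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical, simplified ping responder (not the real UnicastZenPing):
 * once closed, it stops answering discovery pings so that a node which is
 * shutting down no longer advertises itself to the rest of the cluster.
 */
public class ClosablePingResponder implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final String nodeName;
    private final boolean isMaster;

    public ClosablePingResponder(String nodeName, boolean isMaster) {
        this.nodeName = nodeName;
        this.isMaster = isMaster;
    }

    /** Returns a description of this node, or empty once the responder is closed. */
    public Optional<String> handlePing(String fromNode) {
        if (closed.get()) {
            return Optional.empty(); // shutting down: do not answer the ping
        }
        return Optional.of(nodeName + (isMaster ? " [active master]" : ""));
    }

    @Override
    public void close() {
        closed.set(true);
    }

    /** Mirrors the ordering in the commit message: close pinging first, then trigger an election. */
    public void stop(Runnable triggerElection) {
        close();
        triggerElection.run();
    }

    public static void main(String[] args) {
        ClosablePingResponder responder = new ClosablePingResponder("node-108", true);
        System.out.println(responder.handlePing("node-1")); // Optional[node-108 [active master]]
        responder.stop(() -> System.out.println("election triggered"));
        System.out.println(responder.handlePing("node-1")); // Optional.empty
    }
}
```

Using an AtomicBoolean keeps the closed check cheap and thread-safe, since ping requests may arrive on other threads while the node is stopping.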
It looks like a node that is being shut down will still reply to a discovery ping.
Master node-108 is gracefully shut down, but is still listed in the current nodes list. I guess this is because it is only some milliseconds after the shutdown started. Then, 3 seconds later, the ping responses come back and node-108 is still listed. I checked the log on node-108, but there is nothing later than 2017-11-08T15:01:00,843, which makes me wonder why it is still listed in the ping list. The only explanation I have is that the ping was already replied to before 2017-11-08T15:01:00,843, which would match up with the "ping starting" timestamp (:00,843 vs :00,782).
Then another ping round is sent, and 3 seconds later the responses no longer list node-108 and a new master is elected.
I had a quick chat with @ywelsch, and it looks like a node that is shutting down will still reply to a discovery ping request if the shutdown has not yet finished.
Since such a node will be down shortly, it should not reply to a discovery ping.
Not replying could also speed up master election: in the example above, an extra 3-second ping cycle was added because the old master was still listed.
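To illustrate why a stale response can cost an extra round, here is a deliberately simplified model of a ping-based election decision. It assumes, in line with the commit message's note that the shutting-down master "still appeared as an active master in the ping responses", that a node which sees an active master in the responses follows it instead of electing a new one. PingResponse, decide, and the lowest-name tie-break are hypothetical; real Zen discovery uses its own rules for choosing among master-eligible candidates.

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;

/** Hypothetical, simplified election decision based on ping responses. */
public class ElectionFromPings {

    /** A ping response: who answered, whether it is master-eligible, and whom it reports as master (or null). */
    public record PingResponse(String node, boolean masterEligible, String reportedMaster) {}

    public static String decide(List<PingResponse> responses) {
        // 1. If any responder still reports an active master, follow it instead of electing.
        Optional<String> active = responses.stream()
                .map(PingResponse::reportedMaster)
                .filter(Objects::nonNull)
                .findFirst();
        if (active.isPresent()) {
            return active.get();
        }
        // 2. Otherwise elect among master-eligible responders (lowest name: a placeholder rule).
        return responses.stream()
                .filter(PingResponse::masterEligible)
                .map(PingResponse::node)
                .sorted()
                .findFirst()
                .orElse(null);
    }

    public static void main(String[] args) {
        List<PingResponse> withStaleMaster = List.of(
                new PingResponse("node-108", true, "node-108"), // shutting down, still answering
                new PingResponse("node-109", true, null),
                new PingResponse("node-110", true, null));
        List<PingResponse> afterShutdown = List.of(
                new PingResponse("node-109", true, null),
                new PingResponse("node-110", true, null));
        System.out.println(decide(withStaleMaster)); // node-108
        System.out.println(decide(afterShutdown));   // node-109
    }
}
```

With node-108 still answering as master, decide returns node-108 even though it is about to disappear, so another ping round is needed; once it stops responding, the same call elects node-109 immediately.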