Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Zen2] Introduce gossip-like discovery of master nodes #32246

Merged
merged 77 commits into from
Aug 6, 2018

Conversation

DaveCTurner
Copy link
Contributor

This commit introduces the PeerFinder which can be used to collect the
identities of the master-eligible nodes in a masterless cluster, based on the
UnicastHostsProvider, the nodes in the ClusterState, and nodes that other
nodes have discovered.

This commit introduces the `PeerFinder` which can be used to collect the
identities of the master-eligible nodes in a masterless cluster, based on the
`UnicastHostsProvider`, the nodes in the `ClusterState`, and nodes that other
nodes have discovered.
@DaveCTurner DaveCTurner added >enhancement v7.0.0 :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Jul 20, 2018
@DaveCTurner DaveCTurner requested a review from ywelsch July 20, 2018 15:33
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@ywelsch ywelsch mentioned this pull request Jul 20, 2018
61 tasks
Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done an initial round. I've left mostly smaller comments, the general flow and the testing looks good.


public abstract class PeerFinder extends AbstractLifecycleComponent {

public static final String REQUEST_PEERS_ACTION_NAME = "internal:discovery/zen2/requestpeers";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we remove the zen2 part here and just have this under discovery? Also maybe request_peers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done


// the time between attempts to find all peers
public static final Setting<TimeValue> CONSENSUS_FIND_PEERS_DELAY_SETTING =
Setting.timeSetting("discovery.zen2.find_peers_delay",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove zen2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done

public static final String REQUEST_PEERS_ACTION_NAME = "internal:discovery/zen2/requestpeers";

// the time between attempts to find all peers
public static final Setting<TimeValue> CONSENSUS_FIND_PEERS_DELAY_SETTING =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consensus? 🗡
just FIND_PEERS_DELAY_SETTING
I also wonder if we should call this FIND_PEERS_INTERVAL_SETTING

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, register the setting please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done

return getFoundPeersSet();
}

public boolean foundQuorumFrom(VotingConfiguration votingConfiguration) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is not used anywhere (also not in tests?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, fixed

this.executorServiceFactory = executorServiceFactory;
}

public Iterable<DiscoveryNode> getFoundPeers() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method is not used anywhere (also not in tests?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, fixed


private class ActivePeerFinder {
private boolean running;
private final Map<DiscoveryNode, FoundPeer> foundPeers;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this package-visible? It's accessed by PeerFinder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, fixed. Is there a tool to tell you this? Also running.

}

logger.trace("startProbe({}) found disconnected {}, probing again", transportAddress, cachedNode);
connectedNodes.remove(transportAddress, cachedNode);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also clean-up foundPeers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think not, but added a TODO for further discussion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

foundPeers is no longer a thing - we just use connectedNodes throughout.

import java.util.Optional;

public class PeersResponse extends TransportResponse {
private final Optional<DiscoveryNode> masterNode;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be nicer to have this as a boolean (isActiveMaster) stating whether the current node providing the peer response is an active master node, and have the list of discoveryNodes just called masterNodes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is now, if you contact a follower the next step is just to contact the follower's leader. With a flag you'd go ahead and contact all the follower's peers before discovering that one of them is the leader.

In a properly-configured cluster it doesn't seem to make much difference, but if badly configured (e.g. the leader is not in all the unicast hosts lists, and there are lots of master-eligible nodes) then there's less traffic this way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I follow (no pun intended). A follower would return isActiveMaster = false + the active master as singleton in the list of masterEligibleNodes, which would lead to exact same behavior as what's currently in the PR if I understand this correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed on Zoom. There's no strong argument either way. I slightly prefer that as it is now there's a difference in the responses between a follower and a candidate that knows of a single node, although we don't use this difference anywhere. I also see that moving to a single list (a singleton from followers) or a boolean would remove a couple of lines of code that deal with the master node as a special case. We'll leave it as is.


public class PeersRequest extends TransportRequest {
private final DiscoveryNode sourceNode;
private final List<DiscoveryNode> discoveryNodes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we call this masterNodes (or masterEligibleNodes)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to candidateNodes to reflect that they're only there if there isn't a known leader.

/**
* Label a <code>Runnable</code>, overriding its <code>toString()</code> method.
*/
public static Runnable labelRunnable(final Runnable runnable, final String label) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be dangerous if an abstract runnable is passed in
add an assertion that it's not an AbstractRunnable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, this is bad. I just removed it.

Copy link
Contributor Author

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved things to the discovery package.

* under the License.
*/

package org.elasticsearch.cluster.coordination;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, moved things around in 2013f10

@@ -180,7 +180,7 @@ private void handleWakeUp() {
}

if (active == false) {
logger.trace("ActivePeerFinder#handleWakeUp(): not running");
logger.trace("PeerFinder: not running");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we even need to say that this is PeerFinder? We have a logger for PeerFinder that already spits out the class name (same thing for other logging occurrences in this class).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course. I pushed 7948268.

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

active = false;
handleWakeUp();
}
active = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert active == false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests deactivate the PeerFinder at the end, regardless of whether it's active or not, and there's no particular reason to avoid double-deactivation or to make sure that each test ends with an active PeerFinder.

Also although I am pretty sure that we can't change the leader without being active, this is not something we guarantee, nor do I think we need to.

@DaveCTurner DaveCTurner merged commit 2176184 into elastic:zen2 Aug 6, 2018
@DaveCTurner DaveCTurner deleted the 2018-07-20-peerfinder branch August 6, 2018 14:26
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Aug 6, 2018
The `PeerFinder`, introduced in elastic#32246, obtains the collection of seed
addresses configured by the user from a `ConfiguredHostsResolver`. In reality
this collection comes from the `UnicastHostsProvider` via a slightly
complicated threading model that performs the resolution of hostnames to
addresses using a dedicated `ExecutorService`. This commit introduces an
adapter to allow the `PeerFinder` to obtain its seed addresses in this manner.
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Aug 6, 2018
The `PeerFinder`, introduced in elastic#32246, needs to be able to identify, and
connect to, a remote master node using only its `TransportAddress`. This can be
done by opening a single-channel connection to the address, performing a
handshake, and only then forming a full-blown connection to the node. This
change implements this logic.
DaveCTurner added a commit that referenced this pull request Aug 7, 2018
The `PeerFinder`, introduced in #32246, needs to be able to identify, and
connect to, a remote master node using only its `TransportAddress`. This can be
done by opening a single-channel connection to the address, performing a
handshake, and only then forming a full-blown connection to the node. This
change implements this logic.
DaveCTurner added a commit that referenced this pull request Aug 7, 2018
The `PeerFinder`, introduced in #32246, obtains the collection of seed
addresses configured by the user from a `ConfiguredHostsResolver`. In reality
this collection comes from the `UnicastHostsProvider` via a slightly
complicated threading model that performs the resolution of hostnames to
addresses using a dedicated `ExecutorService`. This commit introduces an
adapter to allow the `PeerFinder` to obtain its seed addresses in this manner.
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Aug 21, 2018
Today, CapturingTransport#createCapturingTransportService creates a transport
service with a connection manager with reasonable default behaviours, but
overriding this behaviour in a consumer is a litle tricky. Additionally, the
default behaviour for opening a connection duplicates the content of the
CapturingTransport#openConnection() method.

This change removes this duplication by delegating to openConnection() and
introduces overridable nodeConnected() and onSendRequest() methods so that
consumers can alter this behaviour more easily.

Relates elastic#32246 in which we test the mechanisms for opening connections to
unknown (and possibly unreachable) nodes.
DaveCTurner added a commit that referenced this pull request Aug 22, 2018
Today, CapturingTransport#createCapturingTransportService creates a transport
service with a connection manager with reasonable default behaviours, but
overriding this behaviour in a consumer is a litle tricky. Additionally, the
default behaviour for opening a connection duplicates the content of the
CapturingTransport#openConnection() method.

This change removes this duplication by delegating to openConnection() and
introduces overridable nodeConnected() and onSendRequest() methods so that
consumers can alter this behaviour more easily.

Relates #32246 in which we test the mechanisms for opening connections to
unknown (and possibly unreachable) nodes.
DaveCTurner added a commit that referenced this pull request Aug 22, 2018
Today, CapturingTransport#createCapturingTransportService creates a transport
service with a connection manager with reasonable default behaviours, but
overriding this behaviour in a consumer is a litle tricky. Additionally, the
default behaviour for opening a connection duplicates the content of the
CapturingTransport#openConnection() method.

This change removes this duplication by delegating to openConnection() and
introduces overridable nodeConnected() and onSendRequest() methods so that
consumers can alter this behaviour more easily.

Relates #32246 in which we test the mechanisms for opening connections to
unknown (and possibly unreachable) nodes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >enhancement v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants