Skip to content

Commit

Permalink
[PLAT-4772][Platform] Allow stop + remove to wait for data to move of…
Browse files Browse the repository at this point in the history
…f tserver

Summary:
Given a universe where data can be moved off a node, we now wait for tablets to move off of the tserver that was first stopped on platform and then removed . Previously, if a node is stopped on Platform, and then the node is removed, we do not wait for the tablets to move off and just continue for the removal of the node from the universe.

We try to move tablets off of a node when possible

We remove the `isTServer` condition in `UpdatePlacementInfo.java` when blacklisting nodes because if a node is stopped, it's `isTServer` value is set as `false` but we would still like to blacklist this node

Test Plan:
Some things to understand beforehand:
1. When a node’s tserver/master is not running, isTserver/isMaster is false
2. If a node is stopped, the node is still alive, just that the tserver/master process is not running, thus `isTserverAliveOnNode` will be false on a stopped node

Create a GCP universe with 6 nodes and rf3, with AZs comprising of us-west-1, us-west-2, and us-east1. In the below tests, we should have a clean universe and all the nodes should be live.

Perform the following tests:

a) Happy path, stopping and then immediately removing a node from the universe
  1. Stop a node in us-west-2
  2. Immediately after the node is stopped, remove the same node from the universe
  3. Go to master UI at <master-ip>:7000/tablet-servers, on a node that is currently not being removed
  4. Keep refreshing the page, we should see the values for the node's `User Tablet-Peers / Leaders` slowly decrease until it hits 0 / 0
  5. The node should successfully be removed

b) Edge Case #1, Only removing a node from the universe
  1. Remove a node in us-west-2 from the universe
  2. Go to master UI at <master-ip>:7000/tablet-servers, on a node that is currently not being removed
  3. Keep refreshing the page, we should see the values for the node's `User Tablet-Peers / Leaders` slowly decrease until it hits 0 / 0
  4. The node should successfully be removed

c) Edge Case #2: Stopping a node, wait until tablets are moved off, then remove node from universe
  1. Stop a node in  us-west-2
  2. Go to master UI at <master-ip>:7000/tablet-servers
  3. Wait for around 10 - 15 mins, tablets from the stopped node should be moved off automatically after this timeframe, i.e. under the `User Tablet-Peers / Leaders` column, that node should display 0 / 0.
  4. Remove the same node from the universe
  5. Since the tablets are already moved off the node, this node should not have much of a wait time for node removal
  6. The node should successfully be removed

d) Edge case #3: Remove 2 nodes from the same AZ
  1.  Remove a node in us-west-2 from the universe
  2. Remove another node from us-west-2
  3. For the second node, since there is nowhere for the tablets to go to, we will not wait for the tablets to move, so on the master UI, we should see  x / 0  under the `User Tablet-Peers / Leaders` columns, where 'x' is the number of tablet peers. The RemoveNodeFromUniverse task should finish. However the value of 'x' should slowly decrease until it hits 0.

e) Edge case #4: Remove as many nodes as possible on Platform
  1. We should only be able to remove at most 1 node with a master server on it to maintain a majority of tablet peers (in our case, we have rf3, so 3 masters servers, thus we can only remove one master server).
  2. We should be able to remove all nodes with only tservers

All areas that use `UpdatePlacementInfo.java` either have the number of nodes to be blacklisted as 0 except for in `EditKubernetesUniverse.java` but we are already using tservers, so it is safe to remove the `isTserver` check in `UpdatePlacementInfo.java`

Reviewers: sanketh, nsingh

Reviewed By: nsingh

Subscribers: yugaware

Differential Revision: https://phabricator.dev.yugabyte.com/D18596
  • Loading branch information
charleswang234 committed Aug 4, 2022
1 parent a28e191 commit 7fd0462
Show file tree
Hide file tree
Showing 2 changed files with 20 additions and 20 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@
import com.yugabyte.yw.models.helpers.NodeDetails.NodeState;
import com.yugabyte.yw.models.helpers.PlacementInfo;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import javax.inject.Inject;
import lombok.extern.slf4j.Slf4j;
Expand Down Expand Up @@ -121,44 +122,43 @@ public void run() {
// if node is not reachable so as to avoid cases like 1 node in an 3 node cluster is being
// removed and we know LoadBalancer will not be able to handle that.
if (instanceAlive) {

Collection<NodeDetails> nodesExcludingCurrentNode = new HashSet<>(universe.getNodes());
nodesExcludingCurrentNode.remove(currentNode);
int rfInZone =
PlacementInfoUtil.getZoneRF(
pi,
currentNode.cloudInfo.cloud,
currentNode.cloudInfo.region,
currentNode.cloudInfo.az);
long nodesInZone =
long nodesActiveInAZExcludingCurrentNode =
PlacementInfoUtil.getNumActiveTserversInZone(
universe.getNodes(),
nodesExcludingCurrentNode,
currentNode.cloudInfo.cloud,
currentNode.cloudInfo.region,
currentNode.cloudInfo.az);

if (rfInZone == -1) {
log.error(
"Unexpected placement info in univ {} {} {}", universe.name, rfInZone, nodesInZone);
"Unexpected placement info in univ {} {} {}",
universe.name,
rfInZone,
nodesActiveInAZExcludingCurrentNode);
throw new RuntimeException(
"Error getting placement info for cluster with node: " + currentNode.nodeName);
}

// Perform a data migration and stop the tserver process only if it is reachable.
boolean tserverReachable = isTserverAliveOnNode(currentNode, masterAddrs);
log.info("Tserver {}, reachable = {}.", currentNode.cloudInfo.private_ip, tserverReachable);
if (tserverReachable) {
// Since numNodes can never be less, that will mean there is a potential node to move
// data to.
if (userIntent.numNodes > userIntent.replicationFactor) {
// We only want to move data if the number of nodes in the zone are more than the RF
// of the zone.
if (nodesInZone > rfInZone) {
createWaitForDataMoveTask()
.setSubTaskGroupType(SubTaskGroupType.WaitForDataMigration);
}
// Since numNodes can never be less, that will mean there is a potential node to move
// data to.
if (userIntent.numNodes > userIntent.replicationFactor) {
// We only want to move data if the number of nodes in the zone are more than or equal
// the RF of the zone.
// We would like to remove currentNode whether it is in live/stopped state
if (nodesActiveInAZExcludingCurrentNode >= rfInZone) {
createWaitForDataMoveTask().setSubTaskGroupType(SubTaskGroupType.WaitForDataMigration);
}
createTServerTaskForNode(currentNode, "stop")
.setSubTaskGroupType(SubTaskGroupType.StoppingNodeProcesses);
}
createTServerTaskForNode(currentNode, "stop")
.setSubTaskGroupType(SubTaskGroupType.StoppingNodeProcesses);
}

// Remove master status (even when it does not exists or is not reachable).
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -184,7 +184,7 @@ public CatalogEntityInfo.SysClusterConfigEntryPB modifyConfig(
configBuilder.getServerBlacklistBuilder();
for (String nodeName : blacklistNodes) {
NodeDetails node = universe.getNode(nodeName);
if (node.isTserver && node.cloudInfo.private_ip != null) {
if (node.cloudInfo.private_ip != null) {
blacklistBuilder.addHosts(
ProtobufHelper.hostAndPortToPB(
HostAndPort.fromParts(node.cloudInfo.private_ip, node.tserverRpcPort)));
Expand Down

0 comments on commit 7fd0462

Please sign in to comment.