Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail start of non-data node if node has data #37347

Conversation

henningandersen
Copy link
Contributor

Check that nodes started with node.data=false cannot start if they have
shard data to avoid (old) indexes being resurrected into the cluster in red status.

Issue #27073

Only fixes node.data=false part of issue, not coordinating only.

I were not sure if this should go into docs too?

Check that nodes started with node.data=false cannot start if they have
shard data to avoid (old) indexes being resurrected into the cluster in red status.

Issue elastic#27073
@henningandersen henningandersen added >enhancement v7.0.0 :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) labels Jan 11, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@henningandersen henningandersen self-assigned this Jan 11, 2019
Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking this up @henningandersen. I've left some comments.

});
fail("Starting a node with node.data=false that has index/shard data must fail");
} catch (IllegalArgumentException e) {
// EXPECTED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also assert something on the exception message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up only asserting that the message contains the index UUID, would be nice to assert the entire path if possible, but I will need help if you think that is worth pursuing?

throw new IllegalArgumentException("Node is started with "
+ Node.NODE_DATA_SETTING.getKey()
+ "=false, but found shard data: "
+ badDataPaths);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the absolute path that we're outputting here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be the relative path if path.node is relative. Do you prefer the absolute path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to absolute path.

Improved message and exception type plus cleaner tests.
@henningandersen
Copy link
Contributor Author

Thanks for your comments @ywelsch, I have addressed mostt of them in the latest commit and added comments to the rest. Will you please take another look?

@henningandersen
Copy link
Contributor Author

@elasticmachine run gradle build tests 1

@ywelsch ywelsch self-requested a review January 14, 2019 11:38
Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a few more nits, but looks good o.w.

The reason I explicitly build settings here is to be able to reuse the path.node setting for the next newNodeEnvironment call. Is there a better way?

I missed that one. Thanks for clarifying. Can you perhaps add a comment where you create the second NodeEnvironment to make this more explicit? I don't see a better way btw.


final String node_1 = internalCluster().startNodes(1).get(0);
logger.info("--> starting one node");
internalCluster().startNodes(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: just startNode()

final String indexUUID = resolveIndex(indexName).getUUID();

logger.info("--> indexing a simple document");
client().prepareIndex(indexName, "type1", "1").setSource("field1", "value1").setRefreshPolicy(IMMEDIATE).get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to set .setRefreshPolicy(IMMEDIATE). The indexed changes are guaranteed to be made durable before the index request returns here. Visibility for searches does not matter as we don't search here.

return Settings.builder().put(Node.NODE_DATA_SETTING.getKey(), false).build();
}
}));
assertThat(ex.getMessage(), containsString(indexUUID));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert that message starts with "Node is started with " + Node.NODE_DATA_SETTING.getKey() + "=false, but has shard data"

nodeAndClient.startNode();
success = true;
} finally {
if (!success)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

success == false

() -> newNodeEnvironment(noDataSettings).close());

assertThat(ex.getMessage(),
containsString(env.indexPaths(index)[0].resolve(shardDir).toAbsolutePath().toString()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above, assert that message starts with "Node is started with " + Node.NODE_DATA_SETTING.getKey() + "=false, but has shard data"

Files.createDirectories(path.resolve(MetaDataStateFormat.STATE_DIR_NAME));
}

env.close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that you can also use try-with-resources here, e.g.:

try (NodeEnvironment env = newNodeEnvironment(settings)) {
    for (Path path : env.indexPaths(index)) {
        Files.createDirectories(path.resolve(MetaDataStateFormat.STATE_DIR_NAME));
    }
}

This helps in cleaning up resources even when tests fail, and avoids having secondary failures that might distract when analyzing a test failure.

Avoid impacting subsequent tests on failures and added assert for
more of the message.
@henningandersen
Copy link
Contributor Author

@elasticmachine run gradle build tests 2

@henningandersen
Copy link
Contributor Author

@elasticmachine please run the Gradle build tests 1

@henningandersen
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci-1

@henningandersen
Copy link
Contributor Author

@elasticmachine run gradle build tests 1

4 similar comments
@henningandersen
Copy link
Contributor Author

@elasticmachine run gradle build tests 1

@henningandersen
Copy link
Contributor Author

@elasticmachine run gradle build tests 1

@henningandersen
Copy link
Contributor Author

@elasticmachine run gradle build tests 1

@henningandersen
Copy link
Contributor Author

@elasticmachine run gradle build tests 1

@henningandersen
Copy link
Contributor Author

@elasticmachine run gradle build tests 2

@henningandersen henningandersen merged commit 2286118 into elastic:master Jan 22, 2019
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jan 22, 2019
* elastic/master: (43 commits)
  Remove remaining occurances of "include_type_name=true" in docs (elastic#37646)
  SQL: Return Intervals in SQL format for CLI (elastic#37602)
  Publish to masters first (elastic#37673)
  Un-assign persistent tasks as nodes exit the cluster (elastic#37656)
  Fail start of non-data node if node has data (elastic#37347)
  Use cancel instead of timeout for aborting publications (elastic#37670)
  Follow stats api should return a 404 when requesting stats for a non existing index (elastic#37220)
  Remove deprecated FieldNamesFieldMapper.Builder#index (elastic#37305)
  Document that date math is locale independent
  Bootstrap a Zen2 cluster once quorum is discovered (elastic#37463)
  Upgrade to lucene-8.0.0-snapshot-83f9835. (elastic#37668)
  Mute failing test
  Fix java time formatters that round up (elastic#37604)
  Removes awaits fix as the fix is in. (elastic#37676)
  Mute failing test
  Ensure that max seq # is equal to the global checkpoint when creating ReadOnlyEngines (elastic#37426)
  Mute failing discovery disruption tests
  Add note about how the body is referenced (elastic#33935)
  Define constants for REST requests endpoints in tests (elastic#37610)
  Make prepare engine step of recovery source non-blocking (elastic#37573)
  ...
henningandersen added a commit that referenced this pull request Feb 2, 2019
Now warn about both left-behind data and metadata for non-data or
non-data and non-master nodes. Disable dangling indices check completely
for coordinating only nodes (non-data and non-master).

Issue #27073
6.x backport of #37347 and #37748 (without failing start up).
@colings86 colings86 removed the v7.0.0 label Feb 7, 2019
henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request Feb 26, 2019
When a node is repurposed to master/no-data or no-master/no-data, v7.x
will not start (see elastic#37748 and elastic#37347). The `elasticsearch repurpose`
tool can fix this by cleaning up the problematic data.
henningandersen added a commit that referenced this pull request Mar 19, 2019
When a node is repurposed to master/no-data or no-master/no-data, v7.x
will not start (see #37748 and #37347). The `elasticsearch repurpose`
tool can fix this by cleaning up the problematic data.
henningandersen added a commit that referenced this pull request Mar 19, 2019
When a node is repurposed to master/no-data or no-master/no-data, v7.x
will not start (see #37748 and #37347). The `elasticsearch repurpose`
tool can fix this by cleaning up the problematic data.
henningandersen added a commit that referenced this pull request Mar 19, 2019
When a node is repurposed to master/no-data or no-master/no-data, v7.x
will not start (see #37748 and #37347). The `elasticsearch repurpose`
tool can fix this by cleaning up the problematic data.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >enhancement v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants