Dangling indices living in non-data nodes are detected and auto-imported #27073
Comments
We discussed this on Fixit Friday and agreed to add a check that will fail:
This means that some user action (explicitly deleting shard data) is going to be required if a data node is switched to a master-only/coordinating node.
Is this taken or can I pick it?
@swethapavan sure, go ahead.
Thank you
I think we can fail earlier than the bootstrap checks, so I'm not sure this should be a bootstrap check. Isn't it enough to be a check in node environment (we've done this in the past with the default path data issue)?
Yes, I used "bootstrap check" in the larger sense here, meaning "a boot/start time check". It does not require the bootstrap checks code infrastructure.
I have made the changes, but I get errors when I run some tests because the node fails due to the existence of dangling indices.
Specifically, these are the tests that fail:
I wonder if adding it as a bootstrap check is actually a feature (ie. testing for it later). Like I can totally see starting up a node with |
@swethapavan please open a pull request or share your code; otherwise we won't be able to help you.
… and auto-imported. Some test cases are failing. Need to check further.
@s1monw I have created a pull request. Kindly have a look.
My preference would be not to have this as a bootstrap check. Bootstrap checks are requirements for going to production, and we should keep them at a strict minimum so that the difference between prod and dev stays low. For this particular check, I don't see a good reason why we would not want to enforce it for development mode as well. If you want to start-up a node with |
Is this issue still open? There seems to have been no update on it for a long time. I would like to work on this.
Is this fixed on 6.x? Ran into this issue yesterday on 5.6.10.
The proposal is to detect if a data=false node has any data and fail startup if that is the case. However, even indices without any data can be resurrected, and I wonder if we need to handle that as well. I have created a slightly modified reproduction case to explain this:
should give something like the following (note the different data folders):
Expected log for
and for
Looking at the file system, both indices now exist on node-1 too without any data:
and both are red status:
This makes me wonder whether the proposed change is enough, since there is still a risk of resurrecting old indexes that did not have any shards allocated on the node.
Had a conversation with @ywelsch about this on another channel. We came to the conclusion that the original proposal should be implemented, both to avoid resurrecting the indices in clearly bad cases and to avoid having old data lying around that is invalid for the type of node.
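The check agreed on above could look roughly like the following Python sketch. It is not Elasticsearch's actual Java implementation: the function names are hypothetical, and it assumes an on-disk layout of `<path.data>/nodes/<ordinal>/indices/<index-uuid>/<shard-id>/`, where shard directories have numeric names.

```python
from pathlib import Path


def find_shard_dirs(data_path):
    """Return shard directories found under an assumed ES-style data path."""
    root = Path(data_path)
    # Assumed layout: nodes/<ordinal>/indices/<index-uuid>/<shard-id>
    # Non-numeric entries such as _state are index metadata, not shard data.
    return sorted(
        p for p in root.glob("nodes/*/indices/*/*")
        if p.is_dir() and p.name.isdigit()
    )


def ensure_no_shard_data(data_path, node_data):
    """Raise if a node configured with node.data=false still holds shard data."""
    if node_data:
        return  # data nodes are expected to hold shard data
    shard_dirs = find_shard_dirs(data_path)
    if shard_dirs:
        raise RuntimeError(
            "node.data=false but shard data found: "
            + ", ".join(str(p) for p in shard_dirs)
        )
```

Calling `ensure_no_shard_data("/Users/thiago/data-1", node_data=False)` at start time would then refuse to continue until the leftover shard directories are deleted by hand, which matches the "user action required" consequence noted earlier in the thread.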
* Fail start of non-data node if node has data Check that nodes started with node.data=false cannot start if they have shard data to avoid (old) indexes being resurrected into the cluster in red status. Issue #27073
Node started with node.data=false and node.master=false can no longer start if they have index metadata. This avoids resurrecting old indexes into the cluster and ensures metadata is cleaned out before re-purposing a node that was previously master or data node. Closes elastic#27073
Minor formatting corrections Issue elastic#27073
For a non-data, non-master node we now warn about dangling indices and will otherwise ignore them. This avoids import of old indices with a following inevitable red cluster status. Issue elastic#27073
Improved documentation on when nodes will refuse to start up. Issue elastic#27073
Added breaking changes documentation for node start up obsolete indices detection. Issue #27073
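The warn-and-ignore behaviour mentioned in the commit messages above can be pictured with a small Python sketch. The names are hypothetical and the logic is deliberately simplified: an index counts as dangling here when its UUID has metadata on local disk but is absent from the cluster state.

```python
import logging

logger = logging.getLogger("dangling")


def dangling_to_import(local_uuids, cluster_uuids, node_data, node_master):
    """Return the dangling index UUIDs this node should offer for import."""
    dangling = sorted(set(local_uuids) - set(cluster_uuids))
    if dangling and not node_data and not node_master:
        # Non-data, non-master node: warn and ignore instead of importing,
        # because the imported index would only show up in red status.
        logger.warning("ignoring dangling indices on non-data node: %s", dangling)
        return []
    return dangling
```

Keeping the role test next to the set difference makes the policy easy to read: only nodes that can legitimately hold index data or metadata ever offer dangling indices for import.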
Elasticsearch version (bin/elasticsearch --version):
Plugins installed: []
JVM version (java -version):
OS version (uname -a if on a Unix-like system): Darwin Thiagos-MacBook-Pro.local 17.0.0 Darwin Kernel Version 17.0.0: Thu Aug 24 21:48:19 PDT 2017; root:xnu-4570.1.46~2/RELEASE_X86_64 x86_64
Description of the problem including expected versus actual behavior:
If a non-data node that contains dangling indices in its data path joins a cluster, these dangling indices will be detected and auto-imported.
IMO, a non-data node that contains index data in its data path is probably accidental and unintended. In this case, those dangling indices should not be detected; better yet, the node should not even start (maybe a bootstrap check that fails if a non-data node contains index data in its data path).
Steps to reproduce:
This can be done on a single machine:
1. Start node-1 with bin/elasticsearch -E path.data=/Users/thiago/data-1 -E node.name=node-1
2. Start node-2 with bin/elasticsearch -E path.data=/Users/thiago/data-2 -E node.name=node-2
3. Create index test configured with 1S/0R with curl -XPUT localhost:9200/test -d '{ "settings": { "index": { "number_of_shards": 1, "number_of_replicas": 0 } } }' -H "Content-Type: application/json"
4. Index a document with curl -XPOST localhost:9200/test -d '{ "test": 1 }' -H "Content-Type: application/json"
5. Stop both nodes, check which data directory, data-1 or data-2, the shard for index test was created in, and delete the other, empty data directory (so we effectively make a dangling index).
6. Suppose data-2 was deleted. Start node-2 again with bin/elasticsearch -E path.data=/Users/thiago/data-2 -E node.name=node-2
7. Start node-1 (which contains dangling indices) as a non-data node with bin/elasticsearch -E path.data=/Users/thiago/data-1 -E node.name=node-1 -E node.data=false
Provide logs (if relevant):
After non-data node node-1 starts, node-2 will detect and auto-import dangling indices even though node-1 is a non-data node: