Do not perform cleanup if Manifest write fails with dirty exception #40519
Merged
Conversation
Pinging @elastic/es-distributed
ywelsch
approved these changes
Mar 27, 2019
Also unmute testAtomicityWithFailures?
albertzaharovits
approved these changes
Mar 28, 2019
LGTM
andrershov
pushed a commit
that referenced
this pull request
Apr 1, 2019
…40519) Currently, if the Manifest write is unsuccessful (i.e. a WriteStateException is thrown) we perform cleanup of the newly created metadata files. However, this is wrong. Consider the following sequence (caught by CI here #39077):

- cluster global data is written **successfully**
- the associated manifest write **fails** (during the fsync, i.e. the files have been written)
- deleting (reverting) the manifest files **fails**, so the manifest is persisted
- deleting (reverting) the cluster global data is **successful**

In this case, when trying to load the metadata (after a node restart caused by the dirty WriteStateException), the following exception occurs

```
java.io.IOException: failed to find global metadata [generation: 0]
```

because the manifest file references a missing global metadata file.

This commit checks whether the thrown WriteStateException is dirty; if it is, we do not perform any cleanup, because a new Manifest file might have been created and its deletion might have failed. In the future, we might add a more fine-grained check: perform the cleanup if the WriteStateException is dirty but the Manifest deletion was successful.

Closes #39077

(cherry picked from commit 1fac569)
andrershov
pushed a commit
that referenced
this pull request
Apr 1, 2019
jasontedor
added a commit
to jasontedor/elasticsearch
that referenced
this pull request
Apr 1, 2019
* elastic/7.0:
  [TEST] Mute WebhookHttpsIntegrationTests.testHttps
  [DOCS] Add 'time value' links to several monitor settings (elastic#40633) (elastic#40687)
  Do not perform cleanup if Manifest write fails with dirty exception (elastic#40519)
  Remove mention of soft deletes from getting started (elastic#40668)
  Fix bug in detecting use of bundled JDK on macOS
  Reindex conflicts clarification (docs) (elastic#40442)
  SQL: [Tests] Enable integration tests for fixed issues (elastic#40664)
  Add information about the default sort mode (elastic#40657)
  SQL: [Docs] Fix example for CURDATE
  SQL: [Docs] Fix doc errors regarding CURRENT_DATE. (elastic#40649)
  Clarify using time_zone and date math in range query (elastic#40655)
  Add notice for bundled jdk (elastic#40576)
  disable kerberos test until kerberos fixture is working again
  [DOCS] Use "source" instead of "inline" in ML docs (elastic#40635)
  Unmute and fix testSubParserArray (elastic#40626)
  Geo Point parse error fix (elastic#40447)
  Increase suite timeout to 30 minutes for docs tests (elastic#40521)
  Fix repository-hdfs when no docker and unnecesary fixture
  Avoid building hdfs-fixure use an image that works instead
gurkankaymak
pushed a commit
to gurkankaymak/elasticsearch
that referenced
this pull request
May 27, 2019
Labels
>bug
:Distributed Coordination/Cluster Coordination
Cluster formation and cluster state publication, including cluster membership and fault detection.
v7.0.0-rc2
v7.2.0
v8.0.0-alpha1
Currently, if the Manifest write is unsuccessful (i.e. a WriteStateException is thrown) we perform cleanup of the newly created metadata files. However, this is wrong.
Consider the following sequence (caught by CI here #39077):

- cluster global data is written **successfully**
- the associated manifest write **fails** (during the fsync, i.e. the files have been written)
- deleting (reverting) the manifest files **fails**, so the manifest is persisted
- deleting (reverting) the cluster global data is **successful**

In this case, when trying to load the metadata (after a node restart caused by the dirty WriteStateException), the exception `java.io.IOException: failed to find global metadata [generation: 0]` occurs, because the manifest file references a missing global metadata file.
This commit checks whether the thrown WriteStateException is dirty; if it is, we do not perform any cleanup, because a new Manifest file might have been created and its deletion might have failed.
In the future, we might add a more fine-grained check: perform the cleanup if the WriteStateException is dirty but the Manifest deletion was successful.
Closes #39077
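The dirty/clean distinction above can be sketched in a few lines of Java. This is a simplified stand-in, not Elasticsearch's actual implementation: the `WriteStateException`, `writeManifest`, and `cleanupPerformed` names below are hypothetical, chosen only to illustrate the rule that cleanup is safe after a *clean* failure (the manifest was definitely not persisted) but unsafe after a *dirty* one (a new manifest may be on disk, referencing the new metadata files).

```java
// Simplified stand-in for the write-state exception: "dirty" means the
// failure may have left newly written files (including a manifest) on disk.
class WriteStateException extends Exception {
    private final boolean dirty;

    WriteStateException(boolean dirty, String message) {
        super(message);
        this.dirty = dirty;
    }

    boolean isDirty() {
        return dirty;
    }
}

class MetaStateWriter {
    boolean cleanupPerformed = false;

    // Hypothetical write path. On a clean failure we know the manifest was
    // not persisted, so reverting the newly created metadata files is safe.
    // On a dirty failure the manifest may exist on disk and reference those
    // files, so deleting them could leave the node unable to load its state.
    void writeAndCleanup(boolean failDirty) throws WriteStateException {
        try {
            writeManifest(failDirty);
        } catch (WriteStateException e) {
            if (!e.isDirty()) {
                // Safe: the manifest definitely did not reach disk.
                cleanupPerformed = true;
            }
            // Dirty: skip cleanup entirely and let the caller handle
            // the (possibly persisted) state after restart.
            throw e;
        }
    }

    private void writeManifest(boolean failDirty) throws WriteStateException {
        // Always fails in this sketch, simulating a manifest write error.
        throw new WriteStateException(failDirty, "simulated manifest write failure");
    }
}
```

A dirty failure therefore leaves the new metadata files in place, while a clean failure still triggers the revert, matching the behavior the commit message describes.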