
Do not perform cleanup if Manifest write fails with dirty exception #40519

Merged: 2 commits into elastic:master on Apr 1, 2019

Conversation

andrershov (Contributor):

Currently, if the Manifest write is unsuccessful (i.e. a WriteStateException is thrown), we perform cleanup of the newly created metadata files. However, this is wrong.
Consider the following sequence (caught by CI here: #39077):

  • the cluster global data is written successfully
  • the associated manifest write fails (during the fsync, i.e. the files have been written)
  • deleting (reverting) the manifest files fails, so the manifest is persisted
  • deleting (reverting) the cluster global data succeeds

In this case, when trying to load the metadata (after a node restart caused by the dirty WriteStateException), the following exception occurs:

```
java.io.IOException: failed to find global metadata [generation: 0]
```

because the manifest file references a missing global metadata file.

This commit checks whether the thrown WriteStateException is dirty; if it is, we do not perform any cleanup, because a new Manifest file might have been created while its deletion failed.
In the future, we might add a more fine-grained check: perform the cleanup if the WriteStateException is dirty but the Manifest deletion was successful.
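
For illustration, here is a minimal, self-contained sketch of the decision this commit introduces, assuming a simplified, hypothetical Store abstraction. The real Elasticsearch classes (GatewayMetaState, MetaStateService, the actual WriteStateException) are richer; only the dirty-flag handling is modeled here:

```java
import java.io.IOException;

// Minimal sketch, not the actual Elasticsearch source. The Store interface
// and method names are hypothetical stand-ins for the two-phase metadata write.
public class DirtyWriteSketch {

    /** Stand-in for WriteStateException: "dirty" means the on-disk state is unknown. */
    static class WriteStateException extends IOException {
        private final boolean dirty;

        WriteStateException(boolean dirty, String message) {
            super(message);
            this.dirty = dirty;
        }

        boolean isDirty() {
            return dirty;
        }
    }

    /** Hypothetical two-phase metadata store. */
    interface Store {
        void writeGlobalMetadata() throws WriteStateException; // phase 1
        void writeManifest() throws WriteStateException;       // phase 2, the commit point
        void rollback();                                       // deletes files created above
    }

    static void writeState(Store store) throws WriteStateException {
        store.writeGlobalMetadata();
        try {
            store.writeManifest();
        } catch (WriteStateException e) {
            // Non-dirty failure: the manifest never reached disk, so reverting
            // the freshly written metadata files is safe.
            // Dirty failure (e.g. during fsync): the new manifest may already
            // exist and its deletion may fail, so deleting the metadata files
            // could leave a manifest pointing at missing global metadata.
            if (e.isDirty() == false) {
                store.rollback();
            }
            throw e;
        }
    }
}
```

The trade-off is deliberate: leaving extra metadata files behind is harmless, while deleting files that a surviving manifest references is not.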

Closes #39077

andrershov added the >bug, :Distributed Coordination/Cluster Coordination, v7.0.0, v7.2.0, and v8.0.0 labels on Mar 27, 2019
elasticmachine (Collaborator):

Pinging @elastic/es-distributed

ywelsch (Contributor) left a comment:
Also unmute testAtomicityWithFailures?
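
For context, a hypothetical sketch of what unmuting means here: Elasticsearch marks known-failing tests with LuceneTestCase's @AwaitsFix annotation pointing at the tracking issue, and unmuting is removing that annotation. The class and test body below are placeholders, with the annotation shown only in comments:

```java
// Hypothetical illustration only; the real test lives in GatewayMetaStateTests
// and its body is elided.
public class GatewayMetaStateTests {

    // Muted form, removed when unmuting:
    // @AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/39077")
    public void testAtomicityWithFailures() {
        // randomly injects write failures and verifies the persisted state
        // remains loadable afterwards
    }
}
```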

albertzaharovits (Contributor) left a comment:

LGTM

andrershov merged commit 1fac569 into elastic:master on Apr 1, 2019
andrershov pushed two commits that referenced this pull request on Apr 1, 2019, each cherry picked from commit 1fac569:
Do not perform cleanup if Manifest write fails with dirty exception (#40519)
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Apr 1, 2019
* elastic/7.0:
  [TEST] Mute WebhookHttpsIntegrationTests.testHttps
  [DOCS] Add 'time value' links to several monitor settings (elastic#40633) (elastic#40687)
  Do not perform cleanup if Manifest write fails with dirty exception (elastic#40519)
  Remove mention of soft deletes from getting started (elastic#40668)
  Fix bug in detecting use of bundled JDK on macOS
  Reindex conflicts clarification (docs) (elastic#40442)
  SQL: [Tests] Enable integration tests for fixed issues (elastic#40664)
  Add information about the default sort mode (elastic#40657)
  SQL: [Docs] Fix example for CURDATE
  SQL: [Docs] Fix doc errors regarding CURRENT_DATE. (elastic#40649)
  Clarify using time_zone and date math in range query (elastic#40655)
  Add notice for bundled jdk (elastic#40576)
  disable kerberos test until kerberos fixture is working again
  [DOCS] Use "source" instead of "inline" in ML docs (elastic#40635)
  Unmute and fix testSubParserArray (elastic#40626)
  Geo Point parse error fix (elastic#40447)
  Increase suite timeout to 30 minutes for docs tests (elastic#40521)
  Fix repository-hdfs when no docker and unnecesary fixture
  Avoid building hdfs-fixure use an image that works instead
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request on May 27, 2019:
Do not perform cleanup if Manifest write fails with dirty exception (elastic#40519)
Labels
>bug, :Distributed Coordination/Cluster Coordination, v7.0.0-rc2, v7.2.0, v8.0.0-alpha1
Development

Successfully merging this pull request may close these issues.

[CI] GatewayMetaStateTests.testAtomicityWithFailures