-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Zen2] Change MetaDataStateFormat write semantics #34709
Conversation
Write should clearly report if storage is left in dirty state.
I'm not convinced that we need another exception type and do all this wrapping. If the goal is to treat exceptions in step 4 and 5 specially, maybe pass in a call-back to the write method that is to be called when step 4 or 5 fail. In most cases (i.e, for all files except the manifest file), we want to those failures to be treated just as any regular IOException, so the default will probably be to pass in an empty closure. In case of the manifest file, we can then kill the node in the closure. |
Pinging @elastic/es-distributed |
@ywelsch If you don't like converting WriteStateException back to IOException in classes that do not care, I can propose to make WriteStateException extend IOException. In this case, all this wrapping will disappear. For methods that really care, they can catch WriteStateException directly and check dirty flag. I'm not a fan of inflating write method signature to accept a closure, that will have the side-effect of shutting down the node. |
8ce0c82
to
b0ac9aa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only substantive comment is about the extra calls to performDirectoryCleanup
that I think aren't necessary. I left a handful of nits too.
I'm in two minds about the WriteStateException <: IOException
thing vs the callback. On the one hand I find this quite easy to read; on the other hand we have to remember to catch WriteStateException
specifically where we care about needing to shut the node down. On balance I think this approach is good (we don't care about WriteStateException
in very many places).
/** | ||
* This exception is thrown when there is a problem of writing state to disk. <br> | ||
* If {@link #isDirty()} returns false, state is guaranteed to be not written to disk. | ||
* If {@link #isDirty()} returns true, we don't know if state is written to disk. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better for these docs to be on the isDirty()
method.
@@ -103,6 +103,7 @@ public void testReadWriteState() throws IOException { | |||
final long id = addDummyFiles("foo-", dirs); | |||
Format format = new Format("foo-"); | |||
DummyState state = new DummyState(randomRealisticUnicodeOfCodepointLengthBetween(1, 1000), randomInt(), randomLong(), randomDouble(), randomBoolean()); | |||
int version = between(0, Integer.MAX_VALUE/2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems unused.
@@ -116,6 +117,7 @@ public void testReadWriteState() throws IOException { | |||
DummyState read = format.read(NamedXContentRegistry.EMPTY, list[0]); | |||
assertThat(read, equalTo(state)); | |||
} | |||
final int version2 = between(version, Integer.MAX_VALUE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems unused.
@@ -143,6 +145,7 @@ public void testVersionMismatch() throws IOException { | |||
|
|||
Format format = new Format("foo-"); | |||
DummyState state = new DummyState(randomRealisticUnicodeOfCodepointLengthBetween(1, 1000), randomInt(), randomLong(), randomDouble(), randomBoolean()); | |||
int version = between(0, Integer.MAX_VALUE/2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems unused.
@@ -166,6 +169,7 @@ public void testCorruption() throws IOException { | |||
final long id = addDummyFiles("foo-", dirs); | |||
Format format = new Format("foo-"); | |||
DummyState state = new DummyState(randomRealisticUnicodeOfCodepointLengthBetween(1, 1000), randomInt(), randomLong(), randomDouble(), randomBoolean()); | |||
int version = between(0, Integer.MAX_VALUE/2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems unused.
try { | ||
firstStateDirectory.rename(tmpFileName, fileName); | ||
} catch (IOException e) { | ||
throw new WriteStateException(false, "failed to rename tmp file to final name in the first state location", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be useful to see the filenames in the exception message.
try { | ||
extraStateDirectory.rename(tmpFileName, fileName); | ||
} catch (IOException e) { | ||
throw new WriteStateException(true, "failed to rename tmp file to final name in extra state location", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be useful to see the filenames in the exception message.
try { | ||
stateDirectories.get(i).v2().syncMetaData(); | ||
} catch (IOException e) { | ||
throw new WriteStateException(true, "meta data directory fsync has failed", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be useful to see the path in the exception message.
} | ||
return extraStateDir; | ||
} catch (Exception e) { | ||
throw new WriteStateException(false, "failed to copy tmp state file to extra location", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be useful to see the filenames in the exception message.
} | ||
return stateDir; | ||
} catch (Exception e) { | ||
throw new WriteStateException(false, "failed to write state to the first location tmp file", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be useful to see the filenames in the exception message.
@DaveCTurner I'm done with the changes, could you please make another round? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I left one optional nit.
} | ||
} | ||
|
||
private static void performDirectoryCleanup(Path stateLocation, Directory stateDir, String tmpFileName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is short and only used in one place, so I think I'd inline it.
run gradle build tests please. Transient download failure |
@elasticmachine run the gradle build tests please. (either another transient download failure or else it didn't hear the first time) |
Currently if MetaDataStateFormat.write throws an IOExceptions if there was some problem with persisting state to disk. If exception is thrown, loadLatestState may read either old state or new state. This is not enough for Zen2 algorithm. In case of failure, we need to distinguish between 2 cases: storage is left in clean state or storage is left in dirty state.
If storage is left in clean state, loadLatestState may read only old state. If storage is left in dirty state, loadLatestState may read either old or new state.
If an exception occurs when writing manifest file to disk this distinction is important for Zen2. If storage is clean, node can continue to be a part of the cluster and may try to accept further cluster state updates (if it fails to accept cluster state updates it will be kicked off from the cluster using different mechanism). But if storage is dirty, node should be restarted and it will be able to startup successfully only once it successfully re-writes manifest file to disl.
This PR changes MetaDataStateFormat.write signature, replacing IOException with WriteStateException, which “isDirty” method could be used to distinguish between 2 failure cases.
We need to minimise number of failures, that leave storage in dirty state. That’s why this PR changes algorithm that is used to store state to disk. It has the following layout:
If an exception occurs in steps 1-3, storage is clearly in the clean state. If an exception occurs in step 5, storage is clearly in dirty state. Exception in step 4 is questionable, there are 2 options:
This PR prefers 1st approach and MetaDataTestFormatTests.testFailRandomlyAndReadAnyState tests for disk failures. But we could easily switch to option 2 if requested by reviewers.