-
Notifications
You must be signed in to change notification settings - Fork 25.1k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[Zen2] Change MetaDataStateFormat write semantics (#34709)
Currently, if MetaDataStateFormat.write throws an IOExceptions if there was some problem with persisting state to disk. If an exception is thrown, loadLatestState may read either old state or new state. This is not enough for the Zen2 algorithm. In case of failure, we need to distinguish between 2 cases: storage is left in clean state or storage is left in a dirty state. If storage is left in the clean state, loadLatestState may read only old state. If storage is left in a dirty state, loadLatestState may read either old or new state. If an exception occurs when writing the manifest file to disk this distinction is important for Zen2. If storage is clean, the node can continue to be a part of the cluster and may try to accept further cluster state updates (if it fails to accept cluster state updates it will be kicked off from the cluster using different mechanism). But if storage is dirty, the node should be restarted and it will be able to start up successfully only once it successfully re-writes manifest file to disk. This commit changes MetaDataStateFormat.write signature, replacing IOException with WriteStateException, which “isDirty” method could be used to distinguish between 2 failure cases. We need to minimise the number of failures, that leave storage in a dirty state. That’s why this PR changes the algorithm that is used to store state to disk. It has the following layout: 1. For the first state location, create and fsync tmp file with state content. 2. For each extra location, copy and fsync tmp file with state content. 2. Atomically rename tmp file in the first location. 3. For each extra location, atomically rename tmp file. 4. For each location, fsync state directory. 5. Perform cleanup of old files, ignoring exceptions. If an exception occurs in steps 1-3, storage is clearly in the clean state. If an exception occurs in step 5, storage is clearly in dirty state. Exception in step 4 is questionable, there are 2 options: 1. Consider it as a failure. If the first disk fails, state disappears. So this is a failure and storage is in a dirty state. 2. Do not consider it as failure at all, ignore disk failures. This commit prefers 1st approach and MetaDataTestFormatTests.testFailRandomlyAndReadAnyState tests for disk failures.
- Loading branch information
1 parent
6d6ac74
commit 7a3cd10
Showing
3 changed files
with
156 additions
and
43 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
41 changes: 41 additions & 0 deletions
41
server/src/main/java/org/elasticsearch/gateway/WriteStateException.java
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
/* | ||
* Licensed to Elasticsearch under one or more contributor | ||
* license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright | ||
* ownership. Elasticsearch licenses this file to you under | ||
* the Apache License, Version 2.0 (the "License"); you may | ||
* not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, | ||
* software distributed under the License is distributed on an | ||
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
* KIND, either express or implied. See the License for the | ||
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
package org.elasticsearch.gateway; | ||
|
||
import java.io.IOException; | ||
|
||
/** | ||
* This exception is thrown when there is a problem of writing state to disk. | ||
*/ | ||
public class WriteStateException extends IOException { | ||
private boolean dirty; | ||
|
||
public WriteStateException(boolean dirty, String message, Exception cause) { | ||
super(message, cause); | ||
this.dirty = dirty; | ||
} | ||
|
||
/** | ||
* If this method returns false, state is guaranteed to be not written to disk. | ||
* If this method returns true, we don't know if state is written to disk. | ||
*/ | ||
public boolean isDirty() { | ||
return dirty; | ||
} | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters