-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async Snapshot Repository Deletes #40144
Async Snapshot Repository Deletes #40144
Conversation
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case where a snapshot deletion is running, it takes over the snapshot threadpool on the master, blocking "create snapshot" requests, which are also dispatched in TransportCreateSnapshotAction to the snapshot threadpool. I think we should have these requests come in on the generic threadpool and then go to the snapshot threadpool after actually checking on the cluster state update thread whether there is any ongoing deletes.
Also, while looking at SnapshotsService.deleteSnapshot
I've noticed that we have callers of that method that just swallow the returned exception instead of calling the listener, grep e.g. for deleteSnapshot(snapshot.getRepository(), snapshot.getSnapshotId().getName(), listener, true);
. Also worth fixing in a separate PR.
server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
Outdated
Show resolved
Hide resolved
} | ||
final AtomicInteger outstandingIndices = new AtomicInteger(indices.size()); | ||
for (String index : indices) { | ||
threadPool.executor(ThreadPool.Names.SNAPSHOT).execute(() -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ActionRunnable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can further parallelize this to the shard level I think (can be a follow-up)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
Outdated
Show resolved
Hide resolved
"but failed to clean up its index folder due to the directory not being empty.", metadata.name(), indexId), dnee); | ||
} catch (IOException ioe) { | ||
} catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a conscious choice, I'd actually rather log all exceptions here directly + it simplifies things since we don't need to use the ActionRunnable here if we catch everything. I adjusted the logic a little though and invoked the grouped listener in a finally
to make things a little clearer (and avoid the potential odd block from a future assertion trippin somewhere).
server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
Outdated
Show resolved
Hide resolved
@ywelsch all points addressed, left ~2 questions though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that we need to switch TransportRestoreSnapshotAction to generic thread pool as well and then, once cluster state is checked, switch it to "snapshot" thread pool. @ywelsh do you agree?
server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
Show resolved
Hide resolved
listener.onFailure(new RepositoryException(metadata.name(), "failed to delete snapshot [" + snapshotId + "]", ex)); | ||
return; | ||
} | ||
deleteSnapshotBlobs(snapshot, snapshotId, repositoryData, updatedRepositoryData, listener); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can see, there are three steps: deleteSnapshotBlobs, deleteIndices, deleteUnreferencedIndices. It would be nice to see this function calls in this method. Something like this:
deleteSnapshotBlobs(...,
ActionListener.wrap(() -> deleteIndices(
ActionListener.wrap(() -> deleteUnreferencedIndices(..., listener))
)));
I'm not sure I undestand your comment about "used in multiple places"
@andrershov comments addressed :) |
Jenkins run elasticsearch-ci/bwc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@andrershov thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left some smaller questions, looking good otherwise.
server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
Outdated
Show resolved
Hide resolved
@@ -49,7 +49,7 @@ public TransportCreateSnapshotAction(TransportService transportService, ClusterS | |||
|
|||
@Override | |||
protected String executor() { | |||
return ThreadPool.Names.SNAPSHOT; | |||
return ThreadPool.Names.GENERIC; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment as to why we use generic instead of snapshot here? (same for TransportRestoreSnapshotAction). Otherwise, someone in the future might just set this back to snapshot.
@@ -161,6 +166,10 @@ | |||
|
|||
protected final RepositoryMetaData metadata; | |||
|
|||
protected final NamedXContentRegistry namedXContentRegistry; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this field is not used anywhere AFAICS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my bad ... already killed it in a previous PR but it snuck back in here from a merge mistake :)
blobContainer().deleteBlobsIgnoringIfNotExists( | ||
Arrays.asList(snapshotFormat.blobName(snapshotId.getUUID()), globalMetaDataFormat.blobName(snapshotId.getUUID()))); | ||
} catch (IOException e) { | ||
logger.warn(() -> new ParameterizedMessage("[{}] Unable to delete global metadata files", snapshotId), e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something we should not change in this PR, but I wonder if we should not be lenient here when it comes to IOException. What we're effectively doing here is creating the possibility to have a top-level entry that's referencing stuff that is deleted :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of, we have https://github.com/elastic/elasticsearch/pull/40144/files/36b2e3888b18745edf4bb4d6e4af4c5b3165f174#diff-83e9ae765eb2a80bbbedd251b686cc10R440 writing the latest global meta-data before this happens, so we don't have a chain of references from the root to these blobs anymore regardless. But I agree, this should be retried (but as I liked in my update, that's incoming shortly too :))
|
||
deleteIndexMetaDataBlobIgnoringErrors(snapshot, indexId); | ||
|
||
deleteIndexMetaDataBlobIgnoringErrors(snapshotId, indexId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, unrelated to the PR, but we could think in the future of moving this to a later phase, as deleting this here makes the rest unretriable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, that's incoming in master...original-brownbear:delete-lock-via-cs shortly :)
snapshotId, | ||
ActionListener.wrap(v -> { | ||
try { | ||
blobStore().blobContainer(basePath().add("indices")).deleteBlobsIgnoringIfNotExists( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have tests that check that the indices folders are cleaned up for FSRepository? Might be good to add some if we don't, so that we don't inadvertently break this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a few that check that the path is gone, but I'll also open a PR that has more comprehensive tests for the state of the repository from https://github.com/elastic/elasticsearch/compare/master...original-brownbear:resilient-deletes-test?expand=1 (50%ish done) very shortly :)
.map(info -> info.indices().stream().map(repositoryData::resolveIndexId).collect(Collectors.toList())) | ||
.orElse(Collections.emptyList()), | ||
snapshotId, | ||
ActionListener.wrap(v -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this not equivalent to the simpler ActionListener.map(listener, v -> { ... })
?
@ywelsch all points addressed now I think :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @andrershov and @ywelsch ! |
* elastic/master: (36 commits) Remove unneded cluster config from test (elastic#40856) Make Fuzziness reject illegal values earlier (elastic#33511) Remove test-only customisation from TransReplAct (elastic#40863) Fix dense/sparse vector limit documentation (elastic#40852) Make -try xlint warning disabled by default. (elastic#40833) Async Snapshot Repository Deletes (elastic#40144) Revert "Replace usages RandomizedTestingTask with built-in Gradle Test (elastic#40564)" Init global checkpoint after copy commit in peer recovery (elastic#40823) Replace usages RandomizedTestingTask with built-in Gradle Test (elastic#40564) [DOCS] Removed redundant (not quite right) information about upgrades. Remove string usages of old transport settings (elastic#40818) Rollup ignores time_zone on date histogram (elastic#40844) HLRC: fix uri encode bug when url path starts with '/' (elastic#34436) Mutes GatewayIndexStateIT.testRecoverBrokenIndexMetadata Docs: Pin two IDs in the rest client (elastic#40785) Adds version 6.7.2 [DOCS] Remind users to include @ symbol when applying license file (elastic#40688) HLRC: Convert xpack methods to client side objects (elastic#40705) Allow ILM to stop if indices have nonexistent policies (elastic#40820) Add an OpenID Connect authentication realm (elastic#40674) ...
Motivated by slow snapshot deletes reported in e.g. elastic#39656 and the fact that these likely are a contributing factor to repositories accumulating stale files over time when deletes fail to finish in time and are interrupted before they can complete. * Makes snapshot deletion async and parallelizes some steps of the delete process that can be safely run concurrently via the snapshot thread poll * I did not take the biggest potential speedup step here and parallelize the shard file deletion because that's probably better handled by moving to bulk deletes where possible (and can still be parallelized via the snapshot pool where it isn't). Also, I wanted to keep the size of the PR manageable. * See elastic#39656 (comment) * Also, as a side effect this gives the `SnapshotResiliencyTests` a little more coverage for master failover scenarios (since parallel access to a blob store repository during deletes is now possible since a delete isn't a single task anymore). * By adding a `ThreadPool` reference to the repository this also lays the groundwork to parallelizing shard snapshot uploads to improve the situation reported in elastic#39657
Motivated by slow snapshot deletes reported in e.g. #39656 and the fact that these likely are a contributing factor to repositories accumulating stale files over time when deletes fail to finish in time and are interrupted before they can complete. * Makes snapshot deletion async and parallelizes some steps of the delete process that can be safely run concurrently via the snapshot thread poll * I did not take the biggest potential speedup step here and parallelize the shard file deletion because that's probably better handled by moving to bulk deletes where possible (and can still be parallelized via the snapshot pool where it isn't). Also, I wanted to keep the size of the PR manageable. * See #39656 (comment) * Also, as a side effect this gives the `SnapshotResiliencyTests` a little more coverage for master failover scenarios (since parallel access to a blob store repository during deletes is now possible since a delete isn't a single task anymore). * By adding a `ThreadPool` reference to the repository this also lays the groundwork to parallelizing shard snapshot uploads to improve the situation reported in #39657
Motivated by slow snapshot deletes reported in e.g. elastic#39656 and the fact that these likely are a contributing factor to repositories accumulating stale files over time when deletes fail to finish in time and are interrupted before they can complete. * Makes snapshot deletion async and parallelizes some steps of the delete process that can be safely run concurrently via the snapshot thread poll * I did not take the biggest potential speedup step here and parallelize the shard file deletion because that's probably better handled by moving to bulk deletes where possible (and can still be parallelized via the snapshot pool where it isn't). Also, I wanted to keep the size of the PR manageable. * See elastic#39656 (comment) * Also, as a side effect this gives the `SnapshotResiliencyTests` a little more coverage for master failover scenarios (since parallel access to a blob store repository during deletes is now possible since a delete isn't a single task anymore). * By adding a `ThreadPool` reference to the repository this also lays the groundwork to parallelizing shard snapshot uploads to improve the situation reported in elastic#39657
Backport of elastic/elasticsearch#40144 and related elastic/elasticsearch#36140
Backport of elastic/elasticsearch#40144 and related elastic/elasticsearch#36140
Backport of elastic/elasticsearch#40144 and related elastic/elasticsearch#36140
Motivated by slow snapshot deletes reported in e.g. #39656 and the fact that these likely are a contributing factor to repositories accumulating stale files over time when deletes fail to finish in time and are interrupted before they can complete.
SnapshotResiliencyTests
a little more coverage for master failover scenarios (since parallel access to a blob store repository during deletes is now possible since a delete isn't a single task anymore).ThreadPool
reference to the repository this also lays the groundwork to parallelizing shard snapshot uploads to improve the situation reported in Introducing a new snapshot segments threadpool to uploads segments of shards in parallel #39657