Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup BlobStoreRepository Abort and Failure Handling #46208

Original file line number Diff line number Diff line change
Expand Up @@ -110,6 +110,7 @@
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Collectors;

import static org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardSnapshot.FileInfo.canonicalName;
Expand Down Expand Up @@ -1048,17 +1049,27 @@ public void snapshotShard(Store store, MapperService mapperService, SnapshotId s
final GroupedActionListener<Void> filesListener =
new GroupedActionListener<>(allFilesUploadedListener, indexIncrementalFileCount);
final Executor executor = threadPool.executor(ThreadPool.Names.SNAPSHOT);
// Flag to signal that the snapshot has been aborted/failed so we can stop any further blob uploads from starting
final AtomicBoolean alreadyFailed = new AtomicBoolean();
original-brownbear marked this conversation as resolved.
Show resolved Hide resolved
for (BlobStoreIndexShardSnapshot.FileInfo snapshotFileInfo : filesToSnapshot) {
executor.execute(new ActionRunnable<>(filesListener) {
@Override
protected void doRun() {
try {
snapshotFile(snapshotFileInfo, indexId, shardId, snapshotId, snapshotStatus, store);
if (alreadyFailed.get() == false) {
snapshotFile(snapshotFileInfo, indexId, shardId, snapshotId, snapshotStatus, store);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call filesListener.onFailure() with a IndexShardSnapshotFailedException and an appropriate comment in case the upload is skipped because of the alreadyFailed flag?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do that, we get all those execeptions added to the overall exception again, precisely what I tried avoiding here. Note that you could theoretically have a pretty large number of files here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - then at least a trace log would be helpful?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I can't see what it would be useful for?
We'll get some indication that things were aborted anyway from the files that were actually aborted to set the flag and we will not update the blob index-N in the blob as a result of failing here as well -> seems like the information on what precise files were never uploaded is completely irrelevant?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

(Monday morning reviews, you know)

filesListener.onResponse(null);
} catch (IOException e) {
throw new IndexShardSnapshotFailedException(shardId, "Failed to perform snapshot (index files)", e);
}
}

@Override
public void onFailure(Exception e) {
alreadyFailed.set(true);
super.onFailure(e);
}
});
}
} catch (Exception e) {
Expand Down