Implement Bulk Deletes for GCS Repository #41368

Merged: 12 commits into elastic:master on Apr 30, 2019

Conversation

@original-brownbear (Member):

  • Just like Add Bulk Delete Api to BlobStore #40322 for AWS
  • We already had a bulk delete API but weren't using it from the blob container implementation; now we are.
    • Also made the bulk delete API compliant with our interface, which only suppresses errors about non-existent blobs, by stat-ing the failed deletes (I didn't use any bulk stat action here, since having to stat should be the exception anyway and it would make the error handling a lot more complex)
  • Fixed the bulk delete API to limit its batch size to 100, in line with GCS recommendations

@elasticmachine (Collaborator):

Pinging @elastic/es-distributed

@ywelsch (Contributor) left a comment:

AFAICS, Google's SDK already implements the max batch size (see HttpStorageRpc.DefaultRpcBatch), so there is no need to implement it manually.
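For reference, a minimal sketch of what that means for callers (made-up names, not this PR's code): the high-level bulk delete can simply be handed the full list, because the RPC layer already splits it into compliant batches.

    import com.google.cloud.storage.BlobId;
    import com.google.cloud.storage.Storage;

    import java.util.List;

    class SdkBatchingSketch {
        // No manual chunking needed: Storage.delete(Iterable<BlobId>) batches the
        // calls itself, with the RPC layer (HttpStorageRpc.DefaultRpcBatch) capping
        // the number of operations per HTTP batch request.
        static List<Boolean> deleteAll(Storage storage, List<BlobId> blobIds) {
            return storage.delete(blobIds); // one boolean per blob: true if it was deleted
        }
    }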

@original-brownbear (Member, Author):

@ywelsch 🤦‍♂️ right ... obviously I only looked through their code to find the batch size and apparently stopped thinking at that point :)
=> batching removed in b971752 :)

@andrershov (Contributor) left a comment:

LGTM

@original-brownbear (Member, Author):

@ywelsch you're good with this one too? :)

@ywelsch (Contributor) left a comment:

AFAICS, what we would like to implement is the deleteBlobsIgnoringIfNotExists method. The List<Boolean> delete(Iterable<BlobId> blobIds) method in the Storage class of Google's SDK equates failure and a missing file:

    @return an immutable list of booleans. If a blob has been deleted the corresponding item in the
        list is {@code true}. If a blob was not found, deletion failed or access to the resource
        was denied the corresponding item is {@code false}.

We want the distinction between the two, and not to throw an exception if the file is just missing. This can be achieved by using the lower-level batch() method, building the batch yourself, and then plugging in a different error-handling mapping.

We should also have tests that check that this behavior is correctly implemented by any deleteBlobsIgnoringIfNotExists method.

On a related note, we should delete the "delete" method from the BlobStore interface and replace its single use with a call through a BlobContainer.
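A minimal sketch of the batch()-based approach described above (the method name follows the BlobContainer interface discussed here; everything else is illustrative, not the PR's actual code):

    import com.google.cloud.BatchResult;
    import com.google.cloud.storage.BlobId;
    import com.google.cloud.storage.Storage;
    import com.google.cloud.storage.StorageBatch;
    import com.google.cloud.storage.StorageException;

    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicBoolean;

    class GcsBatchDeleteSketch {

        // Build the batch by hand and register a per-blob callback, so that a
        // 404 (blob already gone) can be ignored while any other error still
        // fails the bulk delete.
        void deleteBlobsIgnoringIfNotExists(Storage storage, List<BlobId> blobIdsToDelete) throws IOException {
            final AtomicBoolean failed = new AtomicBoolean(false);
            final StorageBatch batch = storage.batch();
            for (BlobId blobId : blobIdsToDelete) {
                batch.delete(blobId).notify(new BatchResult.Callback<Boolean, StorageException>() {
                    @Override
                    public void success(Boolean result) {
                        // deleted; nothing to record
                    }

                    @Override
                    public void error(StorageException exception) {
                        if (exception.getCode() != HttpURLConnection.HTTP_NOT_FOUND) {
                            failed.set(true); // a real failure, not just a missing blob
                        }
                    }
                });
            }
            batch.submit(); // callbacks fire as the batched responses arrive
            if (failed.get()) {
                throw new IOException("Failed to delete blobs " + blobIdsToDelete);
            }
        }
    }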

@original-brownbear (Member, Author):

@ywelsch two 🤦‍♂️ in one PR from me :( Obviously much better to save the get calls and just use the proper callbacks.
Did that now in a029e7b :) (I had to do some nasty mocking because the related classes have package-private constructors in the GCloud SDK, but other than that this should work fine).

@original-brownbear (Member, Author):

Jenkins run elasticsearch-ci/1

1 similar comment
@original-brownbear (Member, Author):

Jenkins run elasticsearch-ci/1

@original-brownbear (Member, Author):

@ywelsch

On a related note, we should delete the "delete" method from the BlobStore interface and replace its single use with a call through a BlobContainer.

#41619 :)

@ywelsch (Contributor) left a comment:

We should also have tests that check that this behavior is correctly implemented by any deleteBlobsIgnoringIfNotExists method.

Can you also address that one? In particular, I'm thinking about ESBlobStoreContainerTestCase

    protected String buildKey(String blobName) {

    @Override
    public void deleteBlobsIgnoringIfNotExists(List<String> blobNames) throws IOException {
        blobStore.deleteBlobs(blobNames.stream().map(this::buildKey).collect(Collectors.toList()));
@ywelsch (Contributor):

The method on BlobStore should now also be renamed to deleteBlobsIgnoringIfNotExists.

@original-brownbear (Member, Author):

done


    @Override
    public void error(StorageException exception) {
        if (exception.getCode() == 404) {
@ywelsch (Contributor):

let's use the constant here: HttpURLConnection.HTTP_NOT_FOUND

@original-brownbear (Member, Author):

sure :)

    if (exception.getCode() == 404) {
        results.add(Boolean.TRUE);
    } else {
        results.add(Boolean.FALSE);
@ywelsch (Contributor):

I would prefer to have this consistent with the S3 implementation, which currently collects the exceptions and adds them as suppressed.
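I.e., roughly this pattern (a sketch, not the actual change): keep a single IOException and attach each real StorageException as a suppressed cause, rather than only recording a boolean per blob.

    import com.google.cloud.storage.StorageException;

    import java.io.IOException;
    import java.util.List;

    class SuppressedFailuresSketch {
        // Collect every non-404 StorageException during the batch, then surface
        // them all on one IOException so no root cause is lost.
        static void rethrowCollected(List<StorageException> failures, Iterable<?> blobIdsToDelete) throws IOException {
            if (failures.isEmpty() == false) {
                final IOException ioe = new IOException("Failed to delete blobs " + blobIdsToDelete);
                failures.forEach(ioe::addSuppressed);
                throw ioe;
            }
        }
    }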

@original-brownbear (Member, Author):

Done

                failed = true;
            }
        }
        if (failed) {
    -       throw new IOException("Failed to delete all [" + blobIdsToDelete.size() + "] blobs");
    +       throw new IOException("Failed to delete the following blobs " + blobIdsToDelete);
@ywelsch (Contributor):

perhaps just "Failed to delete blobs " + blobIdsToDelete

@original-brownbear (Member, Author):

done

        callback.success(true);
        return null;
    }).when(result).notify(any(BatchResult.Callback.class));
    when(batch.delete(any(BlobId.class), anyVararg())).thenAnswer(invocation -> {
@ywelsch (Contributor):

can we somehow check that no other method is called on this mock?

@original-brownbear (Member, Author):

Jup done :)
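For example, a check along these lines (Mockito sketch; the helper name and its arguments are made up):

    import static org.mockito.Matchers.any;
    import static org.mockito.Matchers.anyVararg;
    import static org.mockito.Mockito.times;
    import static org.mockito.Mockito.verify;
    import static org.mockito.Mockito.verifyNoMoreInteractions;

    import com.google.cloud.storage.BlobId;
    import com.google.cloud.storage.StorageBatch;

    class BatchMockVerificationSketch {
        // After exercising the code under test, assert that the mocked batch saw
        // exactly the expected delete() calls plus one submit(), and nothing else.
        static void verifyOnlyDeletesAndSubmit(StorageBatch batch, int expectedDeletes) {
            verify(batch, times(expectedDeletes)).delete(any(BlobId.class), anyVararg());
            verify(batch).submit();
            verifyNoMoreInteractions(batch);
        }
    }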

        return ioe.get();
    });
    if (e != null) {
        throw e;
@ywelsch (Contributor):

this now throws the StorageException, and not an IOException like S3's BlobContainer. Let's use one pattern (the one we had for S3), not two different ones.

@original-brownbear (Member, Author):

Sure, fixed in 1d8d9c0 :)

@original-brownbear (Member, Author):

Jenkins run elasticsearch-ci/packaging-sample

        return ioe.get();
    });
    if (e != null) {
        throw new IOException(e);
@ywelsch (Contributor):

Also add a message just as for S3BlobContainer? i.e. "Exception when deleting blobs [" + blobNames + "]"

@original-brownbear (Member, Author):

I consciously didn't do that here, since it's a little different from the AWS case: we always submit the full batch of deletes, because the sub-batches are internal to the client (unlike S3, where we split up the batch ourselves).
So I expect us to always get an exception for every failed blob, making listing them again a little redundant (plus we're catching these exceptions upstream anyway and logging the blobs that we tried to delete)?
Maybe rather remove the listing from S3 as well? (I didn't realize it at the time, but it seems completely redundant when we always log the list of blobs upstream anyway.)

@ywelsch (Contributor):

It may be nicer to collect the files that failed deletion here than at the caller site. It allows filtering the list down to the actual file deletions that failed (i.e. instead of just blobNames, we can filter it down to those that actually experienced a failure). A similar thing can be done for S3.
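A sketch of that idea (illustrative names, not the PR's code): record the BlobId inside the error callback, so the final message lists only the blobs whose delete actually failed.

    import com.google.cloud.storage.BlobId;
    import com.google.cloud.storage.StorageException;

    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.util.ArrayList;
    import java.util.List;

    class FailedBlobFilterSketch {
        private final List<BlobId> failedBlobs = new ArrayList<>();

        // Invoked from the per-blob error callback: only real failures are
        // recorded, so missing blobs never show up in the final message.
        void onError(BlobId blobId, StorageException exception) {
            if (exception.getCode() != HttpURLConnection.HTTP_NOT_FOUND) {
                failedBlobs.add(blobId);
            }
        }

        // After batch.submit(): name exactly the blobs that were not deleted.
        void rethrowIfAnyFailed() throws IOException {
            if (failedBlobs.isEmpty() == false) {
                throw new IOException("Exception when deleting blobs " + failedBlobs);
            }
        }
    }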

@original-brownbear (Member, Author):

Makes sense. Maybe keep it all in the exception though, and simply make sure that the blob names are all in the exception?
I don't like manually collecting the list of failed deletes without the exceptions, tbh; it doesn't really give us any information. We want to know why a delete failed; the fact that it failed we can already see by listing the stale blobs later on :P

Then we could simply remove the blobs list from all the upstream logging and be done?
=> How about keeping things as they are here and adjusting the S3 implementation and upstream logging in a subsequent PR?

@ywelsch (Contributor):

OK to adapt the call sites in a follow-up PR, but let's at least add the message with all the blob names in this PR.

@original-brownbear (Member, Author) commented Apr 30, 2019:

Thanks @ywelsch, @andrershov!

@original-brownbear original-brownbear merged commit b633182 into elastic:master Apr 30, 2019
@original-brownbear original-brownbear deleted the gcs-batch-delete branch April 30, 2019 10:38
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Apr 30, 2019

    Implement Bulk Deletes for GCS Repository
original-brownbear added a commit that referenced this pull request Apr 30, 2019

    Implement Bulk Deletes for GCS Repository (#41368)

    Backport of #41368
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Apr 30, 2019

    Cleanup Bulk Delete Exception Logging

    * Follow up to elastic#41368
    * Collect all failed blob deletes and add them to the exception message
    * Remove logging of the blob name list from caller exception logging
akhil10x5 pushed a commit to akhil10x5/elasticsearch that referenced this pull request May 2, 2019

    Implement Bulk Deletes for GCS Repository
original-brownbear added a commit that referenced this pull request May 6, 2019

    Cleanup Bulk Delete Exception Logging
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019

    Implement Bulk Deletes for GCS Repository
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019

    Cleanup Bulk Delete Exception Logging
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request May 28, 2019

    Cleanup Bulk Delete Exception Logging
original-brownbear added a commit that referenced this pull request May 28, 2019

    Cleanup Bulk Delete Exception Logging