
S3LayerDeleter cannot handle over 1000 objects to delete #3371

Closed
zacdezgeo opened this issue Apr 6, 2021 · 1 comment · Fixed by #3372 or #3378
@zacdezgeo (Contributor)

Describe the bug

I have been experiencing problems deleting large datasets in AWS S3. The errors were pretty cryptic, but I've pinpointed the problem to the S3LayerDeleter's call to s3Client.deleteObjects(deleteRequest).

The nested error is software.amazon.awssdk.services.s3.model.S3Exception: The XML you provided was not well-formed or did not validate against our published schema. I found similar issues in other projects, and the root cause was a single request exceeding the API's limit of 1,000 keys per DeleteObjects call. I found this link when searching for delete request limits.

I've also tested this hypothesis: datasets containing fewer than 1000 objects can be deleted successfully, while datasets containing over 1000 objects cannot.

To Reproduce

  • Create an S3 dataset containing over 1000 objects
  • Fetch the LayerId of the dataset
  • Call S3LayerDeleter.delete(id: LayerId)

The actual output is a cryptic error.

The expected output is no error, with the dataset deleted; see the reproduction sketch below.
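For reference, a minimal reproduction sketch in Scala. The bucket, prefix, and layer names are hypothetical, and the package paths and constructor shapes are assumed from the GeoTrellis 3.x layout, so adjust to your setup:

```scala
import geotrellis.store.LayerId
import geotrellis.store.s3.{S3AttributeStore, S3LayerDeleter}

object ReproduceIssue3371 {
  def main(args: Array[String]): Unit = {
    // Hypothetical catalog whose layer holds > 1000 objects at one zoom level.
    val attributeStore = S3AttributeStore("my-bucket", "my-catalog")
    val deleter = S3LayerDeleter(attributeStore)

    // Expected: returns normally and the layer is gone.
    // Actual on 3.5.2: S3Exception, "The XML you provided was not
    // well-formed or did not validate against our published schema".
    deleter.delete(LayerId("my-layer", 12))
  }
}
```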

Expected behavior

Dataset is deleted.

Environment

  • Java version: 1.8
  • Scala version: 2.12.10
  • GeoTrellis version: 3.5.2

Proposed Solution

The solution seems relatively straightforward and minimal, involving a change to a single file, S3LayerDeleter.scala. I believe it would be a matter of splitting the deleteRequest into chunks of at most 1,000 keys and calling s3Client.deleteObjects once per chunk, as in the sketch below.
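For illustration, a sketch of the chunked delete against the AWS SDK v2 API that the stack trace points at. This is not the actual patch in #3372; `deleteChunked` and its parameters are names introduced here:

```scala
import software.amazon.awssdk.services.s3.S3Client
import software.amazon.awssdk.services.s3.model.{Delete, DeleteObjectsRequest, ObjectIdentifier}
import scala.collection.JavaConverters._ // Scala 2.12

// DeleteObjects accepts at most 1,000 keys per request, so split the full
// key list and issue one request per chunk of up to 1,000 keys.
def deleteChunked(s3Client: S3Client, bucket: String, keys: Seq[String]): Unit =
  keys.grouped(1000).foreach { chunk =>
    val identifiers = chunk.map(key => ObjectIdentifier.builder().key(key).build())
    val deleteRequest = DeleteObjectsRequest.builder()
      .bucket(bucket)
      .delete(Delete.builder().objects(identifiers.asJava).build())
      .build()
    s3Client.deleteObjects(deleteRequest)
  }
```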

@zacdezgeo (Contributor, Author)

Hi @pomadchin,

Found a little bug related to this issue and fixed it. The attribute store delete was throwing an error when there were more than 1000 images for a given zoom level: the call deleting the LayerId from the attributeStore ran inside the chunked loop, so it failed on the second iteration because the metadata was already gone. It has to be removed from the foreach loop.
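To make that concrete, a before/after sketch; `buildRequest` is a hypothetical helper standing in for the request construction shown earlier, and `keys`/`id` come from the surrounding deleter code:

```scala
// Before: the metadata delete sits inside the chunked loop, so the second
// chunk tries to delete attribute-store entries that are already gone.
keys.grouped(1000).foreach { chunk =>
  s3Client.deleteObjects(buildRequest(chunk))
  attributeStore.delete(id) // throws on the second iteration
}

// After: delete each chunk of objects, then remove the layer metadata once.
keys.grouped(1000).foreach { chunk =>
  s3Client.deleteObjects(buildRequest(chunk))
}
attributeStore.delete(id)
```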

Sorry for not catching that before closing the issue. Should we think about an integration/unit test?

zacdezgeo mentioned this issue Apr 9, 2021