Add blob container retries tests for Google Cloud Storage #46968

tlrx · 2019-09-23T10:08:17Z

Similarly to what has been done for S3 in #45383, this commit adds unit tests that verify the behavior of the SDK client and blob container implementation for Google Cloud Storage when the remote service returns errors.
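The general shape of such a test (a fake service that fails a known number of times, and assertions on how many requests the client sent) can be sketched as follows. This is a minimal sketch, assuming a hypothetical in-memory `FlakyBlobService` in place of an HTTP fixture and a hypothetical `RetryingReader` retry loop; it is not the PR's test code, which exercises the real Google Cloud Storage SDK client.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical fake of the remote storage service: it answers with a server
// error a fixed number of times, then returns the blob content.
class FlakyBlobService {
    private final int failuresBeforeSuccess;
    final AtomicInteger requests = new AtomicInteger(); // total requests received

    FlakyBlobService(int failuresBeforeSuccess) {
        this.failuresBeforeSuccess = failuresBeforeSuccess;
    }

    byte[] read(String blobName) throws IOException {
        int attempt = requests.incrementAndGet();
        if (attempt <= failuresBeforeSuccess) {
            // Stands in for an HTTP 5xx response from the remote service.
            throw new IOException("simulated server error on attempt " + attempt + " for " + blobName);
        }
        return ("content of " + blobName).getBytes(StandardCharsets.UTF_8);
    }
}

// Hypothetical client-side retry loop: one initial attempt plus up to maxRetries retries.
class RetryingReader {
    static byte[] readWithRetries(FlakyBlobService service, String blobName, int maxRetries) throws IOException {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        IOException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return service.read(blobName);
            } catch (IOException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // retries exhausted: surface the last error to the caller
    }
}
```

With `failuresBeforeSuccess = 2` and `maxRetries = 3` the read succeeds on the third request; with more failures than the retry budget allows, the last error propagates after `maxRetries + 1` requests.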

The main purpose was to add a test for the retry logic specific to 410-Gone errors added in #45963, but while I was there I also added tests for the other read/write methods.

Relates #45963
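The 410-Gone retry logic from #45963 is only referenced here. As a rough illustration of the idea, here is a minimal sketch, assuming that a 410 Gone during a resumable upload means the session cannot be resumed and the upload has to start over from the first byte in a new session. `FakeResumableService`, `ResumableUploader` and the `maxRestarts` budget are hypothetical names, not the Elasticsearch or Google SDK API, and the status codes are simplified.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical fake of a resumable upload endpoint. Status codes are simplified:
// 200 means "chunk accepted", 410 means the session is gone and cannot be resumed.
class FakeResumableService {
    private final int goneSession; // which session (1-based) fails with 410
    private final int goneChunk;   // on which chunk index it fails
    private int sessions = 0;
    final List<String> log = new ArrayList<>();

    FakeResumableService(int goneSession, int goneChunk) {
        this.goneSession = goneSession;
        this.goneChunk = goneChunk;
    }

    String createSession() {
        sessions++;
        String id = "session-" + sessions;
        log.add("init " + id);
        return id;
    }

    int uploadChunk(String sessionId, int chunkIndex) {
        log.add("chunk " + chunkIndex + " -> " + sessionId);
        boolean gone = sessionId.equals("session-" + goneSession) && chunkIndex == goneChunk;
        return gone ? 410 : 200;
    }
}

class ResumableUploader {
    // Uploads all chunks in one session. A 410 Gone cannot be resumed, so the
    // whole upload restarts from chunk 0 in a fresh session, at most maxRestarts times.
    static String upload(FakeResumableService service, int chunks, int maxRestarts) {
        for (int restart = 0; restart <= maxRestarts; restart++) {
            String session = service.createSession();
            boolean gone = false;
            for (int i = 0; i < chunks && !gone; i++) {
                int status = service.uploadChunk(session, i);
                if (status == 410) {
                    gone = true;
                } else if (status != 200) {
                    throw new IllegalStateException("unexpected status " + status);
                }
            }
            if (!gone) {
                return session; // every chunk accepted
            }
        }
        throw new IllegalStateException("upload still failing with 410 Gone after " + maxRestarts + " restart(s)");
    }
}
```

The design point the sketch tries to capture is that a 410 is not retried at the chunk level: the whole upload restarts in a new session, which is what a test for this path needs to assert (for example, a second session-init request after the failure).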

elasticmachine · 2019-09-23T10:08:19Z

Pinging @elastic/es-distributed

original-brownbear

Thanks @tlrx :) Just a few random NITs and the timeout issue I commented on (I think we should do something here ... or maybe just use longer timeouts since it's a rarely() thing). Let me know what you think there :)

...sitory-gcs/src/main/java/org/elasticsearch/repositories/gcs/GoogleCloudStorageBlobStore.java

...est/java/org/elasticsearch/repositories/gcs/GoogleCloudStorageBlobContainerRetriesTests.java

original-brownbear · 2019-09-23T12:16:12Z

...est/java/org/elasticsearch/repositories/gcs/GoogleCloudStorageBlobContainerRetriesTests.java

```diff
+
+    public void testWriteLargeBlob() throws IOException {
+        final boolean useTimeout = rarely();
+        final TimeValue readTimeout = useTimeout ? TimeValue.timeValueMillis(randomIntBetween(100, 500)) : null;
```


This has me a little worried stability-wise. It seems all it takes for this test to fail is a GC pause with unlucky timing?
Can we harden the test against this scenario somehow, like we did in the S3 tests?

Thanks for this comment. I've thought a bit about what you said and suggested, and I agree this test can fail if a GC pause happens at the wrong time. It can also be quite hard to investigate, because with low read timeout values a request timeout could be caused either by a GC pause or by the test itself.

To mitigate this, I've changed the test to use a higher value for the read timeout client setting (I picked 3s) and to simulate read timeouts only for the resumable upload session initialization and for the first chunk upload. This way we still test that read timeouts work for the 2 types of resumable upload requests, but we only fail once for each.

This allows us to use a higher read timeout value and keep the test execution time under 10-15 seconds.
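The "fail only once per request type" approach described above can be illustrated with a small fault-injection policy. This is a minimal sketch, assuming a fake server that consults the policy before answering; `FailOncePolicy` and `ResumableUploadSimulation` are hypothetical names, and no real HTTP or read-timeout setting is involved (a simulated timeout is just a boolean here).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical fault-injection policy for a fake storage server: it simulates a
// read timeout only the first time it sees each kind of request.
class FailOncePolicy {
    private final Set<String> alreadyFailed = ConcurrentHashMap.newKeySet();

    boolean shouldSimulateTimeout(String requestKind) {
        // Set.add returns true only when the key was not already present,
        // so each request kind "times out" exactly once.
        return alreadyFailed.add(requestKind);
    }
}

class ResumableUploadSimulation {
    // Counts the attempts needed for one session-init request followed by `chunks`
    // chunk uploads, retrying a request whenever a simulated timeout occurs.
    static int attemptsFor(int chunks, FailOncePolicy policy, int maxRetries) {
        List<String> requests = new ArrayList<>();
        requests.add("session-init");
        for (int i = 0; i < chunks; i++) {
            requests.add("chunk-upload");
        }
        int attempts = 0;
        for (String kind : requests) {
            int tries = 0;
            while (true) {
                attempts++;
                tries++;
                if (!policy.shouldSimulateTimeout(kind)) {
                    break; // response arrived before the read timeout
                }
                if (tries > maxRetries) {
                    throw new IllegalStateException("gave up on " + kind + " after " + tries + " tries");
                }
            }
        }
        return attempts;
    }
}
```

With four chunks and one retry allowed, the upload takes 7 attempts: 5 requests plus one extra each for the session init and the first chunk. Keying the injected failures on the request kind keeps their number fixed, so under the same assumption a longer real read timeout would only be waited out twice per upload.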

Please let me know what you think!

original-brownbear

LGTM :)

Let's try with the longer timeout. We both know what's going on here and can jump in if it turns out to be unstable. The chance of that might in fact be very low, since the GC pause has to hit right on the physical read call (sort of, since it's async IO), so I'm optimistic :)

tlrx · 2019-09-24T06:57:49Z

Thanks @original-brownbear !


Add blob container retries tests for Google Cloud Storage

0ecaea6

tlrx added >test :Distributed Coordination/Snapshot/Restore v8.0.0 v7.5.0 labels Sep 23, 2019

tlrx requested a review from original-brownbear September 23, 2019 10:08

tlrx mentioned this pull request Sep 23, 2019

Retry GCS Resumable Upload on Error 410 #45963

Merged

original-brownbear reviewed Sep 23, 2019


Armin's feedback

f69d723

tlrx requested a review from original-brownbear September 23, 2019 14:27

original-brownbear approved these changes Sep 23, 2019


tlrx merged commit 6061912 into elastic:master Sep 24, 2019

tlrx deleted the add-retries-tests-for-gcs branch September 24, 2019 06:57

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
