-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add blob container retries tests for Google Cloud Storage #46968
Conversation
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tlrx :) Just a few random NITs and the timeout issue I commented on (I think we should do something here ... or maybe just use longer timeouts since it's a rarely()
thing). Let me know what you think there :)
...sitory-gcs/src/main/java/org/elasticsearch/repositories/gcs/GoogleCloudStorageBlobStore.java
Outdated
Show resolved
Hide resolved
...est/java/org/elasticsearch/repositories/gcs/GoogleCloudStorageBlobContainerRetriesTests.java
Outdated
Show resolved
Hide resolved
...est/java/org/elasticsearch/repositories/gcs/GoogleCloudStorageBlobContainerRetriesTests.java
Outdated
Show resolved
Hide resolved
...est/java/org/elasticsearch/repositories/gcs/GoogleCloudStorageBlobContainerRetriesTests.java
Outdated
Show resolved
Hide resolved
|
||
public void testWriteLargeBlob() throws IOException { | ||
final boolean useTimeout = rarely(); | ||
final TimeValue readTimeout = useTimeout ? TimeValue.timeValueMillis(randomIntBetween(100, 500)) : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has me a little worried stability wise. It seems all it takes for this test to fail is some GC pause with unlucky timing?
Can we harden the test against this scenaro somehow like we did in the S3 tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this comment. I've think a bit to what you say and suggest and I agree this test can fail in case of GC pauses at the wrong time. It can also be quite hard to investigate because with low read timeout values a request timeout could be either caused by a GC pause or by the test itself.
To mitigate this, I've change the test to use a higher value for the read timeout client settings (I picked up 3s) and then only simulates read timeouts for the resumable upload session init and for the first chunk upload. This way we still test that read timeout work for the 2 types of resumable requests requests but we only fail 1 time for each.
This allows to use a higher read timeout value and keep the test under the 10-15 seconds execution time.
Please let me know what you think!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM :)
Let's try with the longer timeout. We both know what's going on here and can jump in if it turns out to be unstable. The chance of that might be very low in fact since the GC pause has to hit right on the physical read call (sort of since its async IO) so I'm optimistic :)
Thanks @original-brownbear ! |
Similarly to what has been done for S3 in #45383, this commit adds unit tests that verify the behavior of the SDK client and blob container implementation for Google Storage when the remote service returns errors. The main purpose was to add an extra test to the specific retry logic for 410-Gone errors added in #45963. Relates #45963
Similarly to what has been done for S3 in #45383, this commit adds unit tests that verify the behavior of the SDK client and blob container implementation for Google Storage when the remote service returns errors.
The main purpose was to add an extra test to the specific retry logic for
410-Gone
errors added in #45963 but since I was there I also added tests for other read/write methods.Relates #45963