Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding retry count to exceptions #5789

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

RanVaknin
Copy link

@RanVaknin RanVaknin commented Jan 10, 2025

Motivation and Context

Improve debugging visibility by making attempt count immediately visible in exception messages. Currently, attempt information is only available through suppressed exceptions, requiring additional code to access. This change makes attempts visible directly in the exception message, matching behavior in other AWS SDKs.

Modifications

  • Added attempt count tracking and message formatting in SdkException
  • Modified AwsServiceException to include attempts in AWS error details
  • Modified RetryableStageHelper2 to track and propagate attempt counts during retries

Testing

  • Added new functional test ExceptionAttemptMessageBehaviorTest testing with different retry scenarios
  • Modified existing tests in AwsServiceExceptionTest and SdkExceptionSerializationTest to verify attempt count behavior
  • Added test cases in SdkExceptionMessageTest for new attempt count functionality

Visualization of changes:

Before change:

// non-retryable scenario
try {
    // request to a non-existent bucket
    s3Client.listObjects(g -> g.bucket("some-non-existent-bucket"));
} catch (AwsServiceException e) {
    System.out.println(e);
}
// outputs:
// NoSuchBucketException: The specified bucket does not exist (Service: S3, Status Code: 404, Request ID: abc123, Extended Request ID: xyzdef123456)


// retryable scenario
try {
    // request that might cause a retryable error (e.g., throttling)
    dynamoDbClient.getItem(...);
} catch (AwsServiceException e) {
    System.out.println(e);
}
// outputs:
// ThrottlingException: Rate of requests exceeds the allowed throughput (Service: DynamoDB, Status Code: 429, Request ID: abc123, Extended Request ID: xyzdef123456)
// Note: attempts only visible through e.getSuppressed()

After change:

// non-retryable scenario
try {
    // request to a non-existent bucket
    s3Client.listObjects(g -> g.bucket("some-non-existent-bucket"));
} catch (AwsServiceException e) {
    System.out.println(e);
}
// outputs:
// NoSuchBucketException: The specified bucket does not exist (Service: S3, Status Code: 404, Request ID: abc123, Extended Request ID: xyzdef123456) (Attempts: 1)


// retryable scenario
try {
    // request that might cause a retryable error (e.g., throttling)
    dynamoDbClient.getItem(...);
} catch (AwsServiceException e) {
    System.out.println(e);
}
// outputs:
// ThrottlingException: Rate of requests exceeds the allowed throughput (Service: DynamoDB, Status Code: 429, Request ID: abc123, Extended Request ID: xyzdef123456) (Attempts: 4)
// Note: attempts now visible directly in the exception message

Potential Impact:

Test or error handling code that relies on exact exception message matching might break as messages now include attempt count. We recommend using partial message matching for more resilient tests and error handling.

// may break
assertEquals("Resource not found", e.getMessage());
if (e.getMessage().equals("Resource not found")) { ... }

// more resilient approach
assertTrue(e.getMessage().contains("Resource not found"));
try {
    client.someOperation();
} catch (AwsServiceException e) {
	// may break
    if (e.getMessage().equals("Resource not found")) {
        handleNotFound();
    }
	// more resilient
    if (e.getMessage().contains("Resource not found")) {
        handleNotFound();
    }
}

@RanVaknin RanVaknin requested a review from a team as a code owner January 10, 2025 18:39
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
31.1% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

this.attempts = builder.attemptCount();
}

public int getAttempts() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a fluent method name like attemptCount() without the get prefix?

Using attempts might give the impression that the return value is a list of attempts. In SDK Java 2.x, we typically use plural names to represent lists.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggesting numAttempts to be consistent with numXXX used elsewhere, eg: numRetries

if (message == null) {
message = awsErrorDetails().errorMessage();
}
if (message == null) {
return details.toString();
}
return message + " " + details;
StringBuilder formattedMessage = new StringBuilder(message);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment stating that appending the attempt information after the details is done to preserve the order, ensuring that AWS error details are printed after the AWS server exception message?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use StringJoiner instead?

AwsServiceException exception = assertThrows(AwsServiceException.class,
() -> callAllTypes(client));

assertThat(exception.getMessage()).contains("(Attempts: 1");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we "(Attempts: 1)" since this test case will also pass if we out put as "(Attempts: 10" or "(Attempts: 1__SPACE_" etc

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! Ill fix that.

@@ -28,9 +28,23 @@
public class SdkException extends RuntimeException {

private static final long serialVersionUID = 1L;
private int attempts;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use Integer to differentiate between null and 0

if (message == null) {
message = awsErrorDetails().errorMessage();
}
if (message == null) {
return details.toString();
}
return message + " " + details;
StringBuilder formattedMessage = new StringBuilder(message);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use StringJoiner instead?

this.attempts = attempts;
}

public String getRawMessage() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason we are adding a new public method? More public APIs means more maintenance and less flexibility to update it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this method is used in AwsServiceException as a template method maintain the order of the exception message in the desired format of <message> (<metadata>) (<attempts>)

Because its defined in sdk-core but used in aws-core it has to be made public.

It was introduced to avoid this:

Expected :"errorMessage (Service: serviceName, Status Code: 500, Request ID: requestId) (Attempts: 6)"
Actual   :"errorMessage (Attempts: 6) (Service: serviceName, Status Code: 500, Request ID: requestId)"

this.attempts = builder.attemptCount();
}

public int getAttempts() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggesting numAttempts to be consistent with numXXX used elsewhere, eg: numRetries

Comment on lines +68 to +72
if (attempts > 0) {
StringBuilder formattedMessage = new StringBuilder();
formattedMessage.append(message).append(" ").append("(Attempts: ").append(attempts).append(")");
return formattedMessage.toString();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be identical to the code in AwsServiceException, can we reuse it?

lastException.addSuppressed(pastException);
}
lastException.setAttempts(retriesAttemptedSoFar() + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably use toBuilder().numAttempts(x).build()

Copy link
Author

@RanVaknin RanVaknin Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to do that initially, but toBuilder will cause the re-consturction of the exception and will result in a duplication of the attempt count (Attempt: N) (Attempt N).

Discussed this with @millems and @dagnir and since we are already mutating the exception they thought offering a setter is reasonable.

Comment on lines +52 to +59
public void defaultAttemptCount_shouldReturnOne() {
assertThat(SdkException.builder().build().getAttempts()).isEqualTo(0);
}

@Test
public void getAttempts_WithExplicitAttemptCount_ReturnsOneBased() {
assertThat(SdkException.builder().message("foo").attemptCount(2).build().getAttempts()).isEqualTo(2);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using ParameterizedTest if we are verifying the same thing with different values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants