-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: improve information on CancellationException for LROs #2236
Conversation
@@ -77,6 +87,13 @@ public boolean shouldRetry(TimedAttemptSettings nextAttemptSettings) | |||
if (super.shouldRetry(nextAttemptSettings)) { | |||
return true; | |||
} | |||
if (LOGGER.isLoggable(Level.SEVERE)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QQ, is there a reason SEVERE
logging level was picked? It seems like that is meant for more fatal application issues (and less for an RPC timeout), but I also am not sure if there is a better level (Info? Warning?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. My reasoning was more like "exception=severe", but in the context of a timeout on a request I agree something like WARNING
is better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decreased it to WARNING
|
||
/** | ||
* Operation timed polling algorithm, which uses exponential backoff factor for determining when the | ||
* next polling operation should be executed. If the polling exceeds the total timeout this | ||
* algorithm cancels polling. | ||
*/ | ||
public class OperationTimedPollAlgorithm extends ExponentialRetryAlgorithm { | ||
|
||
@VisibleForTesting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't think the logger needs a VisibleForTesting annotation.
@@ -121,6 +124,8 @@ public void testHttpJson_LROSuccessfulResponse_doesNotExceedTotalTimeout() throw | |||
@Test | |||
public void testGRPC_LROUnsuccessfulResponse_exceedsTotalTimeout_throwsDeadlineExceededException() | |||
throws Exception { | |||
FakeLogHandler logHandler = new FakeLogHandler(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Might be better to create field for FakeLogHandler that all tests can use without having to initialize in multiple tests, if possible
setUp() -> Creates it
clean() -> clears all the message so the next test can run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, thanks! I only added a setUp()
since creating a new handler wipes out the old handler with its maessages
FakeLogHandler logHandler = new FakeLogHandler(); | ||
OperationTimedPollAlgorithm.LOGGER.addHandler(logHandler); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, added a few nits. Can you also have @burkedavison or @JoeWang1127 take a quick look? Thanks for this @diegomarquezp
|
||
/** | ||
* Operation timed polling algorithm, which uses exponential backoff factor for determining when the | ||
* next polling operation should be executed. If the polling exceeds the total timeout this | ||
* algorithm cancels polling. | ||
*/ | ||
public class OperationTimedPollAlgorithm extends ExponentialRetryAlgorithm { | ||
|
||
public static Logger LOGGER = Logger.getLogger(OperationTimedPollAlgorithm.class.getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the logger has to be public
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so it can be accessed by the test in order to add a log handler. @lqiu96, maybe we can fall back to @VisibleForTesting
because it is... visible only for testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right. Let's try the the default package scope, which I believe should still be accessible. If not, then we can add the annotation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There're some suggestions made by Quality Gate: https://sonarcloud.io/project/issues?id=googleapis_gapic-generator-java&pullRequest=2236&resolved=false&types=CODE_SMELL
@@ -149,6 +161,9 @@ public void testGRPC_LROUnsuccessfulResponse_exceedsTotalTimeout_throwsDeadlineE | |||
assertThrows(CancellationException.class, operationFuture::get); | |||
int attemptCount = operationFuture.getPollingFuture().getAttemptSettings().getAttemptCount(); | |||
assertThat(attemptCount).isGreaterThan(1); | |||
assertEquals(1, logHandler.getAllMessages().size()); | |||
assertThat(logHandler.getAllMessages().get(0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a guarantee that the message is the first element of the list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far the implementation has only one case where the logger is used. I think the current assertions are a safe heuristic. Maybe I can assert that at least one message contains the entry I'm looking for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I can assert that at least one message contains the entry I'm looking for?
I think this maybe better.
gax-java/gax/src/test/java/com/google/api/gax/util/FakeLogHandler.java
Outdated
Show resolved
Hide resolved
I don't have a lot of context of this change so I provide some general suggestions. |
@@ -39,6 +43,13 @@ | |||
*/ | |||
public class ITLongRunningOperation { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure that this already-complex integration test the best place to add these new assertions? Rephrased: Is the showcase server needed to verify the client's logging behavior?
The changes we're making are in com.google.api.gax.longrunning.OperationTimedPollAlgorithm
, but we don't yet have any unit tests for this class. Is it possible for us to write a direct unit test that focuses purely on the logging behavior of OperationTimedPollAlgorithm.shouldRetry()
?
Without looking at this PR and only the logic we're trying to test, I would have expected to see a set of unit tests that were roughly:
// WARNING: This is all completely pseudocode...
public void testTroubleshootingLinkProvided() {
LOGGER.addHandler(handler);
assertThrows(..., algorithm.shouldRetry(NEXT_ATTEMPT_SHOULD_NOT_RETRY));
assertThat(handler.getLogMessage().exists())
assertThat(handler.getLogMessage().contains(LRO_TROUBLESHOOTING_LINK));
}
public void testTroubleshootingLinkNotProvided() {
LOGGER.addHandler(handler);
assertThat(algorithm.shouldRetry(NEXT_ATTEMPT_SHOULD_RETRY), is(true));
assertThat(handler.getLogMessage().doesNotExist())
}
Then the integration test can be kept clear of client-side-only implementation details, allowing the person debugging a problem to focus only on the relevant interactions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@burkedavison thank you. At some point writing the test in this file helped me find that the exception thrown in OperationTimedPollAlgorithm
does not bubble up to the client (hence the use of a logger), but maybe that's just part of finding out about the behavior of LROs. I will move the test case to a unit test in gax
FakeApiClock clock = new FakeApiClock(System.nanoTime()); | ||
OperationTimedPollAlgorithm algorithm = | ||
OperationTimedPollAlgorithm.create(FAST_RETRY_SETTINGS, clock); | ||
TimedAttemptSettings settings = | ||
TimedAttemptSettings.newBuilder() | ||
.setGlobalSettings(FAST_RETRY_SETTINGS) | ||
.setRetryDelay(Duration.ofMillis(1l)) | ||
.setRpcTimeout(Duration.ofMillis(1l)) | ||
.setRandomizedRetryDelay(Duration.ofMillis(1l)) | ||
.setAttemptCount(0) | ||
.setFirstAttemptStartTimeNanos(clock.nanoTime()) | ||
.build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the improvement by moving this to a unit test!
At this point, I'm just nit-picking: you could choose to extract all three of these variables to class fields, and further reduce both of these tests to their essential checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
.setAttemptCount(0) | ||
.setFirstAttemptStartTimeNanos(clock.nanoTime()) | ||
.build(); | ||
clock.incrementNanoTime(1 * 1000 * 1000 * 1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider a quick comment here to clarify what part of the logic this is triggering to force a no-retry.
For example (if this is accurate):
clock.incrementNanoTime(1 * 1000 * 1000 * 1000); // Force Rpc Timeout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. I added this comment
private static TimedAttemptSettings timedAttemptSettings; | ||
private static FakeApiClock clock; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By making these static and initializing them in an instance method (@Before
), the test now becomes flaky if multithreading is ever enabled. One test may overwrite the reference during a different test.
private static TimedAttemptSettings timedAttemptSettings; | |
private static FakeApiClock clock; | |
private TimedAttemptSettings timedAttemptSettings; | |
private FakeApiClock clock; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, good catch. thanks. I didn't think through the need for static in this test class. I just switched them as instance variables
edit: I didn't notice this comment had a suggestion but I added the same change in another commit
[gapic-generator-java-root] Kudos, SonarCloud Quality Gate passed! |
[java_showcase_integration_tests] Kudos, SonarCloud Quality Gate passed! |
Fixes #1858
When a LRO runs out of retries/time, it throws a
CancellationException
that is then swallowed by guava'sAbstractFuture
, meaning there is no simple way to bubble up a custom exception with message.This approach uses a logger with a message that points to LRO documentation to ensure the user is informed on what they can do about this.