handles failure in compaction planner #4586
Conversation
Updates the CompactionJobGenerator to log failures in planning compactions. This avoids causing problems for the tablet management iterator when it's trying to do other things, like process tablet location updates. Added a test that ensures tables with flaky compaction planners can still be read and written. If failed compaction planning were to interfere with tablet assignment, it would cause the reads and/or writes to fail.
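A minimal sketch of the pattern described above: catching a failure from a planner plugin and logging it rather than letting the exception escape into the tablet management iterator. The `CompactionPlanner`/`PlanningParameters` names follow Accumulo's compaction SPI, but the stub interfaces and the surrounding `PlannerInvoker` class are illustrative, not the actual patch:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative stand-ins for the planner SPI types; the real interfaces live
// in org.apache.accumulo.core.spi.compaction.
interface CompactionPlan {}
interface PlanningParameters {}
interface CompactionPlanner {
  CompactionPlan makePlan(PlanningParameters params);
}

class PlannerInvoker {
  private static final Logger LOG = LoggerFactory.getLogger(PlannerInvoker.class);

  // On failure, log and fall back to an empty plan instead of propagating, so
  // the tablet management iterator can continue with work like processing
  // tablet location updates.
  CompactionPlan planCompaction(CompactionPlanner planner, PlanningParameters params) {
    try {
      return planner.makePlan(params);
    } catch (RuntimeException e) {
      LOG.warn("Compaction planner failed; skipping compaction planning for this pass", e);
      return new CompactionPlan() {}; // empty placeholder plan
    }
  }
}
```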
@@ -162,23 +166,29 @@ public Map<String,String> getExecutionHints() {
    return dispatcher.dispatch(dispatchParams).getService();
  }

  private Level getLevel(TableId tableId, CompactionServiceId serviceId,
Curious if you can use the EscalatingLogger (or some other variant of ConditionalLogger) that's included in #4558.
I can try that
Updated to use EscalatingLogger in 5500afb. Ran the new IT in this PR, looked at the logs, and the changes are working. I was wondering if the exception that some of the log messages had would be included in the key created from the arguments in ConditionalLogger. Poked around a bit and found it's not included.
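One plausible reason a key built from a log call's arguments would exclude the exception: SLF4J stores a trailing Throwable separately from the formatting arguments. A small, self-contained demonstration using SLF4J's real `MessageFormatter` helper (the table name and message text here are made up):

```java
import java.util.Arrays;

import org.slf4j.helpers.FormattingTuple;
import org.slf4j.helpers.MessageFormatter;

public class ThrowableExtractionDemo {
  public static void main(String[] args) {
    Exception cause = new IllegalStateException("planner failed");

    // One placeholder, two arguments: SLF4J peels the trailing Throwable off
    // the argument array and stores it separately on the formatting result.
    FormattingTuple tuple = MessageFormatter.arrayFormat(
        "compaction planning failed for table {}", new Object[] {"fail1", cause});

    System.out.println(tuple.getMessage());            // ...failed for table fail1
    System.out.println(tuple.getThrowable() == cause); // true
    // A dedup key derived from the remaining arguments never sees the cause.
    System.out.println(Arrays.toString(tuple.getArgArray())); // [fail1]
  }
}
```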
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think the exception needs to be in the Key?
I'm wondering if there is a downside to having too many ConditionalLoggers, since there is a Caffeine cache for each one. It might be better to have a single cache and add the logger name to the key.
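A sketch of that alternative, assuming the key is simply a list of the event's details; the class and method names below are hypothetical, not Accumulo code:

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.List;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

// Hypothetical: one static Caffeine cache shared by all conditional loggers,
// rather than one cache per logger instance. Including the logger name in the
// key keeps entries from different loggers from colliding.
class SharedDedupCache {
  private static final Cache<List<Object>,Boolean> SEEN = Caffeine.newBuilder()
      .maximumSize(10_000).expireAfterWrite(Duration.ofMinutes(10)).build();

  // Returns true the first time this (logger, message, args) tuple is seen
  // within the expiry window, signaling that the log call should escalate.
  static boolean firstSighting(String loggerName, String format, Object... args) {
    List<Object> key = List.of(loggerName, format, Arrays.asList(args));
    return SEEN.asMap().putIfAbsent(key, Boolean.TRUE) == null;
  }
}
```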
@keith-turner - you also have PR #4435 and from what I can tell, it's for the same issue. If so, can you close one of them?
I had forgotten about that one; I was just trying to work down the TODOs in the elasticity branch. Looking at the two PRs, this one is better because it adds a test for an erroring planner plugin. It also consolidates a duplicated code pattern. Will close the other one.
    assertEquals(30, getFiles(client, "fail3").size());
  }
}
FYI, if you wanted to test that the ERROR logging is being emitted (that the escalation is happening), you can look at https://github.com/apache/accumulo/blob/main/core/src/test/java/org/apache/accumulo/core/logging/EscalatingLoggerTest.java. I was able to figure out how to programmatically change the logger and capture the output in a Java object that you can then test.
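For reference, a sketch of that technique under the assumption that Log4j 2 is the logging backend (see the linked EscalatingLoggerTest for the real version); the appender and logger names here are illustrative:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.core.LogEvent;
import org.apache.logging.log4j.core.LoggerContext;
import org.apache.logging.log4j.core.appender.AbstractAppender;
import org.apache.logging.log4j.core.config.Property;

// Collects formatted log messages in memory so a test can assert on them.
class CapturingAppender extends AbstractAppender {
  final List<String> messages = new CopyOnWriteArrayList<>();

  CapturingAppender() {
    super("capture", null, null, true, Property.EMPTY_ARRAY);
  }

  @Override
  public void append(LogEvent event) {
    messages.add(event.getLevel() + " " + event.getMessage().getFormattedMessage());
  }
}

class LogCaptureExample {
  public static void main(String[] args) {
    // Attach the appender to the root logger config programmatically.
    LoggerContext ctx = (LoggerContext) LogManager.getContext(false);
    CapturingAppender appender = new CapturingAppender();
    appender.start();
    ctx.getConfiguration().getRootLogger().addAppender(appender, Level.DEBUG, null);
    ctx.updateLoggers();

    LogManager.getLogger("example").error("planner failed");

    // A test would assert that an ERROR line was captured.
    System.out.println(appender.messages);
  }
}
```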
Nice, I have wanted to check logging in unit tests in the past but did not know how. Looked around and there is currently no unit test for CompactionJobGenerator. Would need to add one to test logging; could be something to look into as a follow-on.