-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Peons should not report SysMonitor stats since MiddleManager reports them. #12802
Peons should not report SysMonitor stats since MiddleManager reports them. #12802
Conversation
import org.mockito.ArgumentMatchers; | ||
import org.mockito.Mockito; | ||
|
||
public class SysMonitorTest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these tests are failing right now. is there a way to exclude these tests just for ARM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A separate annotation can be created to disable these tests on arm64-graviton2 architecture conditionally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how will that work exactly? can you describe in more detail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like this?
@Target({ ElementType.TYPE, ElementType.METHOD })
@Retention(RetentionPolicy.RUNTIME)
@ExtendWith(DisableOnArm64Condition.class)
public @interface DisabledOnArm64 {
}
public class DisableOnArm64Condition implements ExecutionCondition {
@Override
public ConditionEvaluationResult evaluateExecutionCondition(ExtensionContext context) {
String osArch = System.getProperty("os.arch");
if(osArch.equalsIgnoreCase("aarch64")) {
return ConditionEvaluationResult.disabled("Test disabled on arm64 architecture");
} else {
return ConditionEvaluationResult.enabled("Test enabled");
}
}
}
Although @rohangarg pointed out to neat trick in SigarPidDiscovererTest
using Assume which would be lot simpler.
Map<String, String[]> dimensions = MonitorsConfig.mapOfDatasourceAndTaskID( | ||
dataSourceTaskIdHolder.getDataSource(), | ||
dataSourceTaskIdHolder.getTaskId() | ||
); | ||
return new SysMonitor(dimensions); | ||
return new SysMonitor(dimensions, isPeonRole(nodeRoles)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of leaking isPeon
into the SysMonitor - have you considered introducing a new class NoopSysMonitor
that does nothing. Then in this provider if the role is a Peon, return the noop class.
I think that would be a cleaner division of responsibilities and it would make it easier to test as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I completely agree although I was a little skeptical to use that if any extra functionality might be added in SysMonitor in the future that might want to be used on Peons. But that's a long shot. I'll go with NoopSysMonitor
. Thanks.
This is a great find and fix. Looks like CI is upset about coverage on the new code, but I will be a +1 once that is resolved |
@Test | ||
public void testDoMonitor() | ||
{ | ||
Assume.assumeFalse("aarch64".equals(CPU_ARCH)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think it should be safe for this test to run on all architectures, but I guess there's no harm in skipping this test on aarch64
Assume.assumeFalse("aarch64".equals(CPU_ARCH)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test fails on aarch64 since SysMonitor is the parent of NoopSysMonitor. Sigar still gets loaded even though there's no need for it. Or should there be another class that these two classes inherit from to avoid this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops I didn't notice that when I skimmed the change. The NoopSysMonitor should just implement Monitor since it is doing nothing. Or you can extend AbstractMonitor and return false in the doMonitor() method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But MetricsModule
is getting the instance of SysMonitor
through Guice while loading, and getSysMonitor()
provides for SysMonitor. In case of peon returning NoopSysMonitor here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok. feel free to ignore my suggestion above.
There is a way to refactor this so that sigar won't initialize on the peons, but it seems like overkill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought of adding logic in getMonitorScheduler
during loading for the same and it seemed overkill. If you don't mind could you please share if there is other way for future references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we wanted to do something like that, I would have made SysMonitor an interface and make the current class an implementation of that interface.
Another way to approach this would be to add a new annotation and inject the Monitor interface that way https://github.com/google/guice/wiki/BindingAnnotations
I've retriggered what appears to have been flaky test failures. +1 from me, I'll leave it to other committers in this thread to approve and merge this change. Thanks for the contribution @tejaswini-imply ! |
Thanks @abhishekagarwal87, @suneet-s for the review. |
Description
Sysmonitor
stats (mem, fs, disk, net, cpu, swap, sys, tcp) are reported by all Druid processes, including Peons that are ephemeral in nature. Since Peons always run on the same host as the MiddleManager that spawned them and is unlikely to change, the SyMonitor metrics emitted by Peon are merely duplicates. This is often not a problem except when machines are super-beefy. Imagine a 64-core machine and 32 workers running on this machine. now you will have each Peon reporting metrics for each core. that's an increase of (32 * 64)x in the number of metrics. This leads to a metric explosion.This PR updates
MetricsModule
to check node role running while registeringSysMonitor
and not to load any existingSysMonitor$Stats
.Key changed/added classes in this PR
MetricsModule
SysMonitor
This PR has: