Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Broker config druid.broker.segment.ignoredTiers #11766

Merged

Conversation

kfaraz
Copy link
Contributor

@kfaraz kfaraz commented Oct 4, 2021

Description

The new config is an extension of the concept of "watchedTiers" where
the Broker can choose to add the info of only the specified tiers to its timeline.
Similarly, with this config, Broker can choose to ignore the segments being served
by the specified historical tiers. By default, no tier is ignored.

This config is useful when you want a completely isolated tier amongst many other tiers.

Say there are several tiers of historicals Tier T1, Tier T2 ... Tier Tn
and there are several brokers Broker B1, Broker B2 .... Broker Bm

If we want only Broker B1 to query Tier T1, instead of setting a long list of watchedTiers
on each of the other Brokers B2 ... Bm, we could just set druid.broker.segment.ignoredTiers=["T1"]
for these Brokers, while Broker B1 could have druid.broker.segment.watchedTiers=["T1"]

Changes

  • Add new config druid.broker.segment.ignoredTiers
  • Filter servers in BrokerServerView based on whether the tiers are ignored or not
  • Add check that at most only one of ignoredTiers or watchedTiers is configured, fail otherwise

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@capistrant
Copy link
Contributor

what is desired behavior if an operator has Tier N Tn in both watchedTiers and ignoredTiers? From quick glance at PR, it seems that these segments would be ignored.

I could imagine an operator error configuring their broker causing some confusion if they included Tn in both lists. Would there be value in erroring out broker startup if this were to happen? Thus, forcing the operator to resolve the configuration conflict for the broker?

@kfaraz
Copy link
Contributor Author

kfaraz commented Oct 5, 2021

Thanks a lot for the review, @capistrant !

Please consider the alternatives below.

Alternative 1:
The configurations watchedTiers and ignoredTiers make most sense when the other is not specified at all. So at start-up, we could just verify that only one is non-null and fail the start-up. Otherwise, all the cases mentioned later here would add to the confusion.

I feel failing the broker start-up might be a little heavy-handed but the next alternative is more of a silent failure, which is worse 😅 .

Alternative 2:
We could give priority to one of the lists over the other (right now, ignoredTiers is the higher priority one. so, if a tier is present in both ignoredTiers and watchedTiers, that tier would be ignored) and spit out a WARN log and call this out in the docs*.

I do agree that this could create data inconsistencies for the user as their intent is not clear and would involve poring through the Broker logs to see what went wrong.

Possible Cases
*From a docs point of view, I feel there are several cases that would need clarification now:

ignoredTiers watchedTiers Behaviour
null (i.e. not specified in the config) null Watch ALL tiers
empty array null Watch ALL tiers
null empty array Watch NO tier
empty array empty array Watch NO tier
Tier A absent Tier A absent DO NOT watch Tier A
Tier A absent Tier A present Watch Tier A
Tier A present Tier A absent DO NOT watch Tier A
Tier A present Tier A present Watch Tier A?

With Alternative 1: We fail at start-up and don't need to clarify any of these scenarios (except null, null, which is the default setting)

With Alternative 2
From the above table, I feel that the tie-breaker seems to rest with watchedTiers in the sense that if there is a tie, then we do what watchedTiers dictates. So I guess it would make sense for the case in discussion to watch the tier as it is after all present in the watchedTiers list.

Please let me know what you think 😃

@clintropolis
Copy link
Member

+1 for hard failure and complaining loudly in the logs if both watched and ignored are configured, i can't see any use case for having them both at the same time

@@ -1794,6 +1794,7 @@ See [cache configuration](#cache-configuration) for how to configure cache setti
|--------|---------------|-----------|-------|
|`druid.serverview.type`|batch or http|Segment discovery method to use. "http" enables discovering segments using HTTP instead of zookeeper.|batch|
|`druid.broker.segment.watchedTiers`|List of strings|Broker watches the segment announcements from processes serving segments to build cache of which process is serving which segments, this configuration allows to only consider segments being served from a whitelist of tiers. By default, Broker would consider all tiers. This can be used to partition your dataSources in specific Historical tiers and configure brokers in partitions so that they are only queryable for specific dataSources.|none|
|`druid.broker.segment.ignoredTiers`|List of strings|The Broker watches the segment announcements from processes that serve segments to build a cache to relate each process to the segments it serves. This configuration allows the Broker to ignore the segments being served from a list of tiers. By default, Broker considers all tiers. This config is mutually exclusive from `druid.broker.segment.watchedTiers` and at most only one of these can be configured on a Broker.|none|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should druid.broker.segment.watchedTiers docs also be updated to indicate that it is mutually exclusive with ignoredTiers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that would help remove any ambiguity.

this.clients = new ConcurrentHashMap<>();
this.selectors = new HashMap<>();
this.timelines = new HashMap<>();

// Validate and set the segment watcher config
this.segmentWatcherConfig = segmentWatcherConfig;
if (segmentWatcherConfig.getWatchedTiers() != null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know this isn't new, but I wonder since we now doing some validation here if we should also explode if watchedTiers is empty, which also seems like an invalid configuration

Copy link
Contributor Author

@kfaraz kfaraz Oct 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would make sense to add a validate method in BrokerSegmentWatcherConfig that contains all these validations and just invoke segmentWatcherConfig.validate() from here or should we keep them here itself?


//non-defaults
json = "{ \"watchedTiers\": [\"t1\", \"t2\"], \"watchedDataSources\": [\"ds1\", \"ds2\"] }";
json = "{ \"watchedTiers\": [\"t1\", \"t2\"], \"watchedDataSources\": [\"ds1\", \"ds2\"], \"ignoredTiers\": [\"t3\", \"t4\"] }";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is sort of strange now that it is an invalid configuration, though technically it isn't enforced by the config itself so it isn't problematic here. maybe we should at least leave a comment that this config is illegal in practice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you are right, we should not use an invalid configuration here. Fixing.

Copy link
Contributor

@capistrant capistrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 after green CI

@abhishekagarwal87 abhishekagarwal87 merged commit b688db7 into apache:master Oct 6, 2021
@abhishekagarwal87
Copy link
Contributor

Thank you @kfaraz. I have merged your change.

@abhishekagarwal87 abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022
@kfaraz kfaraz deleted the add_broker_config_ignore_tiers branch August 1, 2023 04:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants