Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add BrokerSetAwareGoal #1809

Merged
merged 7 commits into from
Jul 11, 2022

Conversation

mohitpali
Copy link
Contributor

This PR resolves #1782

Currently, cruise-control does not have a way to limit the leader and replica movement plan within a Broker Group. We would like to make cruise-control aware of the Broker Groups and propose optimizations where each optimization does leader movement or replica movement within a Broker Group. In short, each optimization should be limited within the boundary of Broker Group.

In this PR :

  1. I have added new BrokerSetAwareGoal (Hard Goal).
  2. I have also added a BrokerSetDataStore interface.
  3. I have provided a file based default implementation to fetch BrokerSet Information called BrokerSetFileDataStore

@efeg efeg requested review from efeg, CCisGG and gitlw March 31, 2022 01:23
@efeg
Copy link
Collaborator

efeg commented Mar 31, 2022

Thanks for the PR @mohitpali -- quick note, looks like there are some JavaDoc formatting issues -- i.e.

> Task :cruise-control:javadoc
Picked up _JAVA_OPTIONS: -Xms512m -Xmx1g
/home/circleci/workspace/cruise-control/src/main/java/com/linkedin/kafka/cruisecontrol/analyzer/goals/BrokerSetAwareGoal.java:60: error: unexpected end tag: </p>
 * </p>
   ^
/home/circleci/workspace/cruise-control/src/main/java/com/linkedin/kafka/cruisecontrol/analyzer/goals/BrokerSetAwareGoal.java:68: error: bad use of '>'
 * _excludedTopics -> Handled by BrokerSetAwareGoal
                    ^
/home/circleci/workspace/cruise-control/src/main/java/com/linkedin/kafka/cruisecontrol/analyzer/goals/BrokerSetAwareGoal.java:69: error: bad use of '>'
 * _excludedBrokersForReplicaMove -> Handled by AbstractGoal
                                   ^
/home/circleci/workspace/cruise-control/src/main/java/com/linkedin/kafka/cruisecontrol/analyzer/goals/BrokerSetAwareGoal.java:70: error: bad use of '>'
 * _requestedDestinationBrokerIds -> Handled by AbstractGoal
                                   ^
/home/circleci/workspace/cruise-control/src/main/java/com/linkedin/kafka/cruisecontrol/analyzer/goals/BrokerSetAwareGoal.java:71: error: bad use of '>'
 * _onlyMoveImmigrantReplicas -> Handled by BrokerSetAwareGoa
                               ^
...

@mohitpali
Copy link
Contributor Author

mohitpali commented Mar 31, 2022

Done

Copy link
Collaborator

@efeg efeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mohitpali Thanks for the PR!
I started the review and left some comments. I am midway through the review.

Copy link
Collaborator

@efeg efeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more comments -- will continue the review.

@CCisGG
Copy link
Contributor

CCisGG commented May 13, 2022

Hi @mohitpali , while you're updating the PR, I have some thoughts here to discuss/confirm with you.

Essentially here we need to solve 3 problems:

  1. Which broker belongs to which broker set. [broker -> broker set]
  2. Which topic belongs to which broker set. [topic -> broker set]
  3. Which replica belongs to which broker. [replica -> broker]

For Prob.1, this PR provides a static file.
The major challenge then is what if the brokers are added or removed.

For Prob.2, this PR decides to move topic to it's original broker set, making an assumption of topic lives within a broker set at creation time.
The major issue of that solution is:

  1. The assumption does not apply for most users other than AWS.
  2. When there's no "original broker set", we still need to move all replicas to a single broker set.

For Prob.3, this PR allows the replicas goes to any eligible brokers in the target broker set. This looks fine for me.

Given the above observation, we may need to provide some pluggable component to solve Prob.1 and Prob.2. These are my current understanding. @mohitpali @efeg Please feel free to share your thoughts about it!

@mohitpali mohitpali force-pushed the migrate_to_kafka_2_4 branch from 18fbc4b to 0c5b037 Compare May 17, 2022 01:59
@mohitpali
Copy link
Contributor Author

Hi @mohitpali , while you're updating the PR, I have some thoughts here to discuss/confirm with you.

Essentially here we need to solve 3 problems:

1. Which broker belongs to which broker set. [broker -> broker set]

2. Which topic belongs to which broker set.  [topic -> broker set]

3. Which replica belongs to which broker. [replica -> broker]

For Prob.1, this PR provides a static file. The major challenge then is what if the brokers are added or removed.

For Prob.2, this PR decides to move topic to it's original broker set, making an assumption of topic lives within a broker set at creation time. The major issue of that solution is:

1. The assumption does not apply for most users other than AWS.

2. When there's no "original broker set", we still need to move all replicas to a single broker set.

For Prob.3, this PR allows the replicas goes to any eligible brokers in the target broker set. This looks fine for me.

Given the above observation, we may need to provide some pluggable component to solve Prob.1 and Prob.2. These are my current understanding. @mohitpali @efeg Please feel free to share your thoughts about it!

I think this sounds correct. The outstanding issue is the topic to BrokerSet mapping policy - I will look into this.

@mohitpali mohitpali requested review from efeg and CCisGG May 17, 2022 02:31
@mohitpali mohitpali force-pushed the migrate_to_kafka_2_4 branch from 0c5b037 to 773a836 Compare May 21, 2022 00:39
@mohitpali mohitpali requested a review from CCisGG May 21, 2022 00:41
@mohitpali mohitpali force-pushed the migrate_to_kafka_2_4 branch from 773a836 to 4538f6e Compare May 24, 2022 01:57
@mohitpali mohitpali requested a review from CCisGG May 24, 2022 04:21
Copy link
Collaborator

@efeg efeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates -- left some comments.

// If the brokerSet awareness condition is violated. Move replica to an eligible broker
Set<Broker> eligibleBrokers = _brokersByBrokerSet.get(eligibleBrokerSetIdForReplica)
.stream()
.filter(b -> clusterModel.aliveBrokers().contains(b))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change _brokersByBrokerSet with something like _aliveBrokersByBrokerSet? That way we can drop the need for filtering here. As far as I can tell we never use brokers that are no alive for this goal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried filtering this on the BrokerSetResolverHelper but the ExcludedBrokersForLeaderShipTest fails since it marks excluded brokers as dead and clusterModel.aliveBrokers() do not contain the excluded broker for leadership.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 questions:

  1. Could you point me to where it marks excluded brokers as dead?
  2. Even if clusterModel.aliveBrokers() does not contain excluded broker, the test should succeed? E.g. if you have enough alive brokers, I don't see the reason why this goal can fail.

Copy link
Collaborator

@efeg efeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a full pass - left a few more comments.

@CCisGG
Copy link
Contributor

CCisGG commented Jun 5, 2022

Hi @mohitpali , I just realized another issue during my testing. It turns out that BrokerSetAwareGoal will naturally conflict with MinTopicLeadersPerBrokerGoal, as MinTopicLeadersPerBrokerGoal requires the topic exists on all brokers. As a result, optimization would fail whenever both goals are added as hard goals. I'm wondering if it's a good idea to exclude the topics defined in topics.with.min.leaders.per.broker when we do replica movements in BrokerSetAwareGoal. @efeg fyi.

@CCisGG
Copy link
Contributor

CCisGG commented Jun 6, 2022

Another observation is: if a broker dies, BrokerAwareGoal is still trying to rebalance for that broker. This is arguable.

When I implement my custom BrokerSetResolver, I would only color the alive brokers. With my implementation, today it's throwing "Failed to resolve BrokerSet for Broker Broker[id=XXX,rack=,state=DEAD" exception when the broker is dead.

I'm wondering if one should always assign all brokers with a color, dead or alive, in the custom BrokerSetResolver. If that's the case, maybe we can add something in the java doc in the BrokerSetResolver interface like "All brokers including dead ones should be assigned with a color".

(And sorry I accidentally closed the PR and reopened it... Hope that doesn't break anything)

@CCisGG CCisGG closed this Jun 6, 2022
@CCisGG CCisGG reopened this Jun 6, 2022
@mohitpali
Copy link
Contributor Author

Another observation is: if a broker dies, BrokerAwareGoal is still trying to rebalance for that broker. This is arguable.

When I implement my custom BrokerSetResolver, I would only color the alive brokers. With my implementation, today it's throwing "Failed to resolve BrokerSet for Broker Broker[id=XXX,rack=,state=DEAD" exception when the broker is dead.

I'm wondering if one should always assign all brokers with a color, dead or alive, in the custom BrokerSetResolver. If that's the case, maybe we can add something in the java doc in the BrokerSetResolver interface like "All brokers including dead ones should be assigned with a color".

(And sorry I accidentally closed the PR and reopened it... Hope that doesn't break anything)

I think it is because of below code -

  public String brokerSetId(Broker broker) throws BrokerSetResolutionException {
    String brokerSetId = _brokerSetIdByBroker.get(broker);

    if (brokerSetId != null) {
      return brokerSetId;
    }

    throw new BrokerSetResolutionException(String.format("Failed to resolve BrokerSet for Broker %s", broker));
  }

The reason the above code was introduced was since if a brokerSetId is ever found to be null, we would not be able to map the brokers for a brokerSetId. In your case I suspect since the Broker object changed during the goal run (state went from ALIVE to DEAD), the cached map did not have the mapping

private Map<Broker, String> _brokerSetIdByBroker;

I can modify it to instead map by BrokerId, and if the Broker object state has changed, it wouldn't throw this exception.

There could be other cases like race condition where during the goal run, a broker is added on the cluster but the brokerSet map does not have information on it. The BrokerSet resolution and ClusterModel's list of brokers may not match during this time. I would assume that a good brokerSet resolver is dynamic and even if a race condition fails optimization, next iteration of optimization will get fixed if the info has synched up.

@mohitpali mohitpali force-pushed the migrate_to_kafka_2_4 branch from 96fd464 to 1750369 Compare June 8, 2022 20:53
@mohitpali
Copy link
Contributor Author

Hi @mohitpali , I just realized another issue during my testing. It turns out that BrokerSetAwareGoal will naturally conflict with MinTopicLeadersPerBrokerGoal, as MinTopicLeadersPerBrokerGoal requires the topic exists on all brokers. As a result, optimization would fail whenever both goals are added as hard goals. I'm wondering if it's a good idea to exclude the topics defined in topics.with.min.leaders.per.broker when we do replica movements in BrokerSetAwareGoal. @efeg fyi.

I have excluded the topics with regex for MinTopicLeadersPerBrokerGoal

@CCisGG
Copy link
Contributor

CCisGG commented Jun 30, 2022

We should add this goal to cruisecontrol.properties, in the goals properties.

Make sure not adding this to default.goals and hard.goals.

@CCisGG
Copy link
Contributor

CCisGG commented Jul 10, 2022

Hi @mohitpali, I think there's a change needed in AnalyzerConfig to match the change in your latest commit. E.g. add the goal in the static DEFAULT_GOALS string. It should reflect/match the default values for "goals" property.

@CCisGG CCisGG merged commit 3f379ef into linkedin:migrate_to_kafka_2_4 Jul 11, 2022
CCisGG pushed a commit to CCisGG/cruise-control that referenced this pull request Jul 11, 2022
@CCisGG
Copy link
Contributor

CCisGG commented Jul 25, 2022

Hi @mohitpali, I know this is closed, but would like to ask your opinion on this:

Do you think it would be a good idea to "write back" the broker assignment to brokerSet.json file? That case no matter we have the original file or not, we can create/update the latest broker assignment info to that file, and then read it from anywhere we want.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Request : Enable Cruise Control to work with a Concept of Broker Groups
3 participants