Allow KafkaRoller talk to controller directly #10016

Merged
scholzj merged 6 commits into strimzi:main from tinaselenge:add-controller-svc on Nov 28, 2024

tinaselenge
Contributor

@tinaselenge tinaselenge commented Apr 23, 2024

Type of change

Select the type of your PR

  • Enhancement / new feature

Description

  • Add a method for creating an admin client for controllers, with BOOTSTRAP_CONTROLLERS_CONFIG set.
  • Make KafkaRoller create an admin client against controller nodes if they are running Kafka 3.9.0 or later.
  • Tidy up: remove the ceShouldBeFatal option, as it is never set to true.
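A minimal sketch of the version gating described above, assuming the operator knows the cluster's Kafka version and has both a controller and a broker bootstrap address. `ControllerAdminConfigSketch`, `adminProps`, `supportsDirectControllerAccess`, and the example addresses are hypothetical names, not Strimzi's `AdminClientProvider` API. Only the keys `bootstrap.controllers` and `bootstrap.servers` are real Kafka Admin client settings (the string values of `AdminClientConfig.BOOTSTRAP_CONTROLLERS_CONFIG` and `AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG`); the resulting `Properties` could then be passed to `Admin.create(...)` from kafka-clients.

```java
import java.util.Properties;

/**
 * Hypothetical sketch, not Strimzi's AdminClientProvider: picks the Admin client
 * bootstrap setting depending on the Kafka version of the cluster.
 */
final class ControllerAdminConfigSketch {

    // String values of AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG and
    // AdminClientConfig.BOOTSTRAP_CONTROLLERS_CONFIG in kafka-clients.
    static final String BOOTSTRAP_SERVERS = "bootstrap.servers";
    static final String BOOTSTRAP_CONTROLLERS = "bootstrap.controllers";

    /** True if a "major.minor[.patch]" version string is 3.9.0 or later. */
    static boolean supportsDirectControllerAccess(String kafkaVersion) {
        String[] parts = kafkaVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        // The patch level does not matter for a X.Y.0 threshold.
        return major > 3 || (major == 3 && minor >= 9);
    }

    /**
     * Builds Admin client properties: connect to the controllers directly on 3.9.0+,
     * otherwise go through the brokers. Only one bootstrap key is set, because the
     * two are meant as alternatives.
     */
    static Properties adminProps(String controllersBootstrap, String brokersBootstrap, String kafkaVersion) {
        Properties props = new Properties();
        if (supportsDirectControllerAccess(kafkaVersion)) {
            props.setProperty(BOOTSTRAP_CONTROLLERS, controllersBootstrap);
        } else {
            props.setProperty(BOOTSTRAP_SERVERS, brokersBootstrap);
        }
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical addresses, for illustration only.
        String controllers = "controllers.example:9090";
        String brokers = "brokers.example:9091";
        System.out.println(adminProps(controllers, brokers, "3.9.0"));
        System.out.println(adminProps(controllers, brokers, "3.8.1"));
    }
}
```

With 3.9.0 the sketch sets only `bootstrap.controllers`; with 3.8.1 it falls back to `bootstrap.servers`, so older clusters keep reaching the quorum through the brokers.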

There will be a follow-up PR to apply configuration changes to controllers (running Kafka 3.9.0 or later) dynamically using the admin client, instead of restarting the controllers whenever any configuration changes, as we do currently.

Closes #9692

Checklist

Please go through this checklist and make sure all applicable tasks have been done

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Check RBAC rights for Kubernetes / OpenShift roles
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md
  • Supply screenshots for visual changes, such as Grafana dashboards

@tinaselenge tinaselenge force-pushed the add-controller-svc branch 2 times, most recently from 52d9241 to bb928d6 on September 4, 2024 14:24
@tinaselenge tinaselenge marked this pull request as ready for review November 13, 2024 16:21
@scholzj scholzj added this to the 0.45.0 milestone Nov 13, 2024
@scholzj
Member

scholzj commented Nov 13, 2024

/azp run regression


Azure Pipelines successfully started running 1 pipeline(s).

@scholzj
Member

scholzj commented Nov 13, 2024

/azp run zookeeper-regression


Azure Pipelines successfully started running 1 pipeline(s).

@tinaselenge
Contributor Author

I will check the regression test failures.

@tinaselenge
Contributor Author

@scholzj can you please kick off the regression tests again? The failed system tests seem to pass for me when running locally.

@Frawless
Member

@tinaselenge could you please rebase to avoid running the testCertifiactes ST that was removed in #10830?

@tinaselenge
Contributor Author

Thanks @Frawless. I have rebased now.

@scholzj
Member

scholzj commented Nov 18, 2024

/azp run regression

@scholzj
Member

scholzj commented Nov 18, 2024

/azp run zookeeper-regression


Azure Pipelines successfully started running 1 pipeline(s).


Azure Pipelines successfully started running 1 pipeline(s).

@scholzj
Member

scholzj commented Nov 18, 2024

I started the regressions, but the unit test failure seems to be related to your PR.

@scholzj
Member

scholzj commented Nov 25, 2024

/azp run zookeeper-regression


Azure Pipelines successfully started running 1 pipeline(s).

Member

@scholzj scholzj left a comment


Thanks. I left some questions / nits.

@scholzj
Member

scholzj commented Nov 26, 2024

/azp run regression


Azure Pipelines successfully started running 1 pipeline(s).

@scholzj
Member

scholzj commented Nov 26, 2024

/azp run zookeeper-regression


Azure Pipelines successfully started running 1 pipeline(s).

Member

@scholzj scholzj left a comment


I'm trying to run the STs on this with Kafka 3.8.1. But assuming they pass, this looks good to me. Thanks @tinaselenge.

@tinaselenge
Contributor Author

Thank you @scholzj!

Update AdminClientProvider to create an admin client for controllers with BOOTSTRAP_CONTROLLERS_CONFIG set.
Allow KafkaRoller to create an admin client against controller nodes if they are running 3.9.0 or later.
Remove the ceShouldBeFatal option, as it is never set to true.

Signed-off-by: Gantigmaa Selenge <[email protected]>
@scholzj
Member

scholzj commented Nov 28, 2024

Thanks for the PR @tinaselenge

@scholzj scholzj merged commit 1f1a16d into strimzi:main Nov 28, 2024
13 checks passed
@tinaselenge tinaselenge deleted the add-controller-svc branch November 28, 2024 12:53
OwenCorrigan76 pushed a commit to OwenCorrigan76/strimzi-kafka-operator that referenced this pull request Dec 6, 2024
tinaselenge added a commit to tinaselenge/strimzi-kafka-operator that referenced this pull request Dec 12, 2024
This reverts commit 1f1a16d.

This commit caused warning messages indicating that a describeQuorum request was sent to a non-active controller, and therefore the attempt to roll a controller-only node failed. We discovered that, unlike a broker, a non-active controller does not forward the request to the active controller; instead, it returns a NOT_LEADER_OR_FOLLOWER error. The issue eventually resolves itself after the request is retried several times, because at some point the describeQuorum request is sent to the active controller. However, these retries could delay rolling controller-only nodes.

This commit is reverted from the 0.45.0 release branch, and the issue will be fixed properly in the main branch for the next release.
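To illustrate why the roll was delayed rather than permanently broken, here is a minimal sketch of retrying a request until it happens to reach the active controller. `DescribeQuorumRetrySketch`, `retryOnNotLeader`, and the nested exception are hypothetical stand-ins (Kafka's client has its own `NotLeaderOrFollowerException`), the linear backoff is an assumption, and a `Supplier` stands in for the real describeQuorum call. This is not the fix that was later made in main.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch, not Strimzi's KafkaRoller: retries a describeQuorum-style
 * request until it reaches the active controller.
 */
final class DescribeQuorumRetrySketch {

    /** Stand-in for the NOT_LEADER_OR_FOLLOWER error returned by a non-active controller. */
    static final class NotLeaderOrFollowerException extends RuntimeException {
        NotLeaderOrFollowerException(String message) {
            super(message);
        }
    }

    /**
     * Calls the request up to maxAttempts times, retrying only on NOT_LEADER_OR_FOLLOWER
     * with a linearly growing backoff. Every retry adds to the time spent rolling the node.
     */
    static <T> T retryOnNotLeader(Supplier<T> request, int maxAttempts, long backoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        NotLeaderOrFollowerException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.get();
            } catch (NotLeaderOrFollowerException e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleep(backoffMs * attempt);
                }
            }
        }
        // All attempts hit a non-active controller.
        throw last;
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while backing off", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = retryOnNotLeader(() -> {
            calls[0]++;
            // Simulate the first two requests landing on a non-active controller.
            if (calls[0] < 3) {
                throw new NotLeaderOrFollowerException("not the active controller");
            }
            return "quorum info";
        }, 5, 100);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Running `main` prints `quorum info after 3 attempts` after roughly 300 ms of backoff, which is the kind of delay the revert message describes.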
tinaselenge added a commit to tinaselenge/strimzi-kafka-operator that referenced this pull request Dec 12, 2024
This reverts commit 1f1a16d.

Signed-off-by: Gantigmaa Selenge <[email protected]>
scholzj pushed a commit that referenced this pull request Dec 12, 2024
Labels
None yet
Projects
Status: 0.45.0
Development

Successfully merging this pull request may close these issues.

[KRaft]: Allow KafkaRoller directly connect to controllers
5 participants