Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ROX-19221: Fix reconciling with label selector for multiple reconcilers #42

Merged

Conversation

kurlov
Copy link
Member

@kurlov kurlov commented Aug 22, 2023

Issue

Assume there is CR with label foo=bar.
If run one reconciler with label selector foo=bar then everything is fine and the reconciler processes the CR.
If start another reconciler with label foo=another then this new reconciler also reconciles foo=bar CR. This is not expected.

Root cause

There is controller.Watch set for Secret kind. This watch has event handler. The Event handler only filters events by GroupVersionKind of the CR so label selector is ignored

Fix

Check for the matching label selector before performing reconcile. Skip reconcile if label is not matching

Local testing with stackrox operator on fresh cluster

# Create namespace for the Central
k create ns stackrox

# Create simple Central CR
k apply -f ~/example-central.yml

# Set foo=bar label 
k label centrals.platform.stackrox.io stackrox-central-services foo=bar -n stackrox

# Run operator
CENTRAL_LABEL_SELECTOR="foo=bar" MAIN_IMAGE_TAG=4.1.1 make install run
...
2023-08-22T02:02:15+02:00	INFO	controller-runtime.metrics	Metrics server is starting to listen	{"addr": ":8080"}
2023-08-22T02:02:15+02:00	INFO	setup	skipping webhook setup, ENABLE_WEBHOOKS==false
2023-08-22T02:02:15+02:00	INFO	setup	using Central label selector from environment variable CENTRAL_LABEL_SELECTOR	{"selector": "foo=bar"}
2023-08-22T02:02:15+02:00	INFO	Using central label selector	{"selector": "foo=bar"}
....
2023-08-22T02:02:15+02:00	INFO	setup	starting manager
2023-08-22T02:02:15+02:00	INFO	starting server	{"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
2023-08-22T02:02:15+02:00	INFO	Starting server	{"kind": "health probe", "addr": "[::]:8081"}
2023-08-22T02:02:15+02:00	INFO	Starting EventSource	{"controller": "securedcluster-controller", "source": "kind source: *unstructured.Unstructured"}
2023-08-22T02:02:15+02:00	INFO	Starting EventSource	{"controller": "securedcluster-controller", "source": "kind source: *v1.Secret"}
2023-08-22T02:02:15+02:00	INFO	Starting EventSource	{"controller": "securedcluster-controller", "source": "kind source: *v1alpha1.Central"}
2023-08-22T02:02:15+02:00	INFO	Starting Controller	{"controller": "securedcluster-controller"}
2023-08-22T02:02:15+02:00	INFO	Starting EventSource	{"controller": "central-controller", "source": "kind source: *unstructured.Unstructured"}
2023-08-22T02:02:15+02:00	INFO	Starting EventSource	{"controller": "central-controller", "source": "kind source: *v1.Secret"}
2023-08-22T02:02:15+02:00	INFO	Starting EventSource	{"controller": "central-controller", "source": "kind source: *v1alpha1.SecuredCluster"}
....
# Check that reoncile is triggered
2023-08-22T02:24:20+02:00	DEBUG	controllers.Central	Reconciliation triggered	{"central": {"name":"stackrox-central-services","namespace":"stackrox"}}
2023-08-22T02:24:20+02:00	DEBUG	controllers.Central	Adding uninstall finalizer.	{"central": {"name":"stackrox-central-services","namespace":"stackrox"}}
2023-08-22T02:24:22+02:00	DEBUG	Get release 'stackrox-central-services' latest version.
2023-08-22T02:24:22+02:00	DEBUG	controllers.Central	release not found, most likely it is not installed yet	{"namespace": "stackrox", "name": "stackrox-central-services"}


# Start another operator instance with different label selector
...
CENTRAL_LABEL_SELECTOR="foo=another" MAIN_IMAGE_TAG=4.1.1 make install run
2023-08-22T04:28:29+02:00	INFO	controller-runtime.metrics	Metrics server is starting to listen	{"addr": ":8090"}
2023-08-22T04:28:29+02:00	INFO	setup	skipping webhook setup, ENABLE_WEBHOOKS==false
2023-08-22T04:28:29+02:00	INFO	setup	using Central label selector from environment variable CENTRAL_LABEL_SELECTOR	{"selector": "foo=another"}
2023-08-22T04:28:29+02:00	INFO	Using central label selector	{"selector": "foo=another"}
...
# Check that another operator does not reconcile foo=bar CR
...

# Change label to foo=another to swap which operator should reconcile the CR
k label centrals.platform.stackrox.io stackrox-central-services foo=another -n stackrox --overwrite=true

# Check that foo=bar operator does not reconcile CR anymore
....

# Check that foo=another operator reconciles the CR
2023-08-22T04:28:29+02:00	DEBUG	controllers.Central	Reconciliation triggered	{"central": {"name":"stackrox-central-services","namespace":"stackrox"}}
2023-08-22T04:28:30+02:00	DEBUG	Get release 'stackrox-central-services' latest version.
2023-08-22T04:28:30+02:00	DEBUG	controllers.Central	release not found, most likely it is not installed yet	{"namespace": "stackrox", "name": "stackrox-central-services"}
...
...

*Note:* provided unit testing does not covers bug behaviour. I have an idea how to test the bug case with custom controller manager cache  but I maybe I am overlooking more simpler approach

Copy link
Member

@SimonBaeumer SimonBaeumer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job debugging this issue @kurlov!

pkg/reconciler/reconciler_test.go Outdated Show resolved Hide resolved
pkg/reconciler/reconciler.go Outdated Show resolved Hide resolved
@kurlov kurlov requested a review from ludydoo August 22, 2023 13:35
@kurlov kurlov requested a review from SimonBaeumer August 24, 2023 00:14
@github-actions
Copy link

Pull Request Test Coverage Report for Build 5957809598

  • 4 of 4 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.1%) to 85.229%

Totals Coverage Status
Change from base Build 5762815296: 0.1%
Covered Lines: 1783
Relevant Lines: 2092

💛 - Coveralls

Copy link

@kovayur kovayur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The solution looks workable, but a bit rough from my point of view. I would consider adding a label selector to watch as designed in controller-runtime

@@ -650,6 +651,11 @@ func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (res ctrl.
return ctrl.Result{}, err
}

if r.selectorPredicate != nil && !r.selectorPredicate.Generic(event.GenericEvent{Object: obj}) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't put labels on secrets instead? I.e. do a k8s client customization as it is done for the ownerReference.
Adding correct label selector on secrets watch may also give a performance benefit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing predicate to the secret watch was initial approach. But the downside is that it hides filtering mechanism and it's pretty easy to overlook it in the future (e.g. adding additional watch). Filtering selector in the reconcile is straightforward and robust approach. But you are right, it's not the best in terms of CPU because Reconcile call is triggered

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CPU consumption is minimal here, a reconcile call in itself it is not computationally expensive - it starts a function with exits immediately.

Copy link
Member

@SimonBaeumer SimonBaeumer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, two nits for the custom reconcile.Result and naming.

@@ -650,6 +651,11 @@ func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (res ctrl.
return ctrl.Result{}, err
}

if r.selectorPredicate != nil && !r.selectorPredicate.Generic(event.GenericEvent{Object: obj}) {
log.V(1).Info("Label selector does not match, skipping reconcile")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the affected object and labels to the log message?
As an engineer I would like to see more details why it was skipped and get a hint which label would reconcile the CR.

Copy link
Member Author

@kurlov kurlov Aug 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name and namespace is part of the logger
Log message will look like this:

2023-08-25T16:25:05+02:00	DEBUG	controllers.Central	Label selector does not match, skipping reconcile	{"central": {"name":"stackrox-central-services","namespace":"stackrox"}}

@@ -1392,6 +1397,40 @@ var _ = Describe("Reconciler", func() {
})
})
})
When("label selector succeeds", func() {
It("reconciles only matching label", func() {
By("setting a broken action client getter for the reconciler", func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This describes the action, can you describe it's impact/reason why it is necessary to add a broken action client getter?
For example:
By returning an invalid action client getter to assert different reconcile results

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, the solution you've proposed using a custom reconcile.Result object for a skipped reconcile looks cleaner to me. Changing action client getter would have side-effects on this test, when using the reconcile.Result the test is side-effect free.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though introducing a new result type is a small change but it might effect many part and I would like to reduce scope to the bug fixing only

@kurlov kurlov merged commit 1361e2f into main Aug 25, 2023
porridge pushed a commit that referenced this pull request Sep 5, 2024
…rs (#42)

* Add filter predicate to secret kind watch

* Change fix to filter resource in the reconcile function

* Fix doc string typo

* Update description for setting broken action client
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants