-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reconciler/managed: avoid requeuing if an update event is pending #527
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Dr. Stefan Schimanski <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the intention here, but I'd prefer not to have to call a function to assess whether we need to requeue at every return point.
I'm wondering how bad it would be to just return reconcile.Result{}
(i.e. implicit Requeue: false
). If we did that, we'd still get a reconcile queued immediately if any of the following were true:
- We updated the MR during this reconcile.
- We hit a different error than we did last time (i.e. we updated the
Synced
condition). - We failed to update the MR status (i.e. we returned a non-nil error).
So the edge case we'd be susceptible to would be hitting the same error twice in a row. In that case nothing would trigger a reconcile to be queued until the poll interval expired (or a watch event happened).
// the object has changed. We get a watch event and do not need to | ||
// requeue. This helps to avoid a reconcile on a stale read when the | ||
// informer has not caught up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was under the impression that controller-runtime would deduplicate multiple events for the same object in the queue. i.e. If we return Requeue: true
and a watch event is triggered we'd only reconcile once, not twice - https://github.com/kubernetes-sigs/controller-runtime/blob/c20ea143/pkg/doc.go#L184
Why isn't this working? Is the issue that the watch-triggered reconcile has potentially already been popped from the queue and processed by another goroutine before we return Requeue: true
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and a watch event is triggered we'd only reconcile once, not twice
The requeue is instant. The informer has delay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the issue that the watch-triggered reconcile has potentially already been popped from the queue and processed by another goroutine before we return Requeue: true
No. There is always just one reconcile per key. Only when we return it to the queue, another go routine could take over immediately. But this PR is about the case when there is no event yet, but the key is immediately popped from the queue. That work is both unnecessary and will very likely run into another conflict error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The requeue is instant. The informer has delay.
I think I understand now. You're saying the issue is that we'll requeue so fast (i.e. instantly) that we'll just read the same stale resource from the informer on the next reconcile, and essentially keep doing that until the informer cache is updated.
Actually, we have to hit a case where we return |
How different is this from #372 ? |
Going through old PRs, is this PR still relevant? |
Description of your changes
We use this:
This pattern means that the object is immediately (modulo queue delay) reconciled again. We use cached clients (controller-runtime's default). Chance is high that the immediate client.Get call will return a stale object if that
Update
did a change. We will run all the complex reconcile logic (all the work twice), create noise and all that, events for example, to eventually find out that the world has moved on (through our own update) and our update of the second reconcile iteration will fail with a conflict error (another event).The right pattern would be to only requeue if the update was a noop (= resource version didn't change), or even just set Requeue: false. If the update did something, we will see a watch event anyway (for watched resources at least).
I have:
make reviewable test
to ensure this PR is ready for review.How has this code been tested