Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

etcd backend not passing watcher errors up the stack, and errors not … #1337

Merged
merged 1 commit into from
Oct 30, 2020

Conversation

robbrockbank
Copy link
Contributor

@robbrockbank robbrockbank commented Oct 29, 2020

…triggering full resync in watcher cache

Description

This contains a couple of fixes

  • In etcd backend, make sure errors due to watches are passed as an event - this is to ensure real errors result in a re-sync. In particular compacted revisions manifest as an error on the watch channel - we were not sending an error and just terminating the channel causing us to loop indefinitely trying to use a compacted revision.

  • Once in-sync, do not go out of sync in the watcher syncer. There is no point in changing status, we already cache the revisions so that once we are connected again we send the correct updates. Felix was not using the status change, and the status change was actually causing issues with other components that were not expecting the status change.

  • Remove the ErrorWatchTerminated definition so that we don't try to use it. This forces us to fix up applications that were relying on this error type.

Should fix: projectcalico/calico#4109

Todos

  • Tests
  • Documentation
  • Release note

Release Note

None required

Copy link
Contributor

@mgleung mgleung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@doublek doublek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - the etcd watcher fixes are straightforward and looks correct. I looked through the scenarios for the InSync fix and I think it should work as expected.

@robbrockbank robbrockbank merged commit 3be43bd into projectcalico:master Oct 30, 2020
@robbrockbank robbrockbank deleted the handle-old-revision branch October 30, 2020 22:26
@nelljerram
Copy link
Member

@robbrockbank Thanks for handling this.

At the risk of being annoying - since I haven't understood the detail here - can I just ask if the combination of #1247 + this still makes sense? I'm asking because it appears to me that this PR reverts about half of the changes in #1247 .

(Also cc @caseydavenport , as the author of #1247 )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Severe regression in calico-3.16 with etcdv3 backend
5 participants