Properly handle pods in Error state after initialisation #13

Merged
pires merged 4 commits into master from feature/fix_pod_error_recovery on Oct 1, 2019

@FlavioF FlavioF commented Sep 10, 2019

Proposal
Previously, pod-related checks and cluster-related checks were handled in the same loop. This PR proposes separating them in order to fix #12.

Explanation
It looks like, for each pod, the AS operator validates the cluster size both in k8s and in Aerospike. In the scenario described in #12, this validation fails. The failure occurs first for the AS pod with index 1, so the operator decides to kill that pod to try to fix the cluster size problem. However, because the failure is actually caused by one of the other AS pods being in Error state, the AS operator enters the erroneous loop described in #12.
The idea behind this PR is to separate AS pod checks from AS cluster checks: every pod is checked first, and only after that is the state of the AS cluster checked for every AS pod.
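
As a rough illustration of the two-pass idea (not the code in this PR), here is a minimal Go sketch. The `Pod` type, the `PodPhase` constants, the `reconcile` function and the pod names are all hypothetical and do not come from the operator's codebase. Skipping the cluster-level pass while any pod is unhealthy is an assumption of this sketch, made so that a healthy pod is not blamed for a size mismatch caused by another pod.

```go
package main

import "fmt"

// PodPhase is a simplified, hypothetical stand-in for the Kubernetes pod phase.
type PodPhase string

const (
	PodRunning PodPhase = "Running"
	PodError   PodPhase = "Error"
)

// Pod is a hypothetical, minimal view of an Aerospike pod.
type Pod struct {
	Name  string
	Phase PodPhase
	// ClusterSize is the cluster size reported by the Aerospike node in this pod.
	ClusterSize int
}

// reconcile runs pod-level checks first and cluster-level checks second.
// It returns the pods that failed the pod-level pass and, only when that
// pass is clean, the pods whose reported cluster size differs from desiredSize.
func reconcile(pods []Pod, desiredSize int) (unhealthy, mismatched []string) {
	// Pass 1: pod-level checks only.
	for _, p := range pods {
		if p.Phase != PodRunning {
			unhealthy = append(unhealthy, p.Name)
		}
	}

	// Assumption of this sketch: while any pod is unhealthy, a cluster size
	// mismatch is expected, so the cluster-level pass is skipped.
	if len(unhealthy) > 0 {
		return unhealthy, nil
	}

	// Pass 2: cluster-level checks, once for every pod.
	for _, p := range pods {
		if p.ClusterSize != desiredSize {
			mismatched = append(mismatched, p.Name)
		}
	}
	return nil, mismatched
}

func main() {
	// Scenario loosely modelled on #12: one pod is in Error state, so the
	// other pod sees a smaller cluster than desired.
	pods := []Pod{
		{Name: "as-0", Phase: PodError, ClusterSize: 0},
		{Name: "as-1", Phase: PodRunning, ClusterSize: 1},
	}
	unhealthy, mismatched := reconcile(pods, 2)
	fmt.Println("unhealthy:", unhealthy, "mismatched:", mismatched)
}
```

In this scenario only `as-0` is flagged, and `as-1` is left alone even though it reports a cluster size below the desired one.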

@FlavioF FlavioF requested a review from pires September 10, 2019 23:17
@FlavioF FlavioF self-assigned this Sep 10, 2019
@FlavioF FlavioF requested review from pires and removed request for pires September 10, 2019 23:18

pires commented Sep 16, 2019

I think the logic in ensureClusterSize must be revisited: deleting a pod that may be operating as expected, just because of a problem in the cluster caused by some other pod, clearly isn't right.
I need time to think about this and discuss it with you, so let's try to find the time.

@pires pires merged commit d447ff9 into master Oct 1, 2019
@pires pires deleted the feature/fix_pod_error_recovery branch October 1, 2019 10:51
Successfully merging this pull request may close these issues.

Operator can't handle pods in Error state after initialisation