Properly handle pods in Error state after initialisation #13

FlavioF · 2019-09-10T23:17:12Z

Proposal
Previously, we handled pod-related checks and cluster-related checks in the same loop. My proposal is to separate them in order to fix #12.

Explanation
It looks like, for each pod, the AS operator validates the cluster size both in k8s and in Aerospike. In the scenario described in #12, this validation fails. The failure is hit first for the AS pod with index 1, so the operator decides to kill that pod to try to fix the cluster size problem. However, since the mismatch is caused by one of the other AS pods being in Error state, the AS operator enters the erroneous loop described in #12.
The idea behind this PR is to separate AS pod checks from AS cluster checks: every pod is checked first, and only after that is the state of the AS cluster checked for every AS pod.
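
To illustrate the two-pass idea, here is a minimal sketch in Go. It is not the operator's actual code: the `Pod` type, the `checkPods`, `checkCluster` and `reconcile` functions, and the choice to skip the cluster pass while any pod is unhealthy are all assumptions made for the example.

```go
package main

import "fmt"

// PodPhase is a simplified, hypothetical stand-in for a pod's state.
type PodPhase string

const (
	PodRunning PodPhase = "Running"
	PodError   PodPhase = "Error"
)

// Pod is a hypothetical, minimal view of an AS pod.
type Pod struct {
	Name        string
	Phase       PodPhase
	ClusterSize int // cluster size as reported by the Aerospike node in this pod
}

// checkPods is the first pass: it looks only at each pod's own state
// and returns the names of the pods that are not running.
func checkPods(pods []Pod) []string {
	var unhealthy []string
	for _, p := range pods {
		if p.Phase != PodRunning {
			unhealthy = append(unhealthy, p.Name)
		}
	}
	return unhealthy
}

// checkCluster is the second pass: it returns the names of the pods
// whose reported cluster size differs from the expected size.
func checkCluster(pods []Pod, expected int) []string {
	var mismatched []string
	for _, p := range pods {
		if p.ClusterSize != expected {
			mismatched = append(mismatched, p.Name)
		}
	}
	return mismatched
}

// reconcile runs the pod pass first. Assumption of this sketch: the
// cluster pass only runs once every pod has passed its own check, so a
// healthy pod is never blamed for a problem caused by another pod.
func reconcile(pods []Pod, expected int) (restart []string, reason string) {
	if bad := checkPods(pods); len(bad) > 0 {
		return bad, "pod state"
	}
	return checkCluster(pods, expected), "cluster state"
}

func main() {
	pods := []Pod{
		{Name: "as-0", Phase: PodError, ClusterSize: 0},
		{Name: "as-1", Phase: PodRunning, ClusterSize: 1},
	}
	restart, reason := reconcile(pods, 2)
	fmt.Println(restart, reason)
}
```

In this example the pod in Error state (`as-0`) is the one selected for restart, rather than the pod with index 1 that merely observes the wrong cluster size.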

Signed-off-by: Pires <[email protected]>

pires · 2019-09-16T14:36:13Z

I think the logic in ensureClusterSize must be revisited as, clearly, deleting a pod that may be operating as expected just because there's a problem in the cluster originated by some other pod simply isn't right.
I need time to think about this and discuss with you so let's try and find the time.
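
One possible way to address this concern, in line with the later commit titled "reconciler: change ensure cluster to count running pods only", is to compare each pod's reported cluster size against the number of running pods instead of the total number of pods. The sketch below is only an illustration under that assumption: the `Pod` type and the `countRunning` and `podsToDelete` functions are hypothetical and do not reflect the real `ensureClusterSize` implementation.

```go
package main

import "fmt"

// Pod is a hypothetical, minimal view of an AS pod.
type Pod struct {
	Name         string
	Running      bool
	ReportedSize int // cluster size as reported by the Aerospike node in this pod
}

// countRunning returns how many pods are currently running.
func countRunning(pods []Pod) int {
	n := 0
	for _, p := range pods {
		if p.Running {
			n++
		}
	}
	return n
}

// podsToDelete returns the running pods whose reported cluster size
// disagrees with the number of running pods. Pods that are not running
// are ignored here; they are assumed to be handled by a separate check.
func podsToDelete(pods []Pod) []string {
	expected := countRunning(pods)
	var out []string
	for _, p := range pods {
		if p.Running && p.ReportedSize != expected {
			out = append(out, p.Name)
		}
	}
	return out
}

func main() {
	pods := []Pod{
		{Name: "as-0", Running: false, ReportedSize: 0},
		{Name: "as-1", Running: true, ReportedSize: 2},
		{Name: "as-2", Running: true, ReportedSize: 2},
	}
	// as-1 and as-2 agree with the two running pods, so neither is selected.
	fmt.Println(len(podsToDelete(pods)))
}
```

With this rule, a pod that is operating as expected is not deleted merely because another pod has left the cluster.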

Change ensurePods to handle pod state and cluster state separately

1f0dac0

FlavioF requested a review from pires September 10, 2019 23:17

FlavioF self-assigned this Sep 10, 2019

FlavioF requested review from pires and removed request for pires September 10, 2019 23:18

reconciler: avoid re-iterating over all pods

7731fcb

Signed-off-by: Pires <[email protected]>

FlavioF added 2 commits September 18, 2019 11:19

reconciler: change ensure cluster to count running pods only

a84f6c8

e2e: add tests to replicate when a as pod goes to failing test

2a84104

pires approved these changes Oct 1, 2019


pires merged commit d447ff9 into master Oct 1, 2019

pires deleted the feature/fix_pod_error_recovery branch October 1, 2019 10:51
