Tolerate InvalidInstanceID.NotFound when deleting instances #594
Merged
Conversation
We treat this as instance-already-deleted, i.e. not an error. Fix kubernetes#592
hwoarang added a commit to hwoarang/kops that referenced this pull request on Nov 25, 2020
Sometimes we see the following error at the end of a rolling update:

I1125 18:37:57.161591 165 instancegroups.go:419] Cluster validated; revalidating in 10s to make sure it does not flap.
I1125 18:38:08.536470 165 instancegroups.go:416] Cluster validated.
error deleting instance "i-XXXXXXXXXXX", node "ip-XXX-XXX-XXX-XXX.XXXXXXX.compute.internal": error deleting instance "i-XXXXXXXXXXXXXX": InvalidInstanceID.NotFound: The instance ID 'i-XXXXXXXXXXX' does not exist
status code: 400, request id: 91238c21-1caf-41eb-91d7-534d4ca67ed0

It's possible that the EC2 instance had disappeared by the time it was detached (for example, it may have been a spot instance). In any case, we can't do much when we do not find an instance ID, and breaking the update because of that is not user friendly. As such, we can simply report and tolerate this problem instead of exiting with a non-zero code. This is similar to how we handle missing EC2 instances when updating an IG [1].

[1] kubernetes#594
hwoarang added a commit to hwoarang/kops that referenced this pull request on Nov 25, 2020
hwoarang added a commit to hwoarang/kops that referenced this pull request on Nov 25, 2020
Sometimes we see the following error at the end of a rolling update:

I1125 18:12:46.467059 165 instancegroups.go:340] Draining the node: "ip-X-X-X-X.X.compute.internal".
I1125 18:12:46.473365 165 instancegroups.go:359] deleting node "ip-X-X-X-X.X.compute.internal" from kubernetes
I1125 18:12:46.476756 165 instancegroups.go:486] Stopping instance "i-XXXXXXXX", node "ip-X-X-X-X.X.compute.internal", in group "X" (this may take a while).
E1125 18:12:46.523269 165 instancegroups.go:367] error deleting instance "i-XXXXXXXX", node "ip-X-X-X-X.X.compute.internal": error deleting instance "i-XXXXXXXX", node "ip-X-X-X-X.X.compute.internal": error deleting instance "i-XXXXXXXX": InvalidInstanceID.NotFound: The instance ID 'i-XXXXXXXXX' does not exist
status code: 400, request id: 91238c21-1caf-41eb-91d7-534d4ca67ed0

It's possible that the EC2 instance had disappeared by the time it was detached (it may have been a spot instance, for example). In any case, we can't do much when we do not find an instance ID, and throwing this error during the update is not very user friendly. As such, we can simply report and tolerate this problem instead of exiting with a non-zero code. This is similar to how we handle missing EC2 instances when updating an IG [1].

[1] kubernetes#594
hwoarang added a commit to hwoarang/kops that referenced this pull request on Nov 26, 2020
Sometimes we see the following error during a rolling update:

I1125 18:12:46.467059 165 instancegroups.go:340] Draining the node: "ip-X-X-X-X.X.compute.internal".
I1125 18:12:46.473365 165 instancegroups.go:359] deleting node "ip-X-X-X-X.X.compute.internal" from kubernetes
I1125 18:12:46.476756 165 instancegroups.go:486] Stopping instance "i-XXXXXXXX", node "ip-X-X-X-X.X.compute.internal", in group "X" (this may take a while).
E1125 18:12:46.523269 165 instancegroups.go:367] error deleting instance "i-XXXXXXXX", node "ip-X-X-X-X.X.compute.internal": error deleting instance "i-XXXXXXXX", node "ip-X-X-X-X.X.compute.internal": error deleting instance "i-XXXXXXXX": InvalidInstanceID.NotFound: The instance ID 'i-XXXXXXXXX' does not exist
status code: 400, request id: 91238c21-1caf-41eb-91d7-534d4ca67ed0

It's possible that the EC2 instance had disappeared by the time it was detached (it may have been a spot instance, for example). In any case, we can't do much when we do not find an instance ID, and throwing this error during the update is not very user friendly. As such, we can simply report and tolerate this problem instead of exiting with a non-zero code. This is similar to how we handle missing EC2 instances when updating an IG [1].

[1] kubernetes#594
hakman pushed a commit to hakman/kops that referenced this pull request on Nov 26, 2020
hakman pushed a commit to hakman/kops that referenced this pull request on Nov 26, 2020