-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: implement function to identify if node is present in aws #5632
fix: implement function to identify if node is present in aws #5632
Conversation
} | ||
|
||
// we don't care about the instance health | ||
if _, ok := aws.awsManager.asgCache.instanceStatus[*awsRef]; !ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have instanceStatus
field on asgCache
to track health status by EC2 instance. I am re-using the instantStatus
map to check if the node is present without caring for its health since we just want to check if the instance is present in the cloud provider.
instanceStatus
is updated at
autoscaler/cluster-autoscaler/cloudprovider/aws/auto_scaling_groups.go
Lines 401 to 404 in 1d56b81
ref := m.buildInstanceRefFromAWS(instance) | |
newInstanceToAsgCache[ref] = asg | |
newAsgToInstancesCache[asg.AwsRef][i] = ref | |
newInstanceStatusMap[ref] = instance.HealthStatus |
} | ||
|
||
// we don't care about the instance health | ||
if _, ok := aws.awsManager.asgCache.instanceStatus[*awsRef]; !ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vadasambar please use the asgCache.InstanceStatus
function instead of directly accessing (in a non-thread-safe fashion) the underlying map :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for noticing it! I have updated the code.
|
||
// we don't care about the status | ||
status, err := aws.awsManager.asgCache.InstanceStatus(*awsRef) | ||
if status != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am checking status
for nil because I think this is a more reliable way (as opposed to checking there was no error) to check if the node is present since InstanceStatus
wouldn't return nil for a status
unless some error occurs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. The fact that we return any error from HasInstance
or asgCache.InstanceStatus
is kind of annoying anyway. Really, these methods should just return bool
IMHO.
return true, nil | ||
} | ||
|
||
return false, fmt.Errorf("node is not present in aws: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this CA should print the line number of the error but I added an extra wrapper around the error just in case to identify the error came from HasInstance
. Happy to change it based on feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 thank you @vadasambar!
|
||
// we don't care about the status | ||
status, err := aws.awsManager.asgCache.InstanceStatus(*awsRef) | ||
if status != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. The fact that we return any error from HasInstance
or asgCache.InstanceStatus
is kind of annoying anyway. Really, these methods should just return bool
IMHO.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jaypipes, vadasambar The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
- this is a follow-up to kubernetes#5054 - this might fix kubernetes#4456 Signed-off-by: vadasambar <[email protected]> fix: make `HasInstance` in aws provider thread-safe Signed-off-by: vadasambar <[email protected]>
16695e9
to
1cb55fe
Compare
Thank you for the review. I have squashed the commits into a single commit. I think I can't merge the PR by myself. |
/lgtm |
Can this be back ported to 1.24 and 1.25? EDIT: nevermind, looks like it was. |
What type of PR is this?
/kind bug
What this PR does / why we need it:
I think description of #5054 (comment) explains it well:
#5054 exposed interface function
HasInstance
for cloud providers to implement. This function is used to determine if a node is deleted by querying the cloud provider. It falls back to theToBeDeleted
taint if the function is not implemented.Which issue(s) this PR fixes:
Fixes #4456
Special notes for your reviewer:
Nothing in particular.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
cc: @fookenc