[BUG] KOPS 1.19 Seg Fault when removing an in-use instance type from a mixed instance group #10718

Closed
timothyclarke opened this issue Feb 3, 2021 · 2 comments
Labels
kind/regression Categorizes issue or PR as related to a regression from a prior release.

timothyclarke commented Feb 3, 2021

1. What kops version are you running? The command kops version will display
this information.

Version 1.19.0 (git-04d36d7d92c72601efd918877fc180c846129ffb)

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7", GitCommit:"1dd5338295409edcfff11505e7bb246f0d325d15", GitTreeState:"clean", BuildDate:"2021-01-13T13:23:52Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7", GitCommit:"1dd5338295409edcfff11505e7bb246f0d325d15", GitTreeState:"clean", BuildDate:"2021-01-13T13:15:20Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}

3. What cloud provider are you using?
AWS

4. What commands did you run? What is the simplest way to reproduce this issue?
kops edit ig <instance group name>, kops update cluster, kops update cluster --yes

  1. Create a mixed instance group, apply it, and make sure at least one instance in the group is active
  2. Edit the instance group and remove an instance type that is active
  3. Attempt to update the cluster

An example would be

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2021-02-03T09:39:25Z"
  generation: 3
  labels:
    kops.k8s.io/cluster: dev-euw2.example.com
  name: sites
spec:
  cloudLabels:
    BrandName: sites
    k8s.io/cluster-autoscaler/enabled: '""'
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210119.1
  machineType: t3a.2xlarge
  maxSize: 11
  minSize: 1
  mixedInstancesPolicy:
    instances:
    - t3a.2xlarge
    - m5a.xlarge
    - m5a.2xlarge
    - m5a.4xlarge
    - m4.2xlarge
    - m5.2xlarge
    - m5.4xlarge
    - r5a.xlarge
    - r5a.2xlarge
    - r5.xlarge
    - r5.2xlarge
    - c5a.xlarge
    - c5a.2xlarge
    - c5a.4xlarge
    - c5.2xlarge
    - c5.4xlarge
    onDemandAboveBase: 0
    spotInstancePools: 15
  nodeLabels:
    kops.k8s.io/instancegroup: example
  role: Node
  rollingUpdate:
    maxSurge: 2
  rootVolumeSize: 48
  subnets:
  - eu-west-2a
  - eu-west-2b
  - eu-west-2c

Multiple c5a.xlarge instances had spawned with insufficient RAM, so I edited the instance group to remove that type.
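
To double-check which types an edit actually drops from the instances list, the old and new lists can be diffed. The following is only a minimal Go sketch and not kops code: the helper name removedTypes is hypothetical, and the two lists are shortened from the manifest above.

```go
package main

import "fmt"

// removedTypes returns the entries of before that are absent from after,
// preserving their original order. Hypothetical helper, not part of kops.
func removedTypes(before, after []string) []string {
	keep := make(map[string]bool, len(after))
	for _, t := range after {
		keep[t] = true
	}
	var removed []string
	for _, t := range before {
		if !keep[t] {
			removed = append(removed, t)
		}
	}
	return removed
}

func main() {
	// Shortened versions of the instance lists before and after the edit.
	before := []string{"r5.2xlarge", "c5a.xlarge", "c5a.2xlarge", "c5a.4xlarge"}
	after := []string{"r5.2xlarge", "c5a.2xlarge", "c5a.4xlarge"}
	fmt.Println(removedTypes(before, after)) // prints [c5a.xlarge]
}
```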

5. What happened after the commands executed?
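kops update cluster --yes crashed with a segfault (a nil pointer dereference panic); the dry run without --yes completed normally.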

kops update cluster --state "s3://${S3_BUCKET}" --name ${KUBE_NAME}
I0203 16:19:42.232619   92530 executor.go:111] Tasks: 0 done / 153 total; 62 can run
I0203 16:19:42.751426   92530 executor.go:111] Tasks: 62 done / 153 total; 26 can run
I0203 16:19:43.127438   92530 executor.go:111] Tasks: 88 done / 153 total; 51 can run
I0203 16:19:43.502894   92530 executor.go:111] Tasks: 139 done / 153 total; 14 can run
I0203 16:19:43.664899   92530 dnsname.go:121] AliasTarget for "api.dev-euw2.example.com." is "api-dev-euw2-REMOVED.eu-west-2.elb.amazonaws.com."
I0203 16:19:43.862420   92530 dnsname.go:121] AliasTarget for "bastion.dev-euw2.example.com." is "bastion-dev-euw2-REMOVED.eu-west-2.elb.amazonaws.com."
I0203 16:19:43.964038   92530 executor.go:111] Tasks: 153 done / 153 total; 0 can run
Will modify resources:
  AutoscalingGroup/newgroup.dev-euw2.example.com
  	MixedInstanceOverrides	 [t3a.2xlarge, m5a.xlarge, m5a.2xlarge, m5a.4xlarge, m4.2xlarge, m5.2xlarge, m5.4xlarge, r5a.xlarge, r5a.2xlarge, r5.xlarge, r5.2xlarge, c5a.xlarge, c5a.2xlarge, c5a.4xlarge, c5.2xlarge, c5.4xlarge] -> [t3a.2xlarge, m5a.xlarge, m5a.2xlarge, m5a.4xlarge, m4.2xlarge, m5.2xlarge, m5.4xlarge, r5a.xlarge, r5a.2xlarge, r5.xlarge, r5.2xlarge, c5a.2xlarge, c5a.4xlarge, c5.2xlarge, c5.4xlarge]

Must specify --yes to apply changes
kops update cluster --state "s3://${S3_BUCKET}" --name ${KUBE_NAME} --yes  
I0203 15:53:44.014771   89463 executor.go:111] Tasks: 0 done / 153 total; 62 can run
I0203 15:53:44.499571   89463 executor.go:111] Tasks: 62 done / 153 total; 26 can run
I0203 15:53:44.856034   89463 executor.go:111] Tasks: 88 done / 153 total; 51 can run
I0203 15:53:45.227500   89463 executor.go:111] Tasks: 139 done / 153 total; 14 can run
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2c451fa]

goroutine 1322 [running]:
k8s.io/kops/upup/pkg/fi/cloudup/awstasks.(*AutoscalingGroup).RenderAWS(0xc000a8f800, 0xc0013f8ee0, 0xc0000d2800, 0xc000a8f800, 0xc0000d2a00, 0x0, 0x0)
	upup/pkg/fi/cloudup/awstasks/autoscalinggroup.go:492 +0x1c9a
reflect.Value.call(0x3fefc00, 0xc000a8f800, 0x2a13, 0x412ea26, 0x4, 0xc001b5f8c0, 0x4, 0x4, 0x4760f00, 0xc00004d480, ...)
	GOROOT/src/reflect/value.go:476 +0x8c7
reflect.Value.Call(0x3fefc00, 0xc000a8f800, 0x2a13, 0xc001b5f8c0, 0x4, 0x4, 0xc000a8f800, 0x2a13, 0xc001b5f860)
	GOROOT/src/reflect/value.go:337 +0xb9
k8s.io/kops/upup/pkg/fi.(*Context).Render(0xc0005626e0, 0x4691a00, 0xc0000d2800, 0x4691a00, 0xc000a8f800, 0x4691a00, 0xc0000d2a00, 0x0, 0x0)
	upup/pkg/fi/context.go:222 +0xef5
k8s.io/kops/upup/pkg/fi.DefaultDeltaRunMethod(0x4691a00, 0xc000a8f800, 0xc0005626e0, 0x476a180, 0xc000cc2e00)
	upup/pkg/fi/default_methods.go:79 +0x54a
k8s.io/kops/upup/pkg/fi/cloudup/awstasks.(*AutoscalingGroup).Run(0xc000a8f800, 0xc0005626e0, 0xc0002ae070, 0x0)
	upup/pkg/fi/cloudup/awstasks/autoscalinggroup.go:316 +0xd7
k8s.io/kops/upup/pkg/fi.(*executor).forkJoin.func1(0xc000ad07e0, 0xe, 0xe, 0xc001982d10, 0xc000518900, 0xc000e7fb20, 0x6)
	upup/pkg/fi/executor.go:187 +0x1ae
created by k8s.io/kops/upup/pkg/fi.(*executor).forkJoin
	upup/pkg/fi/executor.go:183 +0x105
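
The panic message in the trace is Go's standard runtime error for dereferencing a nil pointer. The sketch below reproduces that class of failure in isolation and shows a nil guard. It is not the code at autoscalinggroup.go:492, whose contents are not shown in this report; the type and field names (group, launchTemplate, template) are hypothetical stand-ins.

```go
package main

import "fmt"

// launchTemplate and group are hypothetical stand-ins, not kops types.
type launchTemplate struct{ name string }

type group struct {
	template *launchTemplate // nil if it was never populated
}

// templateName dereferences g.template without checking it, so it panics
// with "invalid memory address or nil pointer dereference" when it is nil.
func templateName(g *group) string {
	return g.template.name
}

// safeTemplateName guards the pointer and returns an error instead.
func safeTemplateName(g *group) (string, error) {
	if g == nil || g.template == nil {
		return "", fmt.Errorf("group has no launch template")
	}
	return g.template.name, nil
}

// didPanic reports whether calling f panics.
func didPanic(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	g := &group{} // template left nil
	fmt.Println(didPanic(func() { templateName(g) })) // true
	_, err := safeTemplateName(g)
	fmt.Println(err) // group has no launch template
}
```

Returning an error from the guarded variant lets the caller report which field was missing instead of aborting the whole run.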

6. What did you expect to happen?
I expected the autoscaling group to be updated with the specified instance type removed. While it would be nice if kops drained in-use nodes the way a rolling update does, I'm equally happy to drain the instances myself.
One of the core issues is that if I grow the mixed instance group before removing this instance type, there is a good chance I'll just spawn another instance of the type I'm trying to remove, as it's one of the smallest and cheapest.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

Adding -v8 to the update --yes command results in:

I0203 16:22:08.992885   92830 request_logger.go:45] AWS request: elasticloadbalancing/DescribeLoadBalancers
I0203 16:22:09.007895   92830 changes.go:81] Field changed "MixedInstanceOverrides" actual="[t3a.2xlarge m5a.xlarge m5a.2xlarge m5a.4xlarge m4.2xlarge m5.2xlarge m5.4xlarge r5a.xlarge r5a.2xlarge r5.xlarge r5.2xlarge c5a.xlarge c5a.2xlarge c5a.4xlarge c5.2xlarge c5.4xlarge]" expected="[t3a.2xlarge m5a.xlarge m5a.2xlarge m5a.4xlarge m4.2xlarge m5.2xlarge m5.4xlarge r5a.xlarge r5a.2xlarge r5.xlarge r5.2xlarge c5a.2xlarge c5a.4xlarge c5.2xlarge c5.4xlarge]"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2c451fa]

k8s.io/kops/upup/pkg/fi/cloudup/awstasks.(*AutoscalingGroup).RenderAWS(0xc000ed6600, 0xc001581e30, 0xc002084b00, 0xc000ed6600, 0xc002084d00, 0x0, 0x0)
	upup/pkg/fi/cloudup/awstasks/autoscalinggroup.go:492 +0x1c9a
reflect.Value.call(0x3fefc00, 0xc000ed6600, 0x2a13, 0x412ea26, 0x4, 0xc00074de00, 0x4, 0x4, 0x4760f00, 0xc0010c2b80, ...)
	GOROOT/src/reflect/value.go:476 +0x8c7
reflect.Value.Call(0x3fefc00, 0xc000ed6600, 0x2a13, 0xc00074de00, 0x4, 0x4, 0xc000ed6600, 0x2a13, 0xc00074dda0)
	GOROOT/src/reflect/value.go:337 +0xb9
k8s.io/kops/upup/pkg/fi.(*Context).Render(0xc001374780, 0x4691a00, 0xc002084b00, 0x4691a00, 0xc000ed6600, 0x4691a00, 0xc002084d00, 0x0, 0x0)
	upup/pkg/fi/context.go:222 +0xef5
k8s.io/kops/upup/pkg/fi.DefaultDeltaRunMethod(0x4691a00, 0xc000ed6600, 0xc001374780, 0x476a180, 0xc000ce9980)
	upup/pkg/fi/default_methods.go:79 +0x54a
k8s.io/kops/upup/pkg/fi/cloudup/awstasks.(*AutoscalingGroup).Run(0xc000ed6600, 0xc001374780, 0x0, 0x0)
	upup/pkg/fi/cloudup/awstasks/autoscalinggroup.go:316 +0xd7
k8s.io/kops/upup/pkg/fi.(*executor).forkJoin.func1(0xc0007721c0, 0xe, 0xe, 0xc0010657f0, 0xc001403660, 0xc000fdc380, 0xd)
	upup/pkg/fi/executor.go:187 +0x1ae
created by k8s.io/kops/upup/pkg/fi.(*executor).forkJoin
	upup/pkg/fi/executor.go:183 +0x105

9. Anything else we need to know?

Note: the output above was slightly edited to remove possibly sensitive information.

@rifelpet rifelpet added the kind/regression Categorizes issue or PR as related to a regression from a prior release. label Feb 4, 2021
@rifelpet rifelpet added this to the v1.19 milestone Feb 4, 2021

h3poteto commented Feb 5, 2021

I got the same error not only when removing an instance type but also when adding one.


hakman commented Feb 7, 2021

Thanks @h3poteto, should be all good with the next patch release.
