[SURE-9285] eks-config-operator crashing (go panic) on eks.CreateNodeGroup #986

Closed
4 tasks
kkaempf opened this issue Dec 2, 2024 · 0 comments
Labels: JIRA (Must shout), kind/bug (Something isn't working)

kkaempf commented Dec 2, 2024

SURE-9285

Issue description:

A Hosted Rancher customer noticed a problem when trying to update the node groups on one of their downstream EKS clusters. In the upstream Rancher cluster, the eks-config-operator pod is crashing due to a Go panic:

time="2024-10-29T23:16:08Z" level=info msg="Starting eks.cattle.io/v1, Kind=EKSClusterConfig controller"
time="2024-10-29T23:16:08Z" level=info msg="Starting /v1, Kind=Secret controller"
E1029 23:16:09.738753       9 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 101 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x2e8c7e0, 0x4daaed0})
	/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc001c1d6e0?})
	/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x2e8c7e0?, 0x4daaed0?})
	/home/runner/go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:770 +0x132
github.com/rancher/eks-operator/pkg/eks.CreateNodeGroup({0x3a95118, 0xc0004fe000}, 0xc000d8efe0)
	/home/runner/work/eks-operator/eks-operator/pkg/eks/create.go:326 +0xa37
github.com/rancher/eks-operator/controller.(*Handler).updateUpstreamClusterState(0xc000143f40, {0x3a95118, 0xc0004fe000}, 0xc0008d01e0, 0xc0007ceb08, 0xc001e6ec00, {0xc00137b500, 0x3d}, 0xc000d8f840)
	/home/runner/work/eks-operator/eks-operator/controller/eks-cluster-config-handler.go:832 +0x13fa
github.com/rancher/eks-operator/controller.(*Handler).checkAndUpdate(0xc000143f40, {0x3a95118, 0xc0004fe000}, 0xc0007ceb08, 0xc001e6ec00)
	/home/runner/work/eks-operator/eks-operator/controller/eks-cluster-config-handler.go:319 +0xc12
github.com/rancher/eks-operator/controller.(*Handler).OnEksConfigChanged(0xc000143f40, {0x0?, 0x0?}, 0xc0007ceb08)
	/home/runner/work/eks-operator/eks-operator/controller/eks-cluster-config-handler.go:100 +0x2bc
github.com/rancher/eks-operator/controller.Register.(*Handler).recordError.func1({0xc000810a00?, 0x1a?}, 0xc0007ceb08?)
	/home/runner/work/eks-operator/eks-operator/controller/eks-cluster-config-handler.go:112 +0x37
github.com/rancher/wrangler/v3/pkg/generic.(*Controller[...].func1({0x3a7e7e8?, 0xc0007ceb08?})
	/home/runner/go/pkg/mod/github.com/rancher/wrangler/[email protected]/pkg/generic/controller.go:169 +0x44
github.com/rancher/lasso/pkg/controller.SharedControllerHandlerFunc.OnChange(0x0?, {0xc000810a00?, 0x0?}, {0x3a7e7e8?, 0xc0007ceb08?})
	/home/runner/go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/sharedcontroller.go:29 +0x32
github.com/rancher/lasso/pkg/controller.(*SharedHandler).OnChange(0xc000726960, {0xc000810a00, 0x1a}, {0x3a7e7e8, 0xc0007ceb08})
	/home/runner/go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/sharedhandler.go:75 +0x202
github.com/rancher/lasso/pkg/controller.(*controller).syncHandler(0xc0000d8a50, {0xc000810a00, 0x1a})
	/home/runner/go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:236 +0x12e
github.com/rancher/lasso/pkg/controller.(*controller).processSingleItem(0xc0000d8a50, {0x2d82c20, 0xc001c1d6e0})
	/home/runner/go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:217 +0xeb
github.com/rancher/lasso/pkg/controller.(*controller).processNextWorkItem(0xc0000d8a50)
	/home/runner/go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:194 +0x45
github.com/rancher/lasso/pkg/controller.(*controller).runWorker(...)
	/home/runner/go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:183
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
	/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc001c1d7c0, {0x3a5a7c0, 0xc002000000}, 0x1, 0xc000115a40)
	/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc001c1d7c0, 0x3b9aca00, 0x0, 0x1, 0xc000115a40)
	/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(...)
	/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:161
created by github.com/rancher/lasso/pkg/controller.(*controller).run in goroutine 85
	/home/runner/go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:151 +0x2ba
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2b33917]

goroutine 101 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc001c1d6e0?})
	/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:56 +0xcd
panic({0x2e8c7e0?, 0x4daaed0?})
	/home/runner/go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:770 +0x132
github.com/rancher/eks-operator/pkg/eks.CreateNodeGroup({0x3a95118, 0xc0004fe000}, 0xc000d8efe0)
	/home/runner/work/eks-operator/eks-operator/pkg/eks/create.go:326 +0xa37
github.com/rancher/eks-operator/controller.(*Handler).updateUpstreamClusterState(0xc000143f40, {0x3a95118, 0xc0004fe000}, 0xc0008d01e0, 0xc0007ceb08, 0xc001e6ec00, {0xc00137b500, 0x3d}, 0xc000d8f840)
	/home/runner/work/eks-operator/eks-operator/controller/eks-cluster-config-handler.go:832 +0x13fa
github.com/rancher/eks-operator/controller.(*Handler).checkAndUpdate(0xc000143f40, {0x3a95118, 0xc0004fe000}, 0xc0007ceb08, 0xc001e6ec00)
	/home/runner/work/eks-operator/eks-operator/controller/eks-cluster-config-handler.go:319 +0xc12
github.com/rancher/eks-operator/controller.(*Handler).OnEksConfigChanged(0xc000143f40, {0x0?, 0x0?}, 0xc0007ceb08)
	/home/runner/work/eks-operator/eks-operator/controller/eks-cluster-config-handler.go:100 +0x2bc
github.com/rancher/eks-operator/controller.Register.(*Handler).recordError.func1({0xc000810a00?, 0x1a?}, 0xc0007ceb08?)
	/home/runner/work/eks-operator/eks-operator/controller/eks-cluster-config-handler.go:112 +0x37
github.com/rancher/wrangler/v3/pkg/generic.(*Controller[...].func1({0x3a7e7e8?, 0xc0007ceb08?})
	/home/runner/go/pkg/mod/github.com/rancher/wrangler/[email protected]/pkg/generic/controller.go:169 +0x44
github.com/rancher/lasso/pkg/controller.SharedControllerHandlerFunc.OnChange(0x0?, {0xc000810a00?, 0x0?}, {0x3a7e7e8?, 0xc0007ceb08?})
	/home/runner/go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/sharedcontroller.go:29 +0x32
github.com/rancher/lasso/pkg/controller.(*SharedHandler).OnChange(0xc000726960, {0xc000810a00, 0x1a}, {0x3a7e7e8, 0xc0007ceb08})
	/home/runner/go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/sharedhandler.go:75 +0x202
github.com/rancher/lasso/pkg/controller.(*controller).syncHandler(0xc0000d8a50, {0xc000810a00, 0x1a})
	/home/runner/go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:236 +0x12e
github.com/rancher/lasso/pkg/controller.(*controller).processSingleItem(0xc0000d8a50, {0x2d82c20, 0xc001c1d6e0})
	/home/runner/go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:217 +0xeb
github.com/rancher/lasso/pkg/controller.(*controller).processNextWorkItem(0xc0000d8a50)
	/home/runner/go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:194 +0x45
github.com/rancher/lasso/pkg/controller.(*controller).runWorker(...)
	/home/runner/go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:183
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
	/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc001c1d7c0, {0x3a5a7c0, 0xc002000000}, 0x1, 0xc000115a40)
	/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc001c1d7c0, 0x3b9aca00, 0x0, 0x1, 0xc000115a40)
	/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(...)
	/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:161
created by github.com/rancher/lasso/pkg/controller.(*controller).run in goroutine 85
	/home/runner/go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:151 +0x2ba

Business impact:

The customer can't access their EKS cluster in Rancher.

Troubleshooting steps:

It seems like the problem is coming from here: https://github.com/rancher/eks-operator/blob/release-v2.9/pkg/eks/create.go#L326

Is the call at that line failing because it is unable to delete a Launch Template?

We checked the Launch Template for the cluster that's referenced in the ekscc object for the cluster. The LT version it lists exists in AWS.

There were a couple of old ekscc objects that were previously deleted but hanging on finalizers. We cleared those out and it didn't help.

The customer's EKS cluster also had a node group, created as a test, that had a space in its name. We removed the space from the name and that still didn't help.

Please let us know what else we can look into.

PR's

@kkaempf kkaempf added the kind/bug (Something isn't working) and JIRA (Must shout) labels Dec 2, 2024
@kkaempf kkaempf moved this to Backlog in CAPI / Turtles Dec 2, 2024
@kkaempf kkaempf changed the title from "[SURE-9285]" to "[SURE-9285] eks-config-operator crashing (go panic) on eks.CreateNodeGroup" Dec 3, 2024
@mjura mjura moved this from Backlog to PR to be reviewed in CAPI / Turtles Dec 9, 2024
@mjura mjura closed this as completed Dec 9, 2024
@github-project-automation github-project-automation bot moved this from PR to be reviewed to Done in CAPI / Turtles Dec 9, 2024
@mjura mjura self-assigned this Dec 9, 2024