ScaledJob: index out of range with one broken External scaler #1669

fjmacagno · 2021-03-12T22:59:22Z

Report

When the KEDA operator processes a ScaledJob that has a misconfigured scaler address (though any error with a ScaledJob may trigger this), and it is the only ScaledJob, the operator crashes with the error panic: runtime error: index out of range [0] with length 0.

Expected Behavior

Operator logs error and continues without crashing.

Actual Behavior

Operator crashes.

Steps to Reproduce the Problem

Deploy a ScaledJob with an incorrect scaler address.

Logs from KEDA operator

I0312 21:33:59.840327       1 request.go:655] Throttling request took 1.042572281s, request: GET:https://172.20.0.1:443/apis/autoscaling/v2beta1?timeout=32s
2021-03-12T21:33:59.843Z	INFO	controller-runtime.metrics	metrics server is starting to listen	{"addr": ":8080"}
2021-03-12T21:33:59.844Z	INFO	controllers.ScaledObject	Running on Kubernetes 1.18+	{"version": "v1.18.9-eks-d1db3c"}
2021-03-12T21:33:59.845Z	INFO	setup	Starting manager
2021-03-12T21:33:59.845Z	INFO	setup	KEDA Version: 2.1.0
2021-03-12T21:33:59.845Z	INFO	setup	Git Commit: 4866ce69c4897df532b43390bafe4477275bf65a
2021-03-12T21:33:59.845Z	INFO	setup	Go Version: go1.15.6
2021-03-12T21:33:59.845Z	INFO	setup	Go OS/Arch: linux/amd64
I0312 21:33:59.845296       1 leaderelection.go:243] attempting to acquire leader lease keda/operator.keda.sh...
2021-03-12T21:33:59.845Z	INFO	controller-runtime.manager	starting metrics server	{"path": "/metrics"}
I0312 21:34:17.252946       1 leaderelection.go:253] successfully acquired lease keda/operator.keda.sh
2021-03-12T21:34:17.253Z	INFO	controller	Starting EventSource	{"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledObject", "controller": "scaledobject", "source": "kind source: /, Kind="}
2021-03-12T21:34:17.253Z	INFO	controller	Starting EventSource	{"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledJob", "controller": "scaledjob", "source": "kind source: /, Kind="}
2021-03-12T21:34:17.353Z	INFO	controller	Starting Controller	{"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledJob", "controller": "scaledjob"}
2021-03-12T21:34:17.353Z	INFO	controller	Starting EventSource	{"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledObject", "controller": "scaledobject", "source": "kind source: /, Kind="}
2021-03-12T21:34:17.353Z	INFO	controller	Starting workers	{"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledJob", "controller": "scaledjob", "worker count": 1}
2021-03-12T21:34:17.353Z	INFO	controllers.ScaledJob	Reconciling ScaledJob	{"ScaledJob.Namespace": "vineyard", "ScaledJob.Name": "vineyard-consumer-test-resource"}
2021-03-12T21:34:17.353Z	INFO	controllers.ScaledJob	Detected ScaleType = Job	{"ScaledJob.Namespace": "vineyard", "ScaledJob.Name": "vineyard-consumer-test-resource"}
2021-03-12T21:34:17.453Z	INFO	controller	Starting Controller	{"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledObject", "controller": "scaledobject"}
2021-03-12T21:34:17.453Z	INFO	controller	Starting workers	{"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledObject", "controller": "scaledobject", "worker count": 1}
2021-03-12T21:34:17.453Z	INFO	controllers.ScaledJob	Deleting jobs owned by the previous version of the scaledJob	{"ScaledJob.Namespace": "vineyard", "ScaledJob.Name": "vineyard-consumer-test-resource", "Number of jobs to delete": 8}
2021-03-12T21:34:17.453Z	INFO	controllers.ScaledJob	Initializing Scaling logic according to ScaledObject Specification	{"ScaledJob.Namespace": "vineyard", "ScaledJob.Name": "vineyard-consumer-test-resource"}
2021-03-12T21:34:17.620Z	ERROR	external_scaler	error	{"error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: lookup vineyard-scaler-service.default.svc.cluster.local on 172.20.0.10:53: no such host\""}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:132
github.com/kedacore/keda/v2/pkg/scalers.(*externalScaler).GetMetricSpecForScaling
	/workspace/pkg/scalers/external_scaler.go:152
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScaledJobScalers
	/workspace/pkg/scaling/scale_handler.go:236
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
	/workspace/pkg/scaling/scale_handler.go:198
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
	/workspace/pkg/scaling/scale_handler.go:130
panic: runtime error: index out of range [0] with length 0
goroutine 353 [running]:
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScaledJobScalers(0xc000bd2940, 0x2a098a0, 0xc00004cf80, 0xc000db7a40, 0x1, 0x1, 0xc000bfa380, 0x0, 0xc000256780, 0x0)
	/workspace/pkg/scaling/scale_handler.go:238 +0x99f
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers(0xc000bd2940, 0x2a098a0, 0xc00004cf80, 0x2563f20, 0xc000bfa380, 0x29d4f20, 0xc000518ef8)
	/workspace/pkg/scaling/scale_handler.go:198 +0x23f
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop(0xc000bd2940, 0x2a098a0, 0xc00004cf80, 0xc0004fe500, 0x2563f20, 0xc000bfa380, 0x29d4f20, 0xc000518ef8)
	/workspace/pkg/scaling/scale_handler.go:130 +0x205
created by github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).HandleScalableObject
	/workspace/pkg/scaling/scale_handler.go:98 +0x385

KEDA Version

2.1.0

Kubernetes Version

1.18

Platform

Amazon Web Services

Scaler Details

Custom

zroubalik · 2021-03-15T14:09:30Z

@fjmacagno This is happening only if you use ExternalScaler, right?

fjmacagno · 2021-03-15T16:35:39Z

Yes

coderanger · 2021-03-15T19:10:42Z

Unfortunately this is not limited to any one scaler. There's another issue open for it already, but it's a general thing. I'm 99% sure it's because we don't deepcopy the objects before launching the background scaler goroutines, but I haven't had cycles to check because of crunch time at work. It happens about once an hour for me using the rabbit scaler.

zroubalik · 2021-03-15T19:43:58Z

This is not the same issue. This one is specific to ScaledJob, when there is a single External Scaler with incorrect metadata. In that case no metricSpecs are returned, and this can result in an operator panic.

fjmacagno added the bug label Mar 12, 2021

zroubalik self-assigned this Mar 15, 2021

zroubalik added this to the v2.2 milestone Mar 15, 2021

zroubalik changed the title ~~index out of range with one broken scaler~~ ScaledJob: index out of range with one broken External scaler Mar 15, 2021

zroubalik mentioned this issue Mar 15, 2021: Fixing behavior on ScaledJob with incorrect External Scaler #1672 (Merged)

ahmelsayed closed this as completed in #1672 Mar 15, 2021
