nifikop crashes with HPA Autoscaler Enabled #224

Closed
skpathak2 opened this issue Jan 13, 2023 · 2 comments
Labels
bug (Something isn't working), community

Comments

skpathak2 commented Jan 13, 2023

What steps will reproduce the bug?

  1. Environment
    Google Kubernetes Engine 1.24.7-gke.900
    Nifikop 1.0.0
    nificluster 1.17.0

  2. Assign node labels in NifiCluster.spec.nodes

  nodes:
    - id: 1
      labels:
        default-scale-group: "true"
      nodeConfigGroup: "default-group"
    - id: 2
      labels:
        default-scale-group: "true"
      nodeConfigGroup: "default-group"

  3. Enable HPA autoscaler

nodeGroupAutoscalers:
  - name: default-group-autoscaler
    enabled: true
    nodeConfigGroupId: default-group
    readOnlyConfig: {}
    nodeConfig: {}
    nodeLabelsSelector:
      matchLabels:
        default-scale-group: "true"
    upscaleStrategy: simple
    downscaleStrategy: lifo
    horizontalAutoscaler:
      maxReplicas: 6
      minReplicas: 1
      replicas: 1
      scaleTargetRef:
        apiVersion: nifi.konpyutaika.com/v1alpha1
        kind: NifiNodeGroupAutoscaler
        name: default-group-autoscaler
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 1
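
For context on why this configuration triggers scaling so readily: the Kubernetes HPA computes its target as ceil(currentReplicas * currentMetric / targetMetric), clamped to the min/max bounds. With averageUtilization: 1, almost any CPU load pushes the target to maxReplicas. The sketch below is a simplified illustration of that formula only. The function name desiredReplicas is hypothetical, and the real HPA also applies a tolerance band and stabilization windows that are omitted here.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas is a hypothetical helper that applies the basic HPA formula:
// ceil(current * currentUtil / targetUtil), clamped to [minReplicas, maxReplicas].
// Tolerance and stabilization behaviour of the real HPA are intentionally left out.
func desiredReplicas(current int32, currentUtil, targetUtil float64, minReplicas, maxReplicas int32) int32 {
	d := int32(math.Ceil(float64(current) * currentUtil / targetUtil))
	if d < minReplicas {
		return minReplicas
	}
	if d > maxReplicas {
		return maxReplicas
	}
	return d
}

func main() {
	// One replica at 20% CPU against a 1% target: the raw result (20) is clamped to maxReplicas.
	fmt.Println(desiredReplicas(1, 20, 1, 1, 6)) // prints 6
	// Two replicas at 0.5% CPU against a 1% target: the HPA asks for a downscale to 1.
	fmt.Println(desiredReplicas(2, 0.5, 1, 1, 6)) // prints 1
}
```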

What is the expected behavior?

Nifi nodes should autoscale seamlessly.
PS: I have tried scaling both up and down; both yield the same error.

What do you see instead?

The nifikop operator panics with "msg":"Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference"

{"level":"info","time":"2023-01-13T02:48:16.888Z","logger":"controllers.NifiNodeGroupAutoscaler","caller":"controllers/nifinodegroupautoscaler_controller.go:148","msg":"Removing 2 nodes from cluster nifi-cluster spec.nodes configuration for node group default-group"}
{"level":"info","time":"2023-01-13T02:48:16.888Z","logger":"controllers.NifiNodeGroupAutoscaler","caller":"controllers/nifinodegroupautoscaler_controller.go:208","msg":"Using LIFO downscale strategy for cluster nifi-cluster node group default-group"}
{"level":"info","time":"2023-01-13T02:48:16.888Z","caller":"controller/controller.go:117","msg":"Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference","controller":"nifinodegroupautoscaler","controllerGroup":"nifi.konpyutaika.com","controllerKind":"NifiNodeGroupAutoscaler","nifiNodeGroupAutoscaler":{"name":"nifi-cluster-default-group","namespace":"nifi"},"namespace":"nifi","name":"nifi-cluster-default-group","reconcileID":"1645bb85-5ff9-47eb-99de-2ff3de7c2898"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
 panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xb65306]
goroutine 914 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:118 +0x1f4
panic({0x1716ce0, 0x274ec30})
 /usr/local/go/src/runtime/panic.go:884 +0x212
github.com/konpyutaika/nifikop/api/v1.(*NifiCluster).GetCreationTimeOrderedNodes(0xc0001a1400)
 /workspace/api/v1/nificluster_types.go:829 +0x126
github.com/konpyutaika/nifikop/pkg/autoscale.(*LIFOHorizontalDownscaleStrategy).ScaleDown(0xc0008a93c0, 0x2)
 /workspace/pkg/autoscale/strategy.go:36 +0x2c
github.com/konpyutaika/nifikop/controllers.(*NifiNodeGroupAutoscalerReconciler).scaleDown(0xc000a28540, 0xc000901ba0, 0xc0001a1400, 0x0?)
 /workspace/controllers/nifinodegroupautoscaler_controller.go:214 +0x139
github.com/konpyutaika/nifikop/controllers.(*NifiNodeGroupAutoscalerReconciler).Reconcile(0xc000a28540, {0x1bc2718, 0xc00090d380}, {{{0xc0006be980?, 0x10?}, {0xc0005ba440?, 0x40dae7?}}})
 /workspace/controllers/nifinodegroupautoscaler_controller.go:150 +0x785
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1bc2670?, {0x1bc2718?, 0xc00090d380?}, {{{0xc0006be980?, 0x18525c0?}, {0xc0005ba440?, 0x4045d4?}}})
 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121 +0xc8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000592be0, {0x1bc2670, 0xc00076cbc0}, {0x177e640?, 0xc000b45580?})
 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320 +0x33c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000592be0, {0x1bc2670, 0xc00076cbc0})
 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273 +0x1d9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:230 +0x333

Possible solution

The error seems to stem from the GetCreationTimeOrderedNodes() method (api/v1/nificluster_types.go:829 in the stack trace).

Error

{"level":"info","time":"2023-01-13T05:42:44.135Z","logger":"controllers.NifiNodeGroupAutoscaler","caller":"controllers/nifinodegroupautoscaler_controller.go:208","msg":"Using LIFO downscale strategy for cluster nifi-cluster node group nifi-cluster-default-group"}
{"level":"info","time":"2023-01-13T05:42:44.135Z","caller":"controller/controller.go:117","msg":"Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference","controller":"nifinodegroupautoscaler","controllerGroup":"nifi.konpyutaika.com","controllerKind":"NifiNodeGroupAutoscaler","nifiNodeGroupAutoscaler":{"name":"nifi-cluster-nifi-cluster-default-group","namespace":"nifi"},"namespace":"nifi","name":"nifi-cluster-nifi-cluster-default-group","reconcileID":"9667fe67-66b4-4dbd-be09-6605eba0c947"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
 panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xb65306]
goroutine 835 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:118 +0x1f4
panic({0x1716ce0, 0x274ec30})
 /usr/local/go/src/runtime/panic.go:884 +0x212
github.com/konpyutaika/nifikop/api/v1.(*NifiCluster).GetCreationTimeOrderedNodes(0xc00092ca00)
 /workspace/api/v1/nificluster_types.go:829 +0x126
github.com/konpyutaika/nifikop/pkg/autoscale.(*LIFOHorizontalDownscaleStrategy).ScaleDown(0xc000d0f3c0, 0x4)
 /workspace/pkg/autoscale/strategy.go:36 +0x2c
github.com/konpyutaika/nifikop/controllers.(*NifiNodeGroupAutoscalerReconciler).scaleDown(0xc0002a0540, 0xc000bb0b60, 0xc00092ca00, 0x0?)
 /workspace/controllers/nifinodegroupautoscaler_controller.go:214 +0x139
github.com/konpyutaika/nifikop/controllers.(*NifiNodeGroupAutoscalerReconciler).Reconcile(0xc0002a0540, {0x1bc2718, 0xc000a793b0}, {{{0xc000ac7da8?, 0x10?}, {0xc000bb2b10?, 0x40dae7?}}})
 /workspace/controllers/nifinodegroupautoscaler_controller.go:150 +0x785
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1bc2670?, {0x1bc2718?, 0xc000a793b0?}, {{{0xc000ac7da8?, 0x18525c0?}, {0xc000bb2b10?, 0x4045d4?}}})
 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121 +0xc8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00015ad20, {0x1bc2670, 0xc000133e00}, {0x177e640?, 0xc0008615e0?})
 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320 +0x33c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00015ad20, {0x1bc2670, 0xc000133e00})
 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273 +0x1d9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:230 +0x333

NiFiKop version

v.1.0.0

Golang version

go.1.19

Kubernetes version

1.24

NiFi version

1.17.0

Additional context

NA

skpathak2 added the bug and community labels Jan 13, 2023

skpathak2 (Author) commented

It seems the issue is with the Helm chart: it's not possible to deploy a NifiCluster with only autoscaled node groups. The NifiCluster CRD requires that you specify at least one node in the spec.nodes list.

I disabled autoscaling in the Helm chart and handled it manually using a separate deployment. It worked flawlessly; I will orchestrate this using TF.

mh013370 (Member) commented Jan 26, 2023

It just occurred to me that I mentioned this as a constraint when I raised the PR to contribute this feature: #89

It's not possible to deploy a NifiCluster with only autoscaled node groups. The NifiCluster CRD requires that you specify at least one node in the spec.nodes list. Do we want to support this? If so, we may need to adjust the cluster initialization logic in the NifiCluster controller.

This could be something we evaluate changing. At the very least this constraint needs to be in the documentation.
