[BUG] memory usage grows exponentially when there are lots of CRDs #923

Closed
Shaked opened this issue Mar 26, 2024 · 2 comments

Shaked commented Mar 26, 2024

Hey folks,

I am running Flux on an AKS cluster (Server Version: v1.27.3) with:

  • 60 HelmReleases
  • 108 CRDs

I have been experiencing a memory issue with the helm-controller, to the point where the pod was OOMKilled a couple of times a day.

I have followed the advanced debugging instructions to profile the controller and got some interesting results:

(pprof) top10
Showing nodes accounting for 668.27MB, 89.98% of 742.69MB total
Dropped 280 nodes (cum <= 3.71MB)
Showing top 10 nodes out of 113
      flat  flat%   sum%        cum   cum%
  335.59MB 45.19% 45.19%   335.59MB 45.19%  reflect.New
  112.06MB 15.09% 60.28%   112.06MB 15.09%  google.golang.org/protobuf/internal/impl.consumeStringValidateUTF8
   82.07MB 11.05% 71.33%    82.07MB 11.05%  io.ReadAll
   41.01MB  5.52% 76.85%   105.53MB 14.21%  k8s.io/kube-openapi/pkg/util/proto.(*Definitions).parseKind
   20.50MB  2.76% 79.61%       29MB  3.91%  k8s.io/kube-openapi/pkg/util/proto.(*Definitions).parsePrimitive
   18.01MB  2.43% 82.03%    18.01MB  2.43%  github.com/go-openapi/swag.(*NameProvider).GetJSONNames
   17.50MB  2.36% 84.39%    33.01MB  4.44%  k8s.io/kube-openapi/pkg/util/proto.VendorExtensionToMap
      15MB  2.02% 86.41%       15MB  2.02%  google.golang.org/protobuf/internal/impl.consumeStringSliceValidateUTF8
   14.01MB  1.89% 88.30%    54.03MB  7.27%  k8s.io/kube-openapi/pkg/validation/spec.(*Schema).UnmarshalNextJSON
   12.51MB  1.68% 89.98%    12.51MB  1.68%  reflect.mapassign0
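For reference, a minimal sketch of how such a heap profile can be pulled once the controller's pprof endpoint is port-forwarded to localhost; the port (8080), path and output file name here are assumptions based on the standard net/http/pprof handler layout, not the exact steps from the Flux debugging guide:

```go
// Hedged sketch: download a heap profile from a port-forwarded pprof endpoint
// so it can be inspected with `go tool pprof heap.out`. The URL is an
// assumption, not the exact Flux/helm-controller setup.
package main

import (
	"io"
	"net/http"
	"os"
)

func main() {
	// Fetch the heap profile from the (assumed) pprof endpoint.
	resp, err := http.Get("http://localhost:8080/debug/pprof/heap")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Write the raw profile to disk for offline analysis.
	out, err := os.Create("heap.out")
	if err != nil {
		panic(err)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		panic(err)
	}
}
```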

After posting this on the Slack channel, @stefanprodan suggested that it is related to the number of CRDs (or their size), since the Helm SDK uses all of the CRDs for discovery purposes and there's no way to disable that.

To test this issue, I have created fluxcd/flux-benchmark#4, which automatically installs N CRDs on a Kubernetes cluster and runs the controller against it. While running this on my M2 with 500 CRDs and 100 HelmReleases, at some point I think I crossed 1 CPU. I managed to catch this screenshot:

[screenshot]

I also ran this on an Azure AMD D2as_v5 node.

  • The yellow line shows a test with HR=100 and CRD=100, with limits set to 1 CPU and 2Gi memory.
  • The next lines are from the same experiment, using HR=100 and CRD=150.

[graph]

After that, I increased the limits to 2 CPU and 2Gi memory, moved the helm-controller to a more powerful node (4 vCPU/16GB memory), and made sure that it doesn't share a node with Prometheus/Grafana/cert-manager. The restart count decreased, but restarts were still happening:

[screenshots]

Currently I have managed to stop the restarts by increasing the limits again, to 2 CPU and 3Gi memory.

While I think that removing the CPU limit might help, the origin of this issue is directly related to @stefanprodan's suggestion regarding the Helm SDK, the way it uses the installed CRDs, and how GC works.


Extra info

  • 2vcpu/8gb memory node
  • helm-controller limits:
limits:
  cpu: 1000m
  memory: 2Gi
  • Flux's extra arguments
--concurrent=10
--requeue-dependency=5s
  • Flux version
$ flux version
flux: v2.2.3
distribution: flux-v2.2.3
helm-controller: v0.37.4
kustomize-controller: v1.2.2
notification-controller: v1.2.4
source-controller: v1.2.4

stefanprodan commented Apr 12, 2024

So I did some digging into the Helm SDK source code. The culprit seems to be getCapabilities, which invalidates the client's CRD cache and then queries the Kubernetes API to get all CRDs. This function is called at upgrade, and to make things even worse, getCapabilities is called again in renderResources, so basically all CRDs are loaded 4 times into memory for each upgrade: here and here.

This not only fills the helm-controller's memory, it also puts huge pressure on the Kubernetes API when running helm-controller with a high --concurrent value.

I'm not sure how this can be avoided while still keeping the Helm Capabilities feature working. I see that we could pass our own Capabilities, so maybe we could cache them globally in helm-controller and only refresh them when we install CRDs, but CRDs can also be in templates, so we risk breaking Helm Capabilities and also the render logic that relies on the getCapabilities result...

@Shaked, to validate my assumptions, you could modify helm-controller to load the Capabilities at startup only, then run your test and see if the memory usage drops.
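
A minimal sketch of that idea, assuming Helm v3's action.Configuration (whose exported Capabilities field, when non-nil, is returned by getCapabilities without a new discovery round-trip). The package and helper names here are hypothetical, not existing helm-controller code:

```go
// Hypothetical sketch: compute Helm Capabilities once at startup and reuse
// them for every upgrade, instead of letting getCapabilities re-query all
// CRDs on each reconcile. Not actual helm-controller code.
package capscache

import (
	"sync"

	"helm.sh/helm/v3/pkg/action"
	"helm.sh/helm/v3/pkg/chartutil"
	"k8s.io/client-go/discovery"
)

var (
	capsOnce   sync.Once
	cachedCaps *chartutil.Capabilities
	capsErr    error
)

// loadCapabilities (hypothetical helper) queries the API server once and
// caches the resulting Capabilities for the lifetime of the process.
func loadCapabilities(dc discovery.DiscoveryInterface) (*chartutil.Capabilities, error) {
	capsOnce.Do(func() {
		sv, err := dc.ServerVersion()
		if err != nil {
			capsErr = err
			return
		}
		apiVersions, err := action.GetVersionSet(dc)
		if err != nil {
			capsErr = err
			return
		}
		cachedCaps = &chartutil.Capabilities{
			KubeVersion: chartutil.KubeVersion{
				Version: sv.GitVersion,
				Major:   sv.Major,
				Minor:   sv.Minor,
			},
			APIVersions: apiVersions,
			HelmVersion: chartutil.DefaultCapabilities.HelmVersion,
		}
	})
	return cachedCaps, capsErr
}

// WithCachedCapabilities sets the cached Capabilities on an action.Configuration;
// since cfg.Capabilities is then non-nil, Helm skips the CRD-heavy discovery.
func WithCachedCapabilities(cfg *action.Configuration, dc discovery.DiscoveryInterface) error {
	caps, err := loadCapabilities(dc)
	if err != nil {
		return err
	}
	cfg.Capabilities = caps
	return nil
}
```

The trade-off noted above still applies: Capabilities cached this way would only be refreshed on restart (or whenever CRDs are installed), so charts that rely on freshly created CRDs in .Capabilities.APIVersions could render differently.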

@stefanprodan

Some good news 🎉 A combination of improvements in Flux 2.3 and in the Kubernetes API (1.29/1.30) makes this issue less impactful.

Compared to Flux 2.2 and Kubernetes 1.28, where a large number of CRDs would drive helm-controller into OOM, in Flux 2.3 and Kubernetes 1.29, even with 500 CRDs, helm-controller reconciles 1K HelmReleases in under 9 minutes when configured with --concurrent=10 and limits of 2 CPU and 1GB RAM. Benchmark results here: fluxcd/flux-benchmark#6
