This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Flux agent goroutine count and memory usage grow to problematic levels #2263

Closed
bricef opened this issue Jul 16, 2019 · 5 comments
Labels
blocked-needs-validation (issue is waiting to be validated before we can proceed) · bug

Comments

bricef (Contributor) commented Jul 16, 2019

Describe the bug

Recently, after a careful increase in the number of workloads in a cluster with the weave-flux-agent installed, we find that the agent (viewed through the Weave Cloud Deploy dashboard) is timing out or taking much longer to load. During these periods, the number of goroutines on the flux agent climbs steeply, which also causes the node on which Flux is running to run out of resources. Could you please help us troubleshoot and understand the cause of this issue?

[Screenshot: flux-leak — flux agent goroutine count and memory usage graphs]

To Reproduce

  1. Install the latest version of the flux Kubernetes agent (through Weave Cloud in this case).
  2. Increase the number of workloads in the cluster.
  3. Observe the flux agent's Go Prometheus metrics for memory usage and goroutine count (see the sketch below).
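For step 3, a minimal sketch of how these numbers are typically exposed in a Go service using the Prometheus client library (this is illustrative, not flux's actual wiring): serving the default registry already includes the Go collector metrics such as go_goroutines and go_memstats_alloc_bytes, and registering the pprof handlers lets you dump goroutine stacks when the count climbs.

```go
// Sketch (not flux code): expose the standard Go runtime metrics plus a
// pprof endpoint for inspecting where goroutines are blocked.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on the default mux

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// promhttp.Handler serves the default registry, which already includes
	// Go collector metrics such as go_goroutines and go_memstats_alloc_bytes.
	http.Handle("/metrics", promhttp.Handler())

	// With pprof registered, `go tool pprof http://localhost:8080/debug/pprof/goroutine`
	// (or /debug/pprof/goroutine?debug=2 in a browser) shows the goroutine stacks.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Graphing go_goroutines and go_memstats_alloc_bytes for the flux pod in Prometheus is enough to reproduce charts like the one in the screenshot above.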

Expected behaviour

I would not expect the memory usage of the flux agent to grow beyond a reasonable level.

bricef added the blocked-needs-validation and bug labels on Jul 16, 2019

bricef (Contributor, Author) commented Jul 16, 2019

(NB: Further information to be provided by the original reporter.)

hiddeco (Member) commented Jul 16, 2019

Is this screenshot from before or after ±08:45 UTC today?

bricef (Contributor, Author) commented Jul 16, 2019

@hiddeco before

hiddeco (Member) commented Jul 16, 2019

@bricef Weave Cloud customers were upgraded to 1.13.2 today (new version range merged at 08:42 UTC).

The goroutines observed in this screenshot are probably due to the Kubernetes API rate-limiting requests: this results in a pile of routines waiting for an answer, and because the deploy page makes requests every x seconds, the pile only grows. I implemented a timeout in #2171 (>=1.13.1) to prevent this from happening, so please check with the reporter whether the issue is still relevant. A rough sketch of the timeout pattern is below.
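For context, this is a minimal sketch of the general pattern described above, not the actual change in #2171; the 10-second timeout and the context-aware client-go List signature are illustrative (and newer than what flux used at the time). The idea is to bound each Kubernetes API request with a deadline so a rate-limited or stalled call returns an error and releases its goroutine instead of waiting forever.

```go
// Minimal sketch (not the #2171 code): give each Kubernetes API call a
// deadline so a rate-limited or hung request can't pin a goroutine forever.
package main

import (
	"context"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func listDeployments(clientset *kubernetes.Clientset, namespace string) error {
	// Without a deadline, a request queued behind rate limiting blocks this
	// goroutine indefinitely; with one, the call fails with
	// context.DeadlineExceeded and the goroutine exits.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) // value is illustrative
	defer cancel()

	_, err := clientset.AppsV1().Deployments(namespace).List(ctx, metav1.ListOptions{})
	return err
}

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}
	if err := listDeployments(clientset, "default"); err != nil {
		log.Printf("list failed: %v", err)
	}
}
```

Without such a deadline, each poll from the deploy page adds another goroutine stuck in the same place, which matches the steadily growing count in the screenshot.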

bricef (Contributor, Author) commented Jul 17, 2019

Thanks @hiddeco. Sounds like the changes worked and the customer is no longer experiencing these issues. I'll close this for now.
