Workqueue depth and duration metrics are not accurate #674

Open
Tracked by #4620
turkenh opened this issue Feb 23, 2024 · 3 comments

@turkenh
Member

turkenh commented Feb 23, 2024

What happened?

During the scalability testing efforts leading to crossplane-contrib/provider-kubernetes#203, I noticed that controller-runtime metrics such as workqueue_depth and workqueue_queue_duration_seconds are inaccurate: they do not reflect the state of the system as expected.

See the workqueue depth and duration graphs for "1m, 10" here.

How can we reproduce it?

Check out https://github.com/turkenh/provider-kubernetes-scalability/tree/repro-xp-no-metric

just setup
just create_x_objects 1 1000
just help_launch_prometheus
just help_launch_grafana
# import dashboard json there

What environment did it happen in?

Crossplane version: v1.14.5
Provider Kubernetes: v0.11.4

@turkenh turkenh added the bug Something isn't working label Feb 23, 2024
@negz
Member

negz commented Feb 26, 2024

@turkenh Do you have any theories on why depth is inaccurate? I think per crossplane/crossplane#5415 (comment) we have a good theory why duration would be inaccurate.

@negz
Member

negz commented Feb 26, 2024

Ah - maybe rate limiters calling AddAfter (as in "add to queue after some duration") means reconciles are spending time in limbo. They're not yet being processed, but also not yet even in the work queue?


github-actions bot commented Sep 3, 2024

Crossplane does not currently have enough maintainers to address every issue and pull request. This issue has been automatically marked as stale because it has had no activity in the last 90 days. It will be closed in 14 days if no further activity occurs. Leaving a comment starting with /fresh will mark this issue as not stale.

@github-actions github-actions bot added the stale label Sep 3, 2024