Multiple rule configuration sources #107
Comments
The Prometheus Operator could be a source of inspiration. Things to consider:
@brancz might be interested in this.
We are considering the same for the Azure metrics adapter (Azure/azure-k8s-metrics-adapter#19), FYI @jsturtevant
@DirectXMan12 Is the safety worry about reading other namespaces' "sensitive metrics", or about expensive queries? I would guess that whoever decides to run a shared Prometheus installation already has some trust in the parties using it (since they could already graph any such metrics). Expensive queries are a real problem, but I'm not sure that's solvable. Are there other dimensions to this?

I would reflexively propose allowing a list of rules, without any particular argument other than not wanting to create dozens of them... Are there good arguments for sticking with single rules? One potential issue I could see is de-duplication of rules. Does it need to be handled? What do we do about duplicate names, for example, especially if the name is the same but the other parameters differ? Error out, warn and continue, or ignore? (A sketch of the collision case is below.)

Re taking inspiration from the operator: I believe it predates the operator-framework. Should we prefer hand-crafted code over using the framework?
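For illustration, a rough sketch of what a merged list of rules could look like, roughly in the adapter's config-file format (the series names are made up). The second rule exposes the same metric name as the first, which is the duplicate-name case in question:

```yaml
rules:
# Rule supplied by team A.
- seriesQuery: 'team_a_jobs_in_queue{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)$"
    as: "jobs_in_queue"
  metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
# Rule supplied by team B: same exposed name, different query.
# Error out, warn and continue, or ignore?
- seriesQuery: 'team_b_jobs_in_queue{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)$"
    as: "jobs_in_queue"
  metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```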
I'm not sure how I feel about custom rules for the Prometheus case. My feeling is that most useful rules will use recording rules anyway, so I'd prefer to explore how the adapter could work with multiple Prometheus instances, or, even more generally, how we could essentially federate metrics adapters so people can use mixed adapters in a cluster. I imagine this will be very common, especially in multi-tenancy cases. Everyone can then set up their own adapters to their liking.
I'm probably missing some context here, because I didn't really understand that :) (That there are two or three(?) different meanings of "rules" doesn't help here 🙂)
Haha, definitely. I can imagine essentially unlimited scenarios or queries users want to run against Prometheus for autoscaling. While this adapter's rules can help a bit with doing that on the fly, they're very prone to requiring customization. Instead, I'd prefer that users just specify recording rules in Prometheus directly, making sure the resulting time series has the appropriate labelling; then the arbitrarily complex query can be used through the adapter, since it's accessible as if it were a normal single metric/time series. Sorry if I'm not being detailed enough, I'm still recovering from GopherCon.
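For illustration, a rough sketch of that approach (the metric name and expression are made up): a Prometheus recording rule pre-computes the complex query into a single series while keeping the namespace/pod labels the adapter needs for association.

```yaml
groups:
- name: autoscaling.rules
  rules:
  # Pre-compute an arbitrarily complex expression into one series per pod,
  # keeping the namespace/pod labels so the adapter can associate it with
  # Kubernetes objects.
  - record: pod:http_requests_per_second:sum
    expr: sum(rate(http_requests_total[2m])) by (namespace, pod)
```

The adapter then only has to expose `pod:http_requests_per_second:sum` as-is, with no query templating of its own.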
I've been talking a bit offline with @brancz, and I'm thinking about switching away from "discovery" and instead requiring metrics and associations to be specified manually in CRD objects. Most people seem to care about a small subset of metrics anyway, and mostly seem to use the custom rules to select one or two specific metrics rather than all metrics in their cluster. What do people think about that? @aabed @zioproto @tanner-bruce @discordianfish @itskingori @atomaras you've all been vocal on this stuff before.
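To make that concrete, here is a purely hypothetical sketch of such a CRD; the group, version, kind, and field names are made up, and the spec simply mirrors the existing config-file rule fields.

```yaml
# Hypothetical CRD -- nothing like this exists in the adapter yet.
apiVersion: custom-metrics.example.com/v1alpha1
kind: MetricRule
metadata:
  name: jobs-in-queue
  namespace: team-a
spec:
  # Which Prometheus series backs the metric, and how its labels map to
  # Kubernetes resources.
  seriesQuery: 'jobs_in_queue{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  # Name under which the metric is exposed through the custom metrics API.
  metricName: jobs_in_queue
  metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```

Since such objects would be namespaced, each team could manage its own rules, and the adapter could watch them and pick up changes without a restart.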
@antoinne85 this may affect your work as well...
From our perspective we have a limited set of custom metrics that we'd want to autoscale on: CPU, memory, request metrics, and worker queue size. We'd prefer manually specifying only the metrics we need.
@DirectXMan12 I agree with this statement. I started with auto-discovery but realised that ultimately, what I want is the custom metrics I'm exporting to Prometheus. In my case I wanted to scale one component based on the web-server queue and another based on the worker queue, so I've manually specified the metrics I want. I'd also like to point out that another motivation for this was that the prometheus adapter was using quite a lot of resources handling all the auto-discovered metrics. Limiting it to just what I need reduced resource usage significantly.
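For reference, a hedged sketch of that kind of narrowed-down configuration, roughly in the adapter's config-file format (the series names are illustrative): only the two queue series are listed, instead of a broad discovery rule that matches everything.

```yaml
rules:
# Web-server queue depth, per pod.
- seriesQuery: 'webserver_queue_length{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
# Background-worker queue depth, per pod.
- seriesQuery: 'worker_queue_length{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```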
I agree with @DirectXMan12 & @itskingori - people are only interested in a subset of metrics, not all of them. And even if they are interested in all of them, that doesn't mean they should have access to all of them by default.
@DirectXMan12 I didn't comment on this part, but ... my thoughts are that if moving to CRD objects can reduce the complexity of the config and simplify association, I'm all for that. This approach also means that one doesn't need to restart the adapter for it to pick up new configuration.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
Rotten issues close after 30d of inactivity. Send feedback to sig-contributor-experience at kubernetes/community.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
@fredr: You can't reopen an issue/PR unless you authored it or you are a collaborator. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I think this is still a very interesting feature, and since no one seemed to be against it three years ago, I would like to restart the discussion. Having custom and external metrics defined via CRDs would make it a lot easier to understand which metrics are available where.
As discussed briefly with @DirectXMan12 on Slack, we want/need to let multiple teams supply rules to the k8s-prometheus-adapter. The outcome of that discussion was to move configuration from the config file to a CRD.
I'd be happy to work on this, but it should probably be scoped first.
Thoughts and things to be considered?