
Multiple rule configuration sources #107

Closed
carlpett opened this issue Aug 30, 2018 · 20 comments

Comments

@carlpett

As discussed briefly with @DirectXMan12 on Slack, we want/need to let multiple teams supply rules to the k8s-prometheus-adapter. The outcome of that discussion was to move configuration from the config file to a CRD.
I'd be happy to work on this, but it should probably be scoped first.

Thoughts and things to be considered?
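
For context, all rules currently live in a single config file mounted into the adapter; a discovery rule looks roughly like this (the series and label names below are only illustrative):

```yaml
# Excerpt of the adapter's current file-based configuration (illustrative values).
rules:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

Moving to a CRD would essentially mean splitting entries like this across namespaced API objects that different teams can own.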

@DirectXMan12
Contributor

The Prometheus Operator could be a source of inspiration.

Things to consider:

  • Is there any way to "safely" allow normal users to specify rules without worrying about cross-namespace metrics reading? (probably would require query rewriting, or at least query parsing for verification :-/)
  • Do we want one object per "rule" (as in the current config), or should one object be allowed to hold multiple rules? (See the sketch below.)
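
To make that second point concrete, a rough sketch of the two shapes might look like this (the API group, kinds, and metric names are entirely hypothetical; nothing like this exists yet):

```yaml
# Option A: one object per rule (hypothetical group/kind names).
apiVersion: custom-metrics.example.k8s.io/v1alpha1
kind: MetricRule
metadata:
  name: http-requests
  namespace: team-a
spec:
  seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
---
# Option B: one object holding a list of rules.
apiVersion: custom-metrics.example.k8s.io/v1alpha1
kind: MetricRuleSet
metadata:
  name: team-a-rules
  namespace: team-a
spec:
  rules:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
    - seriesQuery: 'worker_queue_depth{namespace!="",pod!=""}'
      metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```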

@DirectXMan12
Contributor

@brancz might be interested in this.

@tomkerkhove
Contributor

We are considering the same for the Azure metrics adapter (Azure/azure-k8s-metrics-adapter#19), FYI @jsturtevant

@carlpett
Author

@DirectXMan12 Is the safety worry about reading other namespaces' "sensitive metrics", or about expensive queries? I would guess that those who decide to run a shared Prometheus installation already have some trust in the parties using it (since those parties could already graph any such metrics). Expensive queries are a real problem, but I'm not sure that is solvable. Are there other dimensions to this?

I would reflexively propose allowing a list of rules, though without any particular argument other than not wanting to create dozens of objects... Are there good arguments for going with single rules?

One potential issue I can see is de-duplication of rules. Does it need to be handled? What do we do about duplicate names, for example, especially if the name is the same but the other parameters differ: error out, warn and continue, or ignore?

Regarding taking inspiration from the Prometheus Operator: I believe it predates the operator-framework. Should we prefer to go hand-crafted, or use the framework?

@brancz
Contributor

brancz commented Aug 31, 2018

I'm not sure how I feel about custom rules for the Prometheus case. My feeling is that most useful rules will use recording rules anyway, so I'd prefer to explore how the adapter could work with multiple Prometheus instances, or, even more generically, how we could essentially federate metrics adapters so people can use a mix of adapters in a cluster. I imagine this will be very common, especially in multi-tenancy cases; then everyone can set up their own adapter to their liking.

@carlpett
Author

I'm probably missing some context here, because I didn't really understand that :)
Why are useful rules (with recording rules?) something that works against, or is harmed by, allowing multiple users in the cluster to define custom metrics?
Also, why would useful rules mostly be based on recording rules?

(That there are two or three(?) different meanings to "rules" does not help here 🙂)

@brancz
Contributor

brancz commented Aug 31, 2018

(That there are two or three(?) different meanings to "rules" does not help here 🙂)

Haha, definitely.

I can imagine essentially unlimited scenarios or queries that users want to run against Prometheus for autoscaling. While this adapter's rules can help a bit with doing that on the fly, they're very prone to requiring customization. Instead, I think I would prefer that users just specify recording rules in Prometheus directly, making sure the resulting time series has the appropriate labelling; the arbitrarily complex query can then be used through the adapter, since it's accessible as if it were a normal single metric/time series.
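
For example (the rule and label names here are made up), a recording rule like the following does the heavy lifting inside Prometheus and leaves the adapter with a single, well-labelled series to expose:

```yaml
# Prometheus recording rule (illustrative names).
groups:
  - name: autoscaling.rules
    rules:
      - record: namespace_pod:http_requests:rate5m
        expr: sum(rate(http_requests_total[5m])) by (namespace, pod)
```

The adapter then only needs a trivial rule (or its existing discovery) to pick up namespace_pod:http_requests:rate5m, since the namespace and pod labels already map cleanly onto Kubernetes resources.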

Sorry if I'm not being detailed enough, I'm still recovering from GopherCon.

@DirectXMan12
Contributor

I've been talking a bit offline with @brancz, and I'm thinking about switching to not having "discovery", and instead requiring metrics and associations to be manually specified in CRD objects. Most people seem to care about a small subset of metrics anyway, and seem to mostly be using the custom rules to select one or two specific metrics, as opposed to all metrics in their cluster.

What do people think about that?
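
As a rough illustration of what "no discovery" could look like (everything below is hypothetical, just to anchor the discussion), each object would explicitly name one metric and its resource associations:

```yaml
# Hypothetical CRD: explicitly name the metric and its resource association.
apiVersion: custom-metrics.example.k8s.io/v1alpha1
kind: ExportedMetric
metadata:
  name: worker-queue-depth
  namespace: team-a
spec:
  # The exact Prometheus series to expose -- nothing is auto-discovered.
  series: worker_queue_depth
  # How Prometheus labels map onto Kubernetes resources.
  resources:
    namespace: namespace
    pod: pod
  # Optional query; defaults to the raw series if omitted.
  metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```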

@aabed @zioproto @tanner-bruce @discordianfish @itskingori @atomaras you've all been vocal on stuff before

@DirectXMan12
Contributor

@antoinne85 this may affect your work as well...

@atomaras

From our perspective, we have a limited set of custom metrics that we'd want to autoscale on: CPU, memory, request metrics, and worker-queue-size metrics. We'd prefer manually specifying only the metrics we need.

@itskingori

Most people seem to care about a small subset of metrics anyway, and seem to mostly be using the custom rules to select one or two specific metrics, as opposed to all metrics in their cluster.

@DirectXMan12 I agree with this statement. I started with the auto-discovery but realised that, ultimately, what I want is the custom metrics I'm exporting to Prometheus. In my case I wanted to scale one component based on the web-server queue and another based on the worker queue, so I've manually specified the metrics I want.
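
For what it's worth, the consuming side then stays a plain HPA that references exactly that metric, something like the following (the metric, Deployment, and target values are just examples):

```yaml
# HPA scaling a worker Deployment on a manually exposed queue metric (illustrative names).
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: worker_queue_depth
        target:
          type: AverageValue
          averageValue: "30"
```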

I'd also like to point out that another motivation for this was that the Prometheus adapter was using a significant amount of resources on all the auto-discovered metrics. Limiting it to just what I need reduced resource usage significantly.

@tomkerkhove
Contributor

I agree with @DirectXMan12 & @itskingori - people are only interested in a subset of metrics, not all of them. And even if they are interested in all of them, that doesn't mean they should have access to all of them by default.

@itskingori

I've been talking a bit offline with @brancz, and I'm thinking about switching to not having "discovery", and instead requiring manually specifying metrics and associations in CRD objects.

@DirectXMan12 I didn't comment on this part, but my thoughts are: if moving to CRD objects can reduce the complexity of the config and simplify association, I'm all for that. This approach also means that one doesn't need to restart the adapter for it to pick up new configuration.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jan 30, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Mar 1, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@fredr

fredr commented Mar 31, 2021

/reopen
/remove-lifecycle rotten

@k8s-ci-robot
Contributor

@fredr: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen
/remove-lifecycle rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot removed the lifecycle/rotten label on Mar 31, 2021
@fredr

fredr commented Mar 31, 2021

I think this is still a very interesting feature, and since no one seemed to be against it three years ago, I would like to reinitiate the discussion.

Having custom and external metrics configured via CRDs would make it a lot easier to understand which metrics are available where.
Three years later 😄, what are your opinions on this issue now?
