-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
webhook_request_latencies_bucket metric keeps adding new data series and became unusable #3171
Comments
I can take a look at it if no one else is. But it may take a longer time for me. |
@imjasonh will this be suitable as a good first issue? |
/assign ywluogg cc @NavidZ since this relates to metrics |
Dropping this here for context. The Removing the labels in that pull request might help reduce the number of unique Aside from this, I don't know if theres a way to configure the metrics code to purge metrics from the in memory store after a period of time. This would help too. Most of the time, the in memory stuff is sent to a backend like Prometheus, stack driver, etc anyway. |
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing. |
/remove-lifecycle stale |
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing. |
/remove-lifecycle stale |
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing. |
/remove-lifecycle stale |
@ywluogg are you still looking into this? @vdemeester looks like this issue would be addressed by TEP-0073: Simplify metrics, right? |
Hi jerop@ I'm not looking into this anymore. Please unassign me. Thanks! |
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing. |
/remove-lifecycle stale |
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing. |
/assign @khrm |
@pritidesai This was fixed by knative/pkg#1464 So we can close this. /close |
@khrm: You can't close an active issue/PR unless you authored it or you are a collaborator. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@khrm not only |
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing. |
This issue is still relevant. See this comment as well as this. |
To add some context, historically, `resource_name` was removed from this tag list due to its high potential of causing high metrics cardinality. See [knative#1464][1] for more information. While that's great, but it might not be sufficient for large scale use cases where namespaces can be super dynamic (with generateName, too) or grows fase enough. There is an issue report from [tektoncd/pipeline#3171][2] which talks about this. This proposal makes it possible to disable `resource_namespace` tag via an option function. The default behavior is not changed, so no user impact if any of existing users rely on this tag. There is no API contract change either due to the beauty of variadic functions. Now downstream projects can consume this by override `StatsReporter` in webhook context options with their own preference. As a caveat here, if downstream project does choose to override `StatsReporter`, the default `ReportMetrics` function shouldn't be called by default as they may now have a different set of tag keys to report. As such, this function is only called if the default `StatsReporter` is used. [1]: knative#1464 [2]: tektoncd/pipeline#3171
We have the same issue, too. I have a proposal for knative/pkg at knative/pkg#2931. |
To add some context, historically, `resource_name` was removed from this tag list due to its high potential of causing high metrics cardinality. See [knative#1464][1] for more information. While that's great, but it might not be sufficient for large scale use cases where namespaces can be super dynamic (with generateName, too) or grows fase enough. There is an issue report from [tektoncd/pipeline#3171][2] which talks about this. This proposal makes it possible to disable `resource_namespace` tag via an option function. The default behavior is not changed, so no user impact if any of existing users rely on this tag. There is no API contract change either due to the beauty of variadic functions. Now downstream projects can consume this by override `StatsReporter` in webhook context options with their own preference. As a caveat here, if downstream project does choose to override `StatsReporter`, the default `ReportMetrics` function shouldn't be called by default as they may now have a different set of tag keys to report. As such, this function is only called if the default `StatsReporter` is used. [1]: knative#1464 [2]: tektoncd/pipeline#3171
To add some context, historically, `resource_name` was removed from this tag list due to its high potential of causing high metrics cardinality. See [knative#1464][1] for more information. While that's great, but it might not be sufficient for large scale use cases where namespaces can be super dynamic (with generateName, too) or grows fase enough. There is an issue report from [tektoncd/pipeline#3171][2] which talks about this. This proposal makes it possible to disable `resource_namespace` tag via an option function. The default behavior is not changed, so no user impact if any of existing users rely on this tag. There is no API contract change either due to the beauty of variadic functions. Now downstream projects can consume this by override `StatsReporter` in webhook context options with their own preference. As a caveat here, if downstream project does choose to override `StatsReporter`, the default `ReportMetrics` function shouldn't be called by default as they may now have a different set of tag keys to report. As such, this function is only called if the default `StatsReporter` is used. [1]: knative#1464 [2]: tektoncd/pipeline#3171
To add some context, historically, `resource_name` was removed from this tag list due to its high potential of causing high metrics cardinality. See [knative#1464][1] for more information. While that's great, but it might not be sufficient for large scale use cases where namespaces can be super dynamic (with generateName, too) or grows fase enough. There is an issue report from [tektoncd/pipeline#3171][2] which talks about this. This proposal makes it possible to disable `resource_namespace` tag via an option function. The default behavior is not changed, so no user impact if any of existing users rely on this tag. There is no API contract change either due to the beauty of variadic functions. Now downstream projects can consume this by override `StatsReporter` in webhook context options with their own preference. As a caveat here, if downstream project does choose to override `StatsReporter`, the default `ReportMetrics` function shouldn't be called by default as they may now have a different set of tag keys to report. As such, this function is only called if the default `StatsReporter` is used. [1]: knative#1464 [2]: tektoncd/pipeline#3171
To add some context, historically, `resource_name` was removed from this tag list due to its high potential of causing high metrics cardinality. See [knative#1464][1] for more information. While that's great, but it might not be sufficient for large scale use cases where namespaces can be super dynamic (with generateName, too) or grows fase enough. There is an issue report from [tektoncd/pipeline#3171][2] which talks about this. This proposal makes it possible to disable `resource_namespace` tag via an option function. The default behavior is not changed, so no user impact if any of existing users rely on this tag. There is no API contract change either due to the beauty of variadic functions. Now downstream projects can consume this by override `StatsReporter` in webhook context options with their own preference. As a caveat here, if downstream project does choose to override `StatsReporter`, the default `ReportMetrics` function shouldn't be called by default as they may now have a different set of tag keys to report. As such, this function is only called if the default `StatsReporter` is used. [1]: knative#1464 [2]: tektoncd/pipeline#3171
To add some context, historically, `resource_name` was removed from this tag list due to its high potential of causing high metrics cardinality. See [knative#1464][1] for more information. While that's great, but it might not be sufficient for large scale use cases where namespaces can be super dynamic (with generateName, too) or grows fase enough. There is an issue report from [tektoncd/pipeline#3171][2] which talks about this. This proposal makes it possible to disable `resource_namespace` tag via an option function. The default behavior is not changed, so no user impact if any of existing users rely on this tag. There is no API contract change either due to the beauty of variadic functions. Now downstream projects can consume this by override `StatsReporter` in webhook context options with their own preference. As a caveat here, if downstream project does choose to override `StatsReporter`, the default `ReportMetrics` function shouldn't be called by default as they may now have a different set of tag keys to report. As such, this function is only called if the default `StatsReporter` is used. [1]: knative#1464 [2]: tektoncd/pipeline#3171
To add some context, historically, `resource_name` was removed from this tag list due to its high potential of causing high metrics cardinality. See [knative#1464][1] for more information. While that's great, but it might not be sufficient for large scale use cases where namespaces can be super dynamic (with generateName, too) or grows fase enough. There is an issue report from [tektoncd/pipeline#3171][2] which talks about this. This proposal makes it possible to disable `resource_namespace` tag via an option function. The default behavior is not changed, so no user impact if any of existing users rely on this tag. There is no API contract change either due to the beauty of variadic functions. Now downstream projects can consume this by override `StatsReporter` in webhook context options with their own preference. As a caveat here, if downstream project does choose to override `StatsReporter`, the default `ReportMetrics` function shouldn't be called by default as they may now have a different set of tag keys to report. As such, this function is only called if the default `StatsReporter` is used. [1]: knative#1464 [2]: tektoncd/pipeline#3171
To add some context, historically, `resource_name` was removed from this tag list due to its high potential of causing high metrics cardinality. See [knative#1464][1] for more information. While that's great, but it might not be sufficient for large scale use cases where namespaces can be super dynamic (with generateName, too) or grows fase enough. There is an issue report from [tektoncd/pipeline#3171][2] which talks about this. This proposal makes it possible to disable `resource_namespace` tag via an option function. The default behavior is not changed, so no user impact if any of existing users rely on this tag. There is no API contract change either due to the beauty of variadic functions. Now downstream projects can consume this by override `StatsReporter` in webhook context options with their own preference. As a caveat here, if downstream project does choose to override `StatsReporter`, the default `ReportMetrics` function shouldn't be called by default as they may now have a different set of tag keys to report. As such, this function is only called if the default `StatsReporter` is used. [1]: knative#1464 [2]: tektoncd/pipeline#3171
To add some context, historically, `resource_name` was removed from this tag list due to its high potential of causing high metrics cardinality. See [knative#1464][1] for more information. While that's great, but it might not be sufficient for large scale use cases where namespaces can be super dynamic (with generateName, too) or grows fase enough. There is an issue report from [tektoncd/pipeline#3171][2] which talks about this. This proposal makes it possible to disable `resource_namespace` tag via an option function. The default behavior is not changed, so no user impact if any of existing users rely on this tag. There is no API contract change either due to the beauty of variadic functions. Now downstream projects can consume this by override `StatsReporter` in webhook context options with their own preference. As a caveat here, if downstream project does choose to override `StatsReporter`, the default `ReportMetrics` function shouldn't be called by default as they may now have a different set of tag keys to report. As such, this function is only called if the default `StatsReporter` is used. [1]: knative#1464 [2]: tektoncd/pipeline#3171
) * webhook: add options to disable resource_namespace tag in metrics To add some context, historically, `resource_name` was removed from this tag list due to its high potential of causing high metrics cardinality. See [#1464][1] for more information. While that's great, but it might not be sufficient for large scale use cases where namespaces can be super dynamic (with generateName, too) or grows fase enough. There is an issue report from [tektoncd/pipeline#3171][2] which talks about this. This proposal makes it possible to disable `resource_namespace` tag via an option function. The default behavior is not changed, so no user impact if any of existing users rely on this tag. There is no API contract change either due to the beauty of variadic functions. Now downstream projects can consume this by override `StatsReporter` in webhook context options with their own preference. As a caveat here, if downstream project does choose to override `StatsReporter`, the default `ReportMetrics` function shouldn't be called by default as they may now have a different set of tag keys to report. As such, this function is only called if the default `StatsReporter` is used. [1]: #1464 [2]: tektoncd/pipeline#3171 * webhook: add StatsReporterOptions in webhook.Options There are two ways to customize StatsReporter: 1. Use a whole new StatsReporter implementation. 1. Or pass Option funcs to customize the default StatsReporter. Option 1 is less practical at this time due to the metrics registration conflict. `webhook.RegisterMetrics()` is called regardless which StatsReporter implementation is used (which is a problem by itself). The second option is more practical since it works well without dealing with metrics conflicts. The `webhook.Option` in particular allows people to discard certain metrics tags.
knative/pkg now gives the option to exclude arbitrary tags. I assume the next action item is to bump knative/pkg and customize the webhook options. |
@khrm are you still working on this issue? We marked it as "blocking" for a v1 release and we would like to make a v1 release in July. If you are working on it, will you be able to solve this until then? Thank you! |
@afrittoli This seems to be resolved. Will be fixed by knative update. |
Do you know when this fix is expected to be part of Tekton? |
FWIW, besides bumping |
This addresses tektoncd#3171. As `resource_namespace` has the potential to contribute to high metrics cardinality while not adding much value from observability perspective, it would be ideal to disable it by default.
Expected Behavior
Prometheus metric
webhook_request_latencies_bucket
is usable in real environment, don't add new data series forever. Prometheus is able to query that metric.Actual Behavior
Prometheus metric
webhook_request_latencies_bucket
creates so many data series that is practically impossible to query in prometheus (too much data). It keeps adding new series while it's running so number of series increase forever. Restart podtekton-pipelines-webhook
resets number of series and fixes issue.Steps to Reproduce the Problem
Run
tekton-pipelines-webhook
.Additional Info
The text was updated successfully, but these errors were encountered: