Add timeseries.instance to allow automatic downsampling on query #10293
Conversation
Thanks @exekias! This is important infrastructure in order to get better operational metrics graphing behavior in place. For example, the WIP metrics explorer will be able to use this under the hood.
@@ -6,6 +6,7 @@
   fields:
     - name: id
      type: long
+      dimension: true
This is an example where flagging the field as dimension was required. id should actually be a keyword here, but we may find other places.
+1 on making it possible to configure fields as dimensions.
Great, happy to see that we are on the way to having downsampling 😃
What is the plan for this? Will there be specific queries, or will this info be used by visualizations?
Should we make this aggregation configurable too?
This makes sense. I wonder if having too many dimensions can hit performance.
@@ -34,3 +34,8 @@
      type: alias
      path: agent.hostname
      migration: true
+
+    - name: timeseries.instance
Definitely a field we should have in ECS.
@exekias, could you open an issue for this in the ECS repo? That is probably also a good place for naming discussions.
That sounds good to me; it would be great to have it documented there. I will open one as soon as we know we will move this forward 👍
Do we already have an issue for it? Could not find it.
See the TODO on this PR :)
Oops, didn't go through the description again 🤦‍♂️
My hope is that the metrics explorer can make use of this field and do it transparently to the user, so they don't really need to care about it.
Maybe? I think that discussion is better held in the UI issue once we start working on it: elastic/kibana#28027
I think the number of dimensions won't hurt performance; it's the number of time series that may do it. That means that if you run a query that aggregates too many lines, performance may be affected, but I don't think we have a better way without making changes in Elasticsearch.
++ on the concept of the timeseries id.
I assume just setting every type: keyword field as a dimension will not work, as there are quite a few fields which are keyword but are not related to the dimension. For example, service.version is probably not needed as a dimension. I would start with manual configuration (see the sketch after the questions below).
Questions:
- I would think one of the basic dimensions is the event.dataset field.
- For Prometheus and others, I assume the labels are defining the dimension.
- These dimension configs should be identical to what we need for rollups, I assume?
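For illustration only, manual configuration could look roughly like this in fields.yml. The dimension flag mirrors the one added in this PR, but which fields carry it here is an assumption made for the example:

[source,yaml]
----
- name: event.dataset
  type: keyword
  dimension: true     # part of the time series identity
- name: service.version
  type: keyword
  dimension: false    # keyword, but should not define a separate series
----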
	}
}

+	h, err := hashstructure.Hash(instanceFields, nil)
I wonder if instead of making it a hash we should use the full readable string as a keyword. It would mean concatenating all the field/value pairs together as a keyword. This would make the id at least half readable as to what it is, which could be useful for debugging.
I considered this option, like maybe just putting in the JSON for all KVs. The problem I see is that this string may be too long and pollute the document, WDYT?
When you mentioned JSON it somehow reminded me of elastic/elasticsearch#25312, but I don't think it would apply here as there is no such thing as "group by all these fields".
I'm less worried about polluting the document than about its size and whether it would have some effect on the overall index size. Probably not, as the same value would show up many times, and it potentially even compresses better than a unique hash value. The reason I'm not concerned about polluting the document is that I do not expect users to look at the raw document often; they mainly look at the metrics. For debugging purposes I definitely think it would be beneficial to have the full string: in case things don't work as expected, we would directly see what things are grouped by.
Meant to comment on this earlier but got distracted. I had some concerns about the hashes, but @exekias spent a long time chatting with me and I think my concerns were addressed/not real (thanks @exekias!).

I think the only major point is that the nature of the hashes means you'll only be able to use the hashes if you always want to downsample the set of tuples as an individual time series first, before any other action. What I mean is that if you have the tuples

From my conversation with @exekias I think that's ok, since in this context it won't make sense to combine the datapoints from

† Or rather, if you want these tuples you'll have to ignore the hash and fall back to a full set of

Scripting

The only potential suggestion I had is that this hashing behavior could be replicated with a script. E.g. instead of pre-hashing the values into a discrete field, the hashes could be generated at runtime with a script in the

The downside is script execution at runtime, although that should be relatively minor nowadays with painless.
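As a rough sketch of that idea (an assumption, not something this PR or the UI implements), the grouping key could be computed at query time with a Painless script inside the terms aggregation instead of reading a pre-computed field; the field names are made up for the example:

[source,json]
----
{
  "aggs": {
    "per_series": {
      "terms": {
        "script": {
          "lang": "painless",
          "source": "doc['host.name'].value + '|' + doc['container.id'].value"
        }
      },
      "aggs": {
        "memory": { "avg": { "field": "kubernetes.container.memory.usage.bytes" } }
      }
    }
  }
}
----

The trade-off is the one mentioned above: the script runs for every matching document at query time, whereas the pre-computed field only costs a hash at ingest time.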
This came up recently with regards to metrics coming in from APM agents. The solution suggested here should also be applied in APM Server to add the timeseries.instance field there.
Certainly an option, but you'd have to loop over the key/values in alphabetical order and concatenate "=" for each key/value pair. You'd also have to filter out numeric values.
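A minimal Go sketch of the two options being discussed, the hash used by this PR versus a readable concatenation; this is not the PR's actual code, and the readableInstance helper and dimension values are made up for the example:

[source,go]
----
package main

import (
	"fmt"
	"sort"
	"strings"

	"github.com/mitchellh/hashstructure"
)

// readableInstance builds a deterministic, human-readable id by sorting the
// dimension keys and joining "key=value" pairs. Non-string (numeric) values
// are skipped, as suggested above.
func readableInstance(dims map[string]interface{}) string {
	keys := make([]string, 0, len(dims))
	for k, v := range dims {
		if _, ok := v.(string); ok {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)

	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+dims[k].(string))
	}
	return strings.Join(parts, ",")
}

func main() {
	dims := map[string]interface{}{
		"container.id": "abc123",
		"host.name":    "node-1",
		"some.counter": 42, // numeric: excluded from the readable form
	}

	// Option 1: opaque but compact hash (what this PR does via hashstructure).
	h, err := hashstructure.Hash(dims, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(h)

	// Option 2: readable concatenation, longer but easier to debug.
	fmt.Println(readableInstance(dims)) // container.id=abc123,host.name=node-1
}
----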
@ruflin asked:
Yes. We could make it a "dimension"
Sounds good to me.
I think the story around how rollups relate to this needs some more thought. For example, for counters, roll-ups will need to take the derivative per time series.

Something else: this could probably be experimental and opt-in initially, and then enabled by default in a later version. There might be performance implications in generating the hash that we didn't consider.
@roncohen Definitely like the idea to start this as opt-in so we can start playing around with it.
LGTM, thanks for adding tests.
I only have some doubts about the support of objects with wildcards, but supporting this can be left for a future change.
Overall LGTM. Looking forward to having this in so we can start playing around with it.
[source,yaml]
----
metricbeat.max_start_delay: 10s
This looks off?
indeed, fixed it
Report time series data to help Kibana query metrics (elastic#10293)

This PR introduces a new field that will be attached to events carrying time series: timeseries.instance. The objective of it is to allow queries to downsample metrics from multiple time series while consuming them.

The problem

When querying for time series, bucket size is important, as you may have several occurrences of a metric in the same bucket. When doing aggregations on them, this is something that needs to be taken into account.

Example

With these values for kubernetes.container.memory.usage.bytes:

A query like "sum of containers memory usage per host" goes:

Containers are double counted, as there is more than one value per bucket for the same container. We need a way to overcome this.

You can test this query at demo.elastic.co; change the time range to > 4h to see how the data goes wrong (bucket size goes above 10s).
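For illustration with made-up numbers: if one container reports kubernetes.container.memory.usage.bytes = 100 MB every 10 seconds and the query buckets by one minute, that container alone contributes six 100 MB values to each bucket, so a plain sum reports 600 MB for a container that only uses 100 MB; every extra container on the host compounds the error. The per-time-series averaging introduced below brings this back to one value per container before the sum runs.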
Time series instance id

This PR introduces a timeseries.instance field for Metricbeat documents (we should discuss the need for this in other Beats). This field contains a hash of all the k-v dimensions in the document. Dimensions are the set of fields whose combination defines a time series. In this example, the dimensions for kubernetes.container.memory.usage.bytes are host and container. This means each container in the returned documents will have a different and consistent timeseries.instance; all values for the same container will share the same timeseries.instance.

We will then be able to use this information to downsample values in buckets and make sure we only have one value per time series before the main aggregation is executed.
Following the previous example, the query would first do an average of all values per time series and bucket, and then execute the main aggregation (sum).
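A sketch of what such a combined query could look like; this is not the exact query from the PR, and the interval and host field name are assumptions for the example:

[source,json]
----
{
  "size": 0,
  "aggs": {
    "per_host": {
      "terms": { "field": "host.name" },
      "aggs": {
        "over_time": {
          "date_histogram": { "field": "@timestamp", "fixed_interval": "1m" },
          "aggs": {
            "per_series": {
              "terms": { "field": "timeseries.instance" },
              "aggs": {
                "memory": {
                  "avg": { "field": "kubernetes.container.memory.usage.bytes" }
                }
              }
            },
            "memory_sum": {
              "sum_bucket": { "buckets_path": "per_series>memory" }
            }
          }
        }
      }
    }
  }
}
----

The inner avg per timeseries.instance is the downsampling step; the sum_bucket pipeline then computes the "sum of containers memory usage per host" over exactly one value per container and bucket.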
I'm currently treating any keyword field as a dimension; we may need to fine-tune this by enabling/disabling them in fields.yml, but it seems to be a good heuristic to start with.

This feature is flagged as experimental; to enable it, users need to put this in metricbeat.yml:
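The original snippet did not survive here; based on the opt-in discussion above it is presumably a single experimental switch, roughly along these lines (the exact setting name is an assumption, check the PR diff):

[source,yaml]
----
# Hypothetical opt-in switch; see the PR diff for the exact setting name.
timeseries.enabled: true
----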