-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support metrics with dots in their names #347
Comments
The server team discussed this just now, and we feel like option 2 is too fragile, since it only works if the metrics are all part of the same metric set. If for example an agent were to sometimes omit (delta) metrics that did not change in the most recent interval, then that could mean there is no conflict and the name would flip. That would then lead to an indexing conflict. In the shorter term, could the Java agent rewrite the specific problematic hikaricp metric (e.g. hikaricp.connections -> hikaricp.connections.total)? It wouldn't necessarily need to be configurable. This obviously isn't perfect, but it would resolve the immediate issue. The server team will also look into our options going forward, and raise enhancement requests with the Elasticsearch team. |
I learned yesterday of elastic/kibana#69908. This relies on index patterns and therefore isn't immediately useful for dynamic fields, but it may also be possible to introduce a I'm not sure if/how we could dynamically set that though; this doesn't fit with the current proposal for dynamic mapping type hints: elastic/elasticsearch#61939
I assume you mean that we would rename things like process, runtime, etc. metrics? We could treat these well-defined metrics differently. In a way they are already treated differently; they have explicit field mappings. |
While option 2 is less fragile when done in the agent and not on the server, it's also not perfect. When adding a custom metric To fix that, users have to delete the metric index and rename the The only option I see to fix that in a systematic way is to disallow/replace dots for custom metrics.
|
I agree that's the best option for now. There's a new Elasticsearch enhancement request to support dotted field names at: elastic/elasticsearch#63530 |
Since the underlying issue is with Elasticsearch, it might take a while before we can rely on a proper fix for that. The current workaround requires to use disable_metrics: Because option 2 (add suffix automatically) is not reliable, we won't explore it further. Thus, we are left with three options:
Do we agree on:
|
Since the optimal solution would be through the implementation of elastic/elasticsearch#63530, and we will eventually want to rely on it, maybe trying to get information on when this is expected can help. |
Another occurrence at https://discuss.elastic.co/t/hikaricp-metrics-ingestion-error/259875 |
Also, the latest update on the underlying issue refers to flattened fields, which is currently WIP, thus we can expect to have good news on the topic soon elastic/elasticsearch#63530 (comment) 🤞 ! |
Did this get fixed and released at some point? I see some merged PRs. |
@OrangeDog at least from the latest updates, we have an option for the APM Java Agent which will, by default, dedots the metrics.
The solution would be to rely on Elasticsearch feature request elastic/elasticsearch#63530 (comment), which is not yet ready. |
Hello, I am reaching out to let you know that dedotting will no longer be needed from Elasticsearch 8.3 onwards. See elastic/elasticsearch#86166 . Any object can be configured in the mappings to not hold any further subobjects in which case dots in field names will be left as-is so that |
@axw is that all we need in order to support dots in names? I'm not so sure. In contrast to the examples in elastic/elasticsearch#86166, we don't have a @javanna for TSDB, is there a plan to automatically map metric fields (those that are not dimensions) in a way so that they are never objects? Something like |
@felixbarny I'm also not sure. If we have to add a In elastic/elasticsearch#63530 (comment) it was suggested that we would be able to set
This is what I was originally hoping for, and what I was describing in elastic/elasticsearch#63530 (comment). @romseygeek made the point that it doesn't really matter whether they're flattened or not, unless we specifically care about the |
Right, I think we'd be fine even with metadata fields, such as |
Heya, happy that I have gotten your attention, there was some confusion in the discussion in the ticket mentioned above. Can I get an example document where you need to use this feature so I can better understand what the way forward is and make recommendations? The idea would be to find a way to not have to make breaking changes or change your documents. The solution we came up with is hopefully flexible enough to give the option to either have the whole document without subobjects, or only a specific section of it. |
An example document: {
"host": {
"name": "myhost"
},
"foo": 42,
"foo.bar": 42,
"baz": 21,
"baz.qux": 21
} |
@javanna important to note in the above:
|
Ok with this document there is no doubt that you'll have to set Though there is a small issue with the current implementation: |
Is that already possible?
Is it an issue if the source document contains nested objects? Could ES convert that into dotted fields during indexing? If not, I suppose APM Server could modify the fields to convert nesting into dotting. It doesn't sound great but probably feasible. But whatever we do, we should make sure to closely align with TSDB. Not sure if there's a catch when mapping metric fields and/or dimension fields as |
👍
AFAIK we don't have any specific need for it to be an object, but this is not entirely within our control; Fleet creates the index & component templates. I'll get in touch with the Fleet folks to see if we can flatten the mappings. It seems like more changes are needed in Elasticsearch though, is that right? This fails:
|
Using non-object mappings for ECS fields, such as There's an effort around making ECS mappings more native in Elasticsearch: elastic/elasticsearch#85692 We'd need to consider two versions of the mapping then - an object-based and a flat one. |
Ok @felixbarny let me think about that, we can try and handle that in ES. @axw for the error above, I am looking into it. |
@axw The error you got does not look like it's returned from the above create index call, but rather from indexing a doc with a host object, or trying to map host as an object, which is the very same problem I mentioned above and I am discussion with Felix too. Can you confirm I got it right? |
@javanna nope, I ran that exact |
@axw I found that problem and opened a PR (elastic/elasticsearch#86894), thanks for spotting it. @felixbarny I wanted to get back to what you said above "Using non-object mappings for ECS fields, such as host.name doesn't sound so great.". We need to get on the same page here: one thing is whether documents contain objects or not, for instance we could relax the current restriction and allow you to still provide host: {name: ""} in an object shape in your documents. This requires extra work on our end but it's doable. Another thing is allowing for mappings to hold objects when subobjects are disabled: this is not possible. If your root object is configured to not hold subobjects, host cannot be an object in your mappings, and in that case specifying host.name as a keyword field in your mapping will be treated differently, meaning host will not be made an object automatically. I hope, but we need to verify this, that you don't specify host as an object in your mappings. I hope this clarifies things, but just in case, I would like to see your mappings and real sample documents. It would be great for us also to base our tests on your real usecase rather than dummy documents. Please ping me and let's chat on a zoom if things are unclear. |
@javanna and I had a chat on zoom. A container object for metrics (all metrics are nested under a Currently, the mapping templates we generate are nested. For example: {
"mappings": {
"properties": {
"host": {
"properties": {
"name": {
"type": "keyword",
"ignore_above": 1024
}
}
}
}
}
} If we flatten the mappings like this: {
"mappings": {
"dynamic": "true",
"properties": {
"host.name": {
"type": "keyword",
"ignore_above": 1024
}
}
}
} They can work both for the case where Another thing to watch out for when mapping the root with
Into
I agreed with @javanna that for now (8.3), Elasticsearch should not automatically do the flattening. If it's not feasible to do it in APM Server, we can talk about adding a flattening feature to Elasticsearch. Luca also explained to me that the very same limitations for dotted field names apply to TSDB. |
I think the |
I don't mean to interfere here and I think that you folks need to have an internal discussion to align on the matter, but based on my preliminary chat with Felix if felt like we are trying not to make breaking changes, and we should be able to achieve that. Please ping me if you discuss this, I am glad to be involved. |
@axw heads up: the bug you hit at your first try is now fixed in master. Thanks for your patience :) |
@OrangeDog we only replace dots when we don't know the metric names ahead of time, e.g. for custom application metrics. For builtin metrics (e.g. https://www.elastic.co/guide/en/apm/agent/java/current/metrics.html), we are not de-dotting. So for example, the builtin metric I do think having a prefix would be neater and simpler, but we would like to avoid breaking users for the moment.
If we need to flatten all metadata fields at ingest time, then we will need to perform some additional testing around the |
@ruflin told me that he also had a chat with @javanna. He said that for Beats the automatic flattening would be important so that older Beats can write to newer Elasticsearch versions. We were thinking that this could be done with a |
Quick update on my end: 8.3 will include support for the subobjects parameter but it will require to adapt documents, as they can't hold any object field when subobjects are disabled (except for non-object fields that can parse objects natively like geo-point). Based on the conversations I had, and more thinking I did around the need to adapt documents, I think that the next step should be for Elasticsearch to automatically handle documents containing objects despite subobjects are disabled. This will help solutions as well as users in general adopt the new provided solution for storing metrics. While adapting mappings is required, the need to adapt documents is something we should avoid. While an ingest processor may help, I am not comfortable with effectively requiring to use an ingest processor to overcome this limitation. Also, it would not be straight-forward to come up with a complete solution in an ingest processor as some objects (think geo_points or other "leaf" fields that are able to parse objects) do need to be preserved, while some others need to be flattened. Ingest does not have knowledge of mappings which makes this not possible (unless a list of fields not to be flattened is provided as a configuration parameter). For these reasons I think we should work on a built-in solution, which will end up affecting how objects are dynamically mapped by Elasticsearch. I will follow-up on this. |
By the way, I just bumped in to an sdh around APM hitting the total fields limit (2000), and it turns out that 300+ of those fields are object fields, which would go away from the mappings with the subobjects: false solution, which feels good given that those object fields don't have much use. This is assuming that exists query are not performed against objects (a controversial feature that we would love not to have to support), and that those objects are in the mappings only to hold properties and not to declare their dynamic behaviour or even disable their parsing. |
@javanna what's the state of this? Is there an issue that you can link? Once we have that in, it seems to be safe to set Edit: I've created elastic/elasticsearch#88934 |
I wonder if there still another unexplored option other than setting e.g do I read What if we can switch to storing metrics under a dedicated field
any metric stored under Conflicts could be handled at ingest time e.g
Older producers still writing New producers writing to That way we could move our metric mappings over to the |
The structure of |
Thanks for creating the issue @felixbarny . I was on holiday when you asked for status, and I am resuming now the discussion around this. I investigated what changes are needed to support flattening objects and some bits require more discussion with the team because they affect quite a bit dynamic mappings of objects as well as field types that support parsing objects like geo_points. Stay tuned, I will keep you posted. |
Describe the bug
As of version 1.18.0, the java agent added instrumentation of micrometer framework, which allows to capture metrics managed with this framework. While the reported issue is specific to
HikariCP
connection pool, it might happen with other frameworks, or could be user-provided.Initial investigation of the issue shows that some metric names are problematic and conflict with metrics storage.
This issue has been opened in the
apm
repository as the actual change required might spread over multiple repositories and the solution to fix it properly isn't defined yet.This was reported in our public forum here : https://discuss.elastic.co/t/failed-to-transform-sample-model-sample/249962
Currently, you can't have two metrics where one is the prefix of another with
.
as separator,connections=3
andconnections.idle=2
will conflict and at least one of them will be ignored as one expectsconnections
to be a metric value, and the other expects it to be an object with anidle
property.To Reproduce
Follow instructions described in the forum to reproduce it with the Java agent.
Any agent that will sent two metrics with names that conflict with each other will have a similar issue.
Expected behavior
Metric names should not have any impact on our ability to capture them.
Debug logs
Server debug logs (provided in the forum)
Agent logs with log_level=trace
Relevant part where the issue is:
Full log
Possible approaches to solve this
While we can apply workarounds that are specific to
HikariCP
or even to our instrumentation of Micrometer, it seems that a proper fix for this needs to be applied at either agent or apm-server level.Here is a non-definitive list of ideas that we (java agent team) have come so far:
.
with_
. While simple to change, that would make such metrics different from others that use dots and break compatibility with existing metrics and even impact APM UI as current visualizations rely on metric names. Another downside is with custom metrics, those will have different (but close) names which could be confusing._value
field automatically when required. This would work as long as those metrics are sent together in the same metricset, which is quite likely. In the example above all metrics related to a database connection pool will always reported together.hikaricp
is the default connection pool forspring-boot
applications, which itself is a very popular framework, that's not a very appealing option for Java agent.Currently, we think that the option 2. is the best for the long term, and having a few metrics with an extra
_value
attribute should be straight-forward for the end-user.What are the other options that we could implement for this ?
Current workaround
micrometer
todisable_instrumentations
, this will completely disable micrometer integration.disabled_metrics=hikaricp.connections
The text was updated successfully, but these errors were encountered: