-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for dots in field names for metrics usecases #86166
Add support for dots in field names for metrics usecases #86166
Conversation
Hi @javanna, I've created a changelog YAML for you. |
Hi @javanna, I've updated the changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! I left a couple of nits and suggestions. I think it's probably worth adding the tests for objects as arrays as well, just because that's an area where we are pretty under-tested and I can see it would be very easy for bugs to slip in there.
parseObjectOrField(context, dynamicObjectMapper); | ||
context.path().setCollapsed(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can merge setCollapsed(false)
with remove()
, as once you've gone back up a level you are by definition no longer on a collapsed object (if it was collapsed you couldn't have added a new level to it!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we use add also for multi-fields, so I think resetting collapsed at remove may actually cause problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That though reminds me: can you please triple check that I am calling setCollapsed in all the right places? I am afraid I am missing some...
server/src/main/java/org/elasticsearch/index/mapper/DotExpandingXContentParser.java
Outdated
Show resolved
Hide resolved
} | ||
} | ||
"""))); | ||
assertEquals("Object [_doc] is collapsed and does not support inner object [metrics]", err.getMessage()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can detect this case and replace Object [_doc]
with root
, or something else? It's a bit confusing otherwise, as there isn't a _doc
anywhere in the mappings that the user sees.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to get the full object path here as well, but from a quick look at the parsing code it doesn't look like it will be simple to do, so happy to leave both of these as a TODO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea I agree about the error message, I believe that's what we already do for enabled though. I would look at changing this as a follow-up.
Which full object path do you mean?
Hi @javanna, I've updated the changelog YAML for you. Note that since this PR is labelled |
Pinging @elastic/es-search (Team:Search) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for bringing this one over the line @javanna!
As part of elastic#86166 we added support for a new mapping parameter called `subobjects` that can be set to the `object` field type. Such parameter will affect not only how incoming documents will be dinamically mapped in the future, but also how field names are treated as part of the mappings themselves. Mappings get parsed into a map, where keys ordering is not guaranteed. In case a mappings call contains `subobjects` set to `false` and also sub-fields for that same object, we need to make sure that we read the parameter in advance in order to know how to treat the sub-field and decide whether we need to expand dots in field names or leave them as they are.
With elastic#86166 we have added support for leaf fields that have dots in their names, without needing to expand their path to intermediate objects. That means that for instance a host.name keyword field mapper can be held directly by the root object mapper if the root has subobjects set to false. This opens up for situations where a field name containing dots can't necessarily be split into a leaf name and a parent path made of intermediate object mappers. In the field mapper merge code, we build the full path of the merged field making that now incorrect assumption, which causes the result of merged leaf fields to have incorrect full path. This is a bug although it got unnoticed so far as the full path of a field mapper is not so widely used compared to its leaf name (returned by the simpleName method). One place where the full path is used is when sorting the child mappers in ObjectMapper#serializeMappers which was causing MapperService#assertRefreshIsNotNeeded to trip as the result of the merge has fields in a different order compared to the incoming mappings. With this fix, we add a MapperBuilderContext argument to the different merge methods, propagate the MapperBuilderContext in all the merge calls and create a child context when needed, meaning whenever merging object mappers, so that their children mappers are created with the proper full path.
With #86166 we have added support for leaf fields that have dots in their names, without needing to expand their path to intermediate objects. That means that for instance a host.name keyword field mapper can be held directly by the root object mapper if the root has subobjects set to false. This opens up for situations where a field name containing dots can't necessarily be split into a leaf name and a parent path made of intermediate object mappers. In the field mapper merge code, we build the full path of the merged field making that now incorrect assumption, which causes the result of merged leaf fields to have incorrect full path. This is a bug although it got unnoticed so far as the full path of a field mapper is not so widely used compared to its leaf name (returned by the simpleName method). One place where the full path is used is when sorting the child mappers in ObjectMapper#serializeMappers which was causing MapperService#assertRefreshIsNotNeeded to trip as the result of the merge has fields in a different order compared to the incoming mappings. With this fix, we add a MapperBuilderContext argument to the different merge methods, propagate the MapperBuilderContext in all the merge calls and create a child context when needed, meaning whenever merging object mappers, so that their children mappers are created with the proper full path.
As part of #86166 we added support for a new mapping parameter called `subobjects` that can be set to the `object` field type. Such parameter will affect not only how incoming documents will be dynamically mapped in the future, but also how field names are treated as part of the mappings themselves. Mappings get parsed into a map, where keys ordering is not guaranteed. In case a mappings call contains `subobjects` set to `false` and also sub-fields for that same object, we need to make sure that we read the parameter in advance in order to know how to treat the sub-field and decide whether we need to expand dots in field names or leave them as they are.
This adds some paranoid tests for synthetic source with disabling subobjects, as added by elastic#86166. It turns out that synthetic source does exactly what you'd expect with disabling subobjects - it creates fields with dots in their names. This adds tests for that.
This adds some paranoid tests for synthetic source with disabling subobjects, as added by #86166. It turns out that synthetic source does exactly what you'd expect with disabling subobjects - it creates fields with dots in their names. This adds tests for that.
This adds some paranoid tests for synthetic source with disabling subobjects, as added by elastic#86166. It turns out that synthetic source does exactly what you'd expect with disabling subobjects - it creates fields with dots in their names. This adds tests for that.
This adds some paranoid tests for synthetic source with disabling subobjects, as added by #86166. It turns out that synthetic source does exactly what you'd expect with disabling subobjects - it creates fields with dots in their names. This adds tests for that.
I filed elastic/package-spec#349 to get support for this in package-spec. |
This PR adds support for a new mapping parameter to the configuration of the object mapper (root as well as individual fields), that makes it possible to store metrics data where it's common to have fields with dots in their names in the following format:
Instead of expanding dotted paths the their corresponding object structure, objects can be configured to preserve dots in field names, in which case they can only hold leaf sub-fields and no further objects.
The mapping parameter is called
subobjects
and controls whether an object can hold other objects (defaults to true) or not. The following example shows how it can be configured in the mappings:Closes #63530