Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for dots in field names for metrics usecases #86166

Merged
merged 33 commits into from
May 17, 2022

Conversation

javanna
Copy link
Member

@javanna javanna commented Apr 25, 2022

This PR adds support for a new mapping parameter to the configuration of the object mapper (root as well as individual fields), that makes it possible to store metrics data where it's common to have fields with dots in their names in the following format:

{
  "metrics.time" : 10,
  "metrics.time.min" : 1,
  "metrics.time.max" : 500
}

Instead of expanding dotted paths the their corresponding object structure, objects can be configured to preserve dots in field names, in which case they can only hold leaf sub-fields and no further objects.

The mapping parameter is called subobjects and controls whether an object can hold other objects (defaults to true) or not. The following example shows how it can be configured in the mappings:

{
  "mappings" : {
    "properties" : {
      "metrics" : {
        "type" : "object", 
        "subobjects" : false
      }
    }
  }
}

Closes #63530

@javanna javanna removed the v8.3.0 label Apr 26, 2022
@javanna javanna added :Search Foundations/Mapping Index mappings, including merging and defining field types >feature v8.3.0 labels Apr 28, 2022
@elasticsearchmachine
Copy link
Collaborator

Hi @javanna, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Hi @javanna, I've updated the changelog YAML for you.

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I left a couple of nits and suggestions. I think it's probably worth adding the tests for objects as arrays as well, just because that's an area where we are pretty under-tested and I can see it would be very easy for bugs to slip in there.

parseObjectOrField(context, dynamicObjectMapper);
context.path().setCollapsed(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can merge setCollapsed(false) with remove(), as once you've gone back up a level you are by definition no longer on a collapsed object (if it was collapsed you couldn't have added a new level to it!)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we use add also for multi-fields, so I think resetting collapsed at remove may actually cause problems.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That though reminds me: can you please triple check that I am calling setCollapsed in all the right places? I am afraid I am missing some...

docs/reference/mapping/params/collapsed.asciidoc Outdated Show resolved Hide resolved
}
}
""")));
assertEquals("Object [_doc] is collapsed and does not support inner object [metrics]", err.getMessage());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can detect this case and replace Object [_doc] with root, or something else? It's a bit confusing otherwise, as there isn't a _doc anywhere in the mappings that the user sees.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to get the full object path here as well, but from a quick look at the parsing code it doesn't look like it will be simple to do, so happy to leave both of these as a TODO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea I agree about the error message, I believe that's what we already do for enabled though. I would look at changing this as a follow-up.

Which full object path do you mean?

@elasticsearchmachine
Copy link
Collaborator

Hi @javanna, I've updated the changelog YAML for you. Note that since this PR is labelled release highlight, you need to update the changelog YAML to fill out the extended information sections.

@javanna javanna marked this pull request as ready for review April 28, 2022 11:43
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Apr 28, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for bringing this one over the line @javanna!

@javanna javanna merged commit d45b19d into elastic:master May 17, 2022
javanna added a commit to javanna/elasticsearch that referenced this pull request May 18, 2022
As part of elastic#86166 we added support for a new mapping parameter called `subobjects` that
can be set to the `object` field type. Such parameter will affect not only how incoming
documents will be dinamically mapped in the future, but also how field names are treated
as part of the mappings themselves.

Mappings get parsed into a map, where keys ordering is not guaranteed. In case a mappings call
contains `subobjects` set to `false` and also sub-fields for that same object, we need to make
sure that we read the parameter in advance in order to know how to treat the sub-field and decide
whether we need to expand dots in field names or leave them as they are.
javanna added a commit to javanna/elasticsearch that referenced this pull request May 19, 2022
With elastic#86166 we have added support for leaf fields that have dots in their names, without needing to expand their path to intermediate objects. That means that for instance a host.name keyword field mapper can be held directly by the root object mapper if the root has subobjects set to false.

This opens up for situations where a field name containing dots can't necessarily be split into a leaf name and a parent path made of intermediate object mappers. In the field mapper merge code, we build the full path of the merged field making that now incorrect assumption, which causes the result of merged leaf fields to have incorrect full path. This is a bug although it got unnoticed so far as the full path of a field mapper is not so widely used compared to its leaf name (returned by the simpleName method). One place where the full path is used is when sorting the child mappers in ObjectMapper#serializeMappers which was causing MapperService#assertRefreshIsNotNeeded to trip as the result of the merge has fields in a different order compared to the incoming mappings.

With this fix, we add a MapperBuilderContext argument to the different merge methods, propagate the MapperBuilderContext in all the merge calls and create a child context when needed, meaning whenever merging object mappers, so that their children mappers are created with the proper full path.
javanna added a commit that referenced this pull request May 20, 2022
With #86166 we have added support for leaf fields that have dots in their names, without needing to expand their path to intermediate objects. That means that for instance a host.name keyword field mapper can be held directly by the root object mapper if the root has subobjects set to false.

This opens up for situations where a field name containing dots can't necessarily be split into a leaf name and a parent path made of intermediate object mappers. In the field mapper merge code, we build the full path of the merged field making that now incorrect assumption, which causes the result of merged leaf fields to have incorrect full path. This is a bug although it got unnoticed so far as the full path of a field mapper is not so widely used compared to its leaf name (returned by the simpleName method). One place where the full path is used is when sorting the child mappers in ObjectMapper#serializeMappers which was causing MapperService#assertRefreshIsNotNeeded to trip as the result of the merge has fields in a different order compared to the incoming mappings.

With this fix, we add a MapperBuilderContext argument to the different merge methods, propagate the MapperBuilderContext in all the merge calls and create a child context when needed, meaning whenever merging object mappers, so that their children mappers are created with the proper full path.
@nik9000 nik9000 mentioned this pull request May 20, 2022
50 tasks
javanna added a commit that referenced this pull request May 23, 2022
As part of #86166 we added support for a new mapping parameter called `subobjects` that
can be set to the `object` field type. Such parameter will affect not only how incoming
documents will be dynamically mapped in the future, but also how field names are treated
as part of the mappings themselves.

Mappings get parsed into a map, where keys ordering is not guaranteed. In case a mappings call
contains `subobjects` set to `false` and also sub-fields for that same object, we need to make
sure that we read the parameter in advance in order to know how to treat the sub-field and decide
whether we need to expand dots in field names or leave them as they are.
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request May 31, 2022
This adds some paranoid tests for synthetic source with disabling
subobjects, as added by elastic#86166. It turns out that synthetic source does
exactly what you'd expect with disabling subobjects - it creates fields
with dots in their names. This adds tests for that.
nik9000 added a commit that referenced this pull request Jun 1, 2022
This adds some paranoid tests for synthetic source with disabling
subobjects, as added by #86166. It turns out that synthetic source does
exactly what you'd expect with disabling subobjects - it creates fields
with dots in their names. This adds tests for that.
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jun 1, 2022
This adds some paranoid tests for synthetic source with disabling
subobjects, as added by elastic#86166. It turns out that synthetic source does
exactly what you'd expect with disabling subobjects - it creates fields
with dots in their names. This adds tests for that.
elasticsearchmachine pushed a commit that referenced this pull request Jun 1, 2022
This adds some paranoid tests for synthetic source with disabling
subobjects, as added by #86166. It turns out that synthetic source does
exactly what you'd expect with disabling subobjects - it creates fields
with dots in their names. This adds tests for that.
@ruflin
Copy link
Contributor

ruflin commented Jun 2, 2022

I filed elastic/package-spec#349 to get support for this in package-spec.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>feature release highlight :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Clients Meta label for clients team Team:Search Meta label for search team v8.3.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dotted field names that conflict with objects
6 participants