Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge mappings for composable index templates #58521

Merged
merged 11 commits into from
Jun 29, 2020

Conversation

jtibshirani
Copy link
Contributor

@jtibshirani jtibshirani commented Jun 25, 2020

This PR implements recursive mapping merging for composable index templates.

When creating an index, we perform the following:

  • Add each component template mapping in order, merging each one in after the
    last.
  • Merge in the index template mappings (if present).
  • Merge in the mappings on the index request itself (if present).

Some principles:

  • All 'structural' changes are disallowed (but everything else is fine). An
    object mapper can never be changed between type: object and type: nested. A
    field mapper can never be changed to an object mapper, and vice versa.
  • Generally, each section is merged recursively. This includes object
    mappings, as well as root options like dynamic_templates and meta. Once we
    reach 'leaf components' like field definitions, they always overwrite an
    existing one instead of being merged.

The PR is pretty large, but each commit can be reviewed individually.

Relates to #53101.

@jtibshirani jtibshirani force-pushed the merge-template-mapping branch from c30d286 to 6f5e142 Compare June 25, 2020 01:20
The differences from standard mapping merge:
* For metadata fields like _source, the whole field definition is replaced
  instead of merged.
* Dynamic templates are appended to the end of the existing list. Any duplicate
  definitions overwrite existing ones.
* The _meta object is deep-merged, so any existing keys are replaced.
@jtibshirani jtibshirani force-pushed the merge-template-mapping branch from 6f5e142 to 5a37903 Compare June 25, 2020 03:05
@jtibshirani jtibshirani added :Data Management/Indices APIs APIs to create and manage indices and templates :Search Foundations/Mapping Index mappings, including merging and defining field types >feature v7.9.0 v8.0.0 labels Jun 25, 2020
@jtibshirani jtibshirani marked this pull request as ready for review June 25, 2020 17:34
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Indices APIs)

@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Jun 25, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Mapping)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label Jun 25, 2020
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this Julie! I left a couple of comments with (small!) substance and then a bunch of minor nits. I am not an expert in our mapping code so it would be good to have a second opinion also


if (request.dataStreamName() != null) {
DataStream dataStream = currentState.metadata().dataStreams().get(request.dataStreamName());
if (dataStream != null) {
dataStream.getTimeStampField().insertTimestampFieldMapping(mappings);
mappings.add(dataStream.getTimeStampField().getTimestampFieldMapping());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After chatting with the core-features team, I now realize that we should apply request.mappings() after this datastream mapping. I'll fix this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually @dakrone is that right? If we applied request.mappings() after the datastream timestamp mapping, it could result in an index with an inconsistent timestamp mapping. So perhaps we always need to apply the datastream timestamp mapping last.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the request.mappings() part last, but we need to validate that the composite mappings are valid for a data stream, that way a user could override part of the mappings (say for instance a custom format for the date), but couldn't do anything bad (like change it to a keyword for instance)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to keep it as-is for this PR, since it just tries to replicate the existing behavior. We can discuss in a follow-up if we want to change the order.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes looks great. I like how it is integrated into the mapping merge, it makes the logic very easy to follow. I left one question and it would be nice to have a single place where the full merging logic is described.

}
} else if (isEnabled() != mergeWith.isEnabled()) {
throw new MapperException("the [enabled] parameter can't be updated for the object mapping [" + name() + "]");
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should nullify the underlying mappers if enabled is false ? It's not related to this pr specifically but we don't need to keep children when the parent is disabled ?

Copy link
Contributor Author

@jtibshirani jtibshirani Jun 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you mentioned, I think this is general question. Even without this PR you can define an object mapper like this:

"object": {
  "type": "object",
  "enabled": false,
  "properties": {
    "field": {
      "type": "keyword"
    }
  }
}

For this PR I plan to stick with the existing behavior, but we could have a separate discussion on whether we want to remove this flexibility.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I opened #58702 to discuss this specifically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great thanks!

@jtibshirani jtibshirani requested a review from dakrone June 29, 2020 18:57
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Julie!

@jtibshirani
Copy link
Contributor Author

@elasticmachine elasticsearch-ci/1

@jtibshirani jtibshirani merged commit 676893a into elastic:master Jun 29, 2020
@jtibshirani jtibshirani deleted the merge-template-mapping branch June 29, 2020 22:00
jtibshirani added a commit to jtibshirani/elasticsearch that referenced this pull request Jun 30, 2020
These were added in elastic#58521 but were incorrect -- at this point the mapping is
not necessarily wrapped in the '_doc' type. The assertion likely didn't trip
before because it's common for mappings to have only a top-level 'properties'
key.
jtibshirani added a commit that referenced this pull request Jun 30, 2020
These were added in #58521 but were incorrect -- at this point the mapping is
not necessarily wrapped in the '_doc' type. The assertion likely didn't trip
before because it's common for mappings to have only a top-level 'properties'
key.
dakrone added a commit to dakrone/elasticsearch that referenced this pull request Jul 1, 2020
This commit adds an integration test that component templates used to form a composite template can
still be updated after a cluster restart.

In elastic#58643 an issue arose where mappings were causing problems because of the way we unwrap `_doc` in
template mappings. This was also related to the mappings being merged manually rather than using the
`MapperService` to do the merging. elastic#58643 was fixed in 7.9 and master with the elastic#58521 change, since
mappings now are read and digested by the actual mapper service.

This test passes for 7.x and master, and I intend to open a separate PR including this test for
7.8.1 along with a bug fix for elastic#58643. This test is to ensure we don't have any regression in the
future.
dakrone added a commit that referenced this pull request Jul 2, 2020
This commit adds an integration test that component templates used to form a composite template can
still be updated after a cluster restart.

In #58643 an issue arose where mappings were causing problems because of the way we unwrap `_doc` in
template mappings. This was also related to the mappings being merged manually rather than using the
`MapperService` to do the merging. #58643 was fixed in 7.9 and master with the #58521 change, since
mappings now are read and digested by the actual mapper service.

This test passes for 7.x and master, and I intend to open a separate PR including this test for
7.8.1 along with a bug fix for #58643. This test is to ensure we don't have any regression in the
future.
dakrone added a commit to dakrone/elasticsearch that referenced this pull request Jul 2, 2020
…ic#58883)

This commit adds an integration test that component templates used to form a composite template can
still be updated after a cluster restart.

In elastic#58643 an issue arose where mappings were causing problems because of the way we unwrap `_doc` in
template mappings. This was also related to the mappings being merged manually rather than using the
`MapperService` to do the merging. elastic#58643 was fixed in 7.9 and master with the elastic#58521 change, since
mappings now are read and digested by the actual mapper service.

This test passes for 7.x and master, and I intend to open a separate PR including this test for
7.8.1 along with a bug fix for elastic#58643. This test is to ensure we don't have any regression in the
future.
dakrone added a commit that referenced this pull request Jul 2, 2020
…58883) (#58914)

This commit adds an integration test that component templates used to form a composite template can
still be updated after a cluster restart.

In #58643 an issue arose where mappings were causing problems because of the way we unwrap `_doc` in
template mappings. This was also related to the mappings being merged manually rather than using the
`MapperService` to do the merging. #58643 was fixed in 7.9 and master with the #58521 change, since
mappings now are read and digested by the actual mapper service.

This test passes for 7.x and master, and I intend to open a separate PR including this test for
7.8.1 along with a bug fix for #58643. This test is to ensure we don't have any regression in the
future.
dakrone added a commit to dakrone/elasticsearch that referenced this pull request Jul 2, 2020
Due to the way that our `toXContent` of mappings works, we reduce the type to remove `_doc` from
template mappings when serializing them to disk as part of cluster state. This causes problems
however, as we always expected there to be a `_doc` field in the mappings. This was resolved in
later versions of ES (7.9+) by elastic#58521 which uses a real mapper to handle the mappings. For 7.8.1,
however, we need to ensure that we can still read the state after it has been persisted to disk.

Resolves elastic#58643
Relates to elastic#58883
Relates to elastic#58521
dakrone added a commit that referenced this pull request Jul 6, 2020
Due to the way that our `toXContent` of mappings works, we reduce the type to remove `_doc` from
template mappings when serializing them to disk as part of cluster state. This causes problems
however, as we always expected there to be a `_doc` field in the mappings. This was resolved in
later versions of ES (7.9+) by #58521 which uses a real mapper to handle the mappings. For 7.8.1,
however, we need to ensure that we can still read the state after it has been persisted to disk.

Resolves #58643
Relates to #58883
Relates to #58521
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Indices APIs APIs to create and manage indices and templates >feature :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Data Management Meta label for data/management team Team:Search Meta label for search team v7.9.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants