Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent accidental field type property overwrites #51436

Closed

Conversation

cbuescher
Copy link
Member

@cbuescher cbuescher commented Jan 24, 2020

When putting mapping definitions there currently appears to be loopholes by
which e.g. the mapping parameters of a in inner field "f1" inside an object "obj"
can be overwritten in the same mapping definition by specifying a field
"obj.f1". FieldMapper.doMerge() currently only checks content type compatibility
but e.g. overwrites its own fieldType with the one merged in. This can lead to
suprising behaviour where users think the "merged" field properties (see e.g.
ones merged in.
This PR suggests trying to detect at least these kinds of incompatibilities and
preventing merges like this to happen by leveraging the existing
checkCompatibility mechanism on the fieldtype.

Relates to #51286

When putting mapping definitions there currently appears to be loopholes by
which e.g. the mapping parameters of a in inner field "f1" inside an object "obj"
can be overwritten in the same mapping definition by specifying a field
"obj.f1". FieldMapper.doMerge() currently only checks content type compatibility
but e.g. overwrites its own fieldType with the one merged in. This can lead to
suprising behaviour where users think the "merged" field properties (see e.g.
ones merged in.
This PR suggests trying to detect at least these kinds of incompatibilities and
preventing merges like this to happen by leveraging the existing
`checkCompatibility` mechanism on the fieldtype.

Relates to elastic#51286
@cbuescher cbuescher added :Search Foundations/Mapping Index mappings, including merging and defining field types v7.7.0 labels Jan 24, 2020
@cbuescher cbuescher requested a review from jtibshirani January 24, 2020 17:18
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Mapping)

@cbuescher
Copy link
Member Author

@jtibshirani asking for your initial opinion here mostly, do you think we should prevent scenarios like in #51436 (simplified in the test) to happen? Not sure if this is the best location to reject these kind of merge conflicts, you might be able to pinpoint better locations or strategies.

@jtibshirani
Copy link
Contributor

jtibshirani commented Jan 27, 2020

Hello @cbuescher, I’ll first describe my understanding of what’s happening to check we’re on the same page:

  1. A single mapping definition can include the same field twice if the field is defined both through nested object notation, and also through dot notation.
  2. When parsing the mapping, we create an ObjectMapper. This object actually merges duplicate definitions together when it’s built through FieldMapper#doMerge: https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/mapper/ObjectMapper.java#L139-L146
  3. We only do very light validation in FieldMapper#doMerge, so we don’t catch the fact that there’s a conflict, and one field definition basically ‘wins out’.

If instead the definition is spread out over two ‘put mapping’ calls (add the field mapping, then later update it with a conflicting mapping), then we correctly catch the conflict. This is because we end up validating the new definition against the old one through MappedFieldType#checkCompatibility.

This PR addresses step 3, but to me the core of the bug is in step 2 -- we silently allow the same field to be defined twice in a single mapping definition. We have an open issue #37437 about this issue. It would be great to discuss what we think the best behavior is -- my intuition is that we should disallow it because it probably indicates an accidental user error. It’s also not clear which field should be considered first or second (as you mentioned in #51286).

It’s also not so clean that we validate whether field mappings can be merged in two separate places: FieldMapper#doMerge, and MappedFieldType#checkCompatibility. This would be a nice refactor to think about as a separate (but related) issue. A note that it relates to @jpountz’s proposal in #41058 to combine FieldMapper and MappedFieldType.

@cbuescher
Copy link
Member Author

Thanks @jtibshirani, your summary capture my findings around digging into #51286. I wasn't aware of the open issue around "Disallow defining the same field", maybe thats a better approach indeed. I opened this PR mostly to illustrate the problem and offer some solution to the validation step you mentioned in step 3, happy to wait out the discussion on the other issue. Should we keep this open for the time being or close and reopen if needed?

@jtibshirani
Copy link
Contributor

My vote would be to close this out for now and focus on the other issue #37437. We could revisit this PR as we think through the refactors in #41058.

@cbuescher
Copy link
Member Author

Sounds good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Search Foundations/Mapping Index mappings, including merging and defining field types v7.7.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants