Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow defining the same field twice in a single mapping call. #37437

Open
jtibshirani opened this issue Jan 14, 2019 · 17 comments
Open

Disallow defining the same field twice in a single mapping call. #37437

jtibshirani opened this issue Jan 14, 2019 · 17 comments
Labels
>bug priority:normal A label for assessing bug priority to be used by ES engineers :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

@jtibshirani
Copy link
Contributor

In #37141 we discovered that it is possible to define the same field twice in a single request to add mappings, so long as the field mapping definitions are compatible and can be merged together. For example, the following mapping is valid, and the first definition happens to 'win out':

PUT index
{
  "mappings": {
    "_doc": {
      "properties": {
          "nested": {
              "properties": {
                  "field": {
                      "type":"keyword"
                }
              }
          },
          "nested.field": {
              "type":"keyword",
              "boost": 3.0
          }
      }
    }
  }
}

To me, a request like this is not likely to be intentional, and could indicate a bug in the calling code. For this reason it may make sense to detect this case and disallow it.

As a note, the above mapping update would be rejected if the duplicate field were specified at the same level. For example, the following mapping fails with a JSON parsing exception:

PUT index
{
  "mappings": {
    "_doc": {
      "properties": {
          "nested.field": {
              "type":"keyword"
          },
          "nested.field": {
              "type":"keyword"
          }
      }
    }
  }
}
@jtibshirani jtibshirani added >bug :Search Foundations/Mapping Index mappings, including merging and defining field types labels Jan 14, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

@jtibshirani jtibshirani changed the title Disallow defining the same field twice in a single mapping definition. Disallow defining the same field twice in a single mapping call. Jan 14, 2019
@jtibshirani jtibshirani added the help wanted adoptme label May 7, 2019
@zacharymorn
Copy link
Contributor

This issue has been open for more than a year. Does it still need fixing?

@jtibshirani
Copy link
Contributor Author

@zacharymorn yes, I still think it'd be valuable to fix. Let me know if you're interested in looking into it. As a heads up, there are some details that still need settling like whether we need a deprecation period where we warn about the duplicate entries before we outright disallow them.

@zacharymorn
Copy link
Contributor

@jtibshirani I've created a draft PR for this. For the time being I chose to emit deprecation warning when the issue in question is detected. Please let me know if this looks good to you.

@jtibshirani
Copy link
Contributor Author

Thanks @zacharymorn for the contribution. I'll look over the PR. I've also marked this issue for discussion so we can double-check the deprecation strategy.

@jpountz
Copy link
Contributor

jpountz commented Apr 6, 2020

I thought we were already checking this here:

} else if (fieldNames.add(name) == false) {
throw new IllegalArgumentException("Field [" + name + "] is defined twice.");
. But obviously we're not, I wonder why this validation logic is not working.

@jtibshirani
Copy link
Contributor Author

jtibshirani commented Apr 6, 2020

I looked into this a bit more -- the issue is that when building the root ObjectMapper, we attempt to merge together fields with the same name: https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/mapper/ObjectMapper.java#L143. So if there is no field mapper conflict as in the original example, the merge succeeds and the object mapper will contain a single definition for the field. By the time we call into MapperMergeValidator.validateNewMappers, there is only one field mapper.

To me this feels like confusing enough behavior, and also uncommon enough that we could treat it as a bug instead of a behavior change. We've seen it come up before in a user's mappings (#51286), but it actively caused issues for that user. Another reason I'm hesitant to treat it as a normal behavior change is that in addition to a deprecation period, we'd want to consider API versioning, which seems too heavy-handed.

Maybe a compromise would be to only make the fix in 8.0, and not backport it? For major upgrades, many users already give a close look at their mappings and are ready to make changes.

@jpountz
Copy link
Contributor

jpountz commented Apr 6, 2020

I agree with your general thinking, but would like to better understand what we'd be breaking by changing this, e.g. I wonder whether this behavior is relied upon to make dynamic mappings work with a document that would look like this:

{
   "foo" : {
     "bar": 1
  },
  "foo.bar": 2
}

@jtibshirani
Copy link
Contributor Author

I did some testing and found that if we changed this behavior, we could keep these related behaviors the same:

  • A dynamic mapping update would still work given a document like above that defines the same field twice. This is because we merge dynamic mapping updates together in DocumentParser#createDynamicUpdate before constructing the combined update.
  • As expected, you could still define a field using structured JSON notation, and then later send a mapping update using dot notation, and it would succeed.

A note that at a high level, we rely on this merging behavior for a mapping like the following to work:

PUT index
{
  "mappings": {
    "properties": {
      "nested.field": {
        "type": "keyword"
      },
      "nested.other_field": {
        "type": "keyword"
      }
    }
  }
}

The change should make sure to preserve the ability to define mappings like this. For this reason, I think the fix will be a bit more complex.

@jpountz
Copy link
Contributor

jpountz commented Apr 7, 2020

Thanks for digging. I agree we should preserve the ability to define mappings like this.

@zacharymorn
Copy link
Contributor

zacharymorn commented Apr 8, 2020

I'm trying to follow the discussion but feel I might have some context/knowledge gap.

From my quick reading of dynamic mapping, it seems to be trigged by index document update. But I was previously under the impression that the fix to this issue is to detect the nested field definition duplication for explicit index creation or mapping updating. So they seems to be on different code path.

On the other hand, I’m also not clear that if we disallow nested field definition for explicit mapping update, but keep that for dynamic mapping, wouldn’t that create inconsistencies in field definitions from different methods?

Am I missing something? Is my current understanding totally on the wrong direction :D ?

@jtibshirani
Copy link
Contributor Author

@zacharymorn sorry for the late response. Your overall understanding is correct, I'll just add some clarifications. This issue is indeed focused on duplicate field definitions in index creation/ mapping updates. As you say, dynamic mappings are triggered by document updates. Dynamic mappings do have their own code path, but they share quite a bit of logic with normal mapping updates. So @jpountz's question was whether changing this behavior for normal mapping updates could also affect dynamic updates.

wouldn’t that create inconsistencies in field definitions from different methods?

This is a good question, and one I had after reading the dynamic update example. I think it could also use discussion internally (in addition to the deprecation question). I'll give some preliminary comments on your PR as we work to sort out these open questions.

@jtibshirani
Copy link
Contributor Author

jtibshirani commented Apr 21, 2020

We met to discuss and have decided the following:

  • We indeed want to reject mappings with duplicate keys. For now, we will still accept duplicate keys in documents, although this could be seen as a bit inconsistent.
  • We'll issue a deprecation warning about the issue in 7.x, and then in 8.0 we'll reject new mappings with duplicate keys.

Some implementation notes:

  • Typically we always develop against master. So we can first create a PR against master that issues deprecation warnings, then backport it to 7.x. After, we'll follow-up with a change on master (8.0) to remove the deprecation and start throwing an exception.
  • The change should target the mappings validation/ building logic, not JSON parsing itself.
  • We should make sure to preserve the behaviors described in Disallow defining the same field twice in a single mapping call. #37437 (comment).
  • I will catch up with our other teams internally to get ahead of possible issues/ regressions this fix could cause.

Lastly for the team's context, here are the two issues I've seen where users ran into this confusing behavior: #51286, #37141.

@Razi007
Copy link

Razi007 commented Aug 20, 2020

@jtibshirani is this issue still open?

@jtibshirani
Copy link
Contributor Author

@Razi007 there is already an open PR from @zacharymorn (#54481), I just totally dropped the ball on reviewing it. Since there's already pre-existing work, it'd be best to check out some other issues with the help wanted or good first issue label.

@javanna javanna self-assigned this Jun 16, 2022
@javanna javanna removed their assignment Aug 3, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@javanna javanna added the priority:normal A label for assessing bug priority to be used by ES engineers label Jun 6, 2024
@javanna javanna added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 16, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug priority:normal A label for assessing bug priority to be used by ES engineers :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch
Projects
None yet
Development

No branches or pull requests

8 participants