Source Filtering does not Expose Field Names with Dots #20719

Closed
jbaiera opened this issue Oct 3, 2016 · 2 comments

jbaiera (Member) commented Oct 3, 2016

Elasticsearch version: 5.0.0-beta1

Plugins installed: []

JVM version: 1.8.0_91

Description of the problem including expected versus actual behavior:
When source filtering is applied to field names that contain dots, it can return inconsistent results. My understanding is that source filtering is strictly a filter applied over the source document as-is.

Steps to reproduce:

  1. Insert a document with a dotted field name and a field that is a regular object:
    • curl -XPUT localhost:9200/dots/test/1 -d '{"a.b":0,"c":1,"d":{"e":2}}'
  2. Execute a match all query and request each field explicitly from the source filter:
    • curl -XPOST 'localhost:9200/dots/test/_search?_source=a,c,d'
  3. Notice that the response does not include the "a.b" field:
    • "hits" : [ { "_index" : "dots", "_type" : "test", "_id" : "1", "_score" : 1.0, "_source" : { "c" : 1, "d" : { "e" : 2 } } } ]
  4. Execute a match all query and request the a field using a wildcard:
    • curl -XPOST 'localhost:9200/dots/test/_search?_source=a.*,c,d'
  5. Notice that the response now includes the "a.b" field:
    • "hits" : [ { "_index" : "dots", "_type" : "test", "_id" : "1", "_score" : 1.0, "_source" : { "c" : 1, "a.b" : 0, "d" : { "e" : 2 } } } ]

Worth Mentioning:

  1. If you insert another document into the index that contains a.b as a JSON object, the fields from that document do come back in source-filtered queries:
    • curl -XPUT localhost:9200/dots/test/2 -d '{"a":{"b":5}}'
  2. Run the same source-filtered search. Document 2 now returns "a", while document 1 still omits "a.b":
    • curl -XPOST 'localhost:9200/dots/test/_search?_source=a,c,d&pretty'
    • "hits" : [ { "_index" : "dots", "_type" : "test", "_id" : "1", "_score" : 1.0, "_source" : { "c" : 1, "d" : { "e" : 2 } } }, { "_index" : "dots", "_type" : "test", "_id" : "2", "_score" : 1.0, "_source" : { "a" : { "b" : 5 } } } ]

Describe the feature:
My understanding is that the source filter works only on the raw _source field. In the case above, the field a does not exist in the original document; only the field a.b does. Since "a.b" != "a", the field is ignored. The discussion that I want to whip up about this is whether or not the source filter logic should be consistent with the rest of Elasticsearch in how it handles parsing of dotted field names before checking if they match a source filter.
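The behavior described above can be modeled in a few lines of Python. This is a simplified sketch, not the Elasticsearch implementation: it assumes the filter compares each key's full path (for example `a.b`) against the include patterns, keeps the whole value on a match, and otherwise descends into nested objects. Python's `fnmatch.fnmatchcase` stands in for Elasticsearch's wildcard matching (it also accepts `?` and `[...]`, which the sketch does not rely on), and the function name `naive_filter` is hypothetical.

```python
from fnmatch import fnmatchcase


def naive_filter(source, includes, parent=""):
    # Hypothetical model of literal-path source filtering (not Elasticsearch code).
    result = {}
    for key, value in source.items():
        # Full path of the key: a top-level "a.b" stays "a.b"; "e" under "d" becomes "d.e".
        path = f"{parent}.{key}" if parent else key
        if any(fnmatchcase(path, p) for p in includes):
            # The path itself was requested, so keep the whole value.
            result[key] = value
        elif isinstance(value, dict):
            # Otherwise look for matches deeper inside the object.
            child = naive_filter(value, includes, path)
            if child:
                result[key] = child
    return result


doc1 = {"a.b": 0, "c": 1, "d": {"e": 2}}
print(naive_filter(doc1, ["a", "c", "d"]))    # {'c': 1, 'd': {'e': 2}}
print(naive_filter(doc1, ["a.*", "c", "d"]))  # {'a.b': 0, 'c': 1, 'd': {'e': 2}}
print(naive_filter({"a": {"b": 5}}, ["a", "c", "d"]))  # {'a': {'b': 5}}
```

Under this model, the pattern `a` never matches the literal key `a.b`, while `a.*` does, which lines up with the responses in the reproduction steps.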

Edit: Removed a bogus reproduction step.

jbaiera (Member, Author) commented Oct 3, 2016

This is related to elastic/elasticsearch-hadoop#854. I spoke with @rjernst about the situation, and he recommended that we switch to the fields parameter (now called stored_fields) for performance reasons. However, that parameter only works on fields that are leaves of the document, unless a wildcard name is given.

jpountz (Contributor) commented Oct 3, 2016

> The discussion that I want to whip up about this is whether or not the source filter logic should be consistent with the rest of Elasticsearch in how it handles parsing of dotted field names before checking if they match a source filter.

+1 for consistency

jpountz self-assigned this Oct 3, 2016
jpountz added a commit to jpountz/elasticsearch that referenced this issue Oct 4, 2016
Mappings treat dots in field names as sub objects, for instance

```
{
  "a.b": "c"
}
```

generates the same dynamic mappings as

```
{
  "a": {
    "b": "c"
  }
}
```
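To illustrate the equivalence, here is a hypothetical helper, `expand_dots`, that rewrites dotted keys into nested objects. It is only a sketch of how the mapping interprets dotted field names: it assumes there are no conflicting keys (such as `"a": 1` alongside `"a.b": 2`, which this code does not handle), and it does not describe what is stored, since the `_source` returned in the responses above still contains the original `"a.b"` key.

```python
def expand_dots(source):
    # Hypothetical helper: rebuild the document, splitting dotted keys into nested objects.
    result = {}
    for key, value in source.items():
        if isinstance(value, dict):
            value = expand_dots(value)  # nested objects may contain dotted keys too
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            # Create intermediate objects as needed; conflicting keys are not handled.
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


print(expand_dots({"a.b": "c"}))  # {'a': {'b': 'c'}}
print(expand_dots({"a.b": 0, "c": 1, "d": {"e": 2}}))  # {'a': {'b': 0}, 'c': 1, 'd': {'e': 2}}
```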

Source filtering should be consistent with this behaviour, so that an include
list containing `a` also includes fields whose name is `a.b`.

To make this change easier, source filtering was refactored to use automata.
The ability to treat dots in field names as sub objects is provided by the
`makeMatchDotsInFieldNames` method of `XContentMapValues`.
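A minimal sketch of the idea, using Python regular expressions as a stand-in for Lucene automata: each include pattern `P` is widened to accept either `P` itself or `P` followed by `.` and anything, so `a` matches the path `a.b` but not `ab`. This is our reading of what `makeMatchDotsInFieldNames` achieves, not its actual code; the helper names are hypothetical, and only `*` is treated as a wildcard.

```python
import re


def glob_to_regex(pattern):
    # Treat "*" as "any characters"; escape everything else literally.
    return ".*".join(re.escape(part) for part in pattern.split("*"))


def dots_aware_matcher(patterns):
    # Widen each pattern P to "P, or P followed by '.' and anything", so an
    # include for an object also covers dotted field names below it.
    widened = [f"(?:{glob_to_regex(p)})(?:\\..*)?" for p in patterns]
    return re.compile("(?:" + "|".join(widened) + ")", re.DOTALL)


def filter_source(source, matcher, parent=""):
    # Keep a key when its full dotted path matches; otherwise descend into objects.
    result = {}
    for key, value in source.items():
        path = f"{parent}.{key}" if parent else key
        if matcher.fullmatch(path):
            result[key] = value
        elif isinstance(value, dict):
            child = filter_source(value, matcher, path)
            if child:
                result[key] = child
    return result


doc1 = {"a.b": 0, "c": 1, "d": {"e": 2}}
print(filter_source(doc1, dots_aware_matcher(["a", "c", "d"])))
# {'a.b': 0, 'c': 1, 'd': {'e': 2}}
```

Widening the pattern, rather than rewriting the document, leaves `_source` untouched and keeps the dot boundary explicit: `a` matches `a` and `a.b`, but not `ab` or `abc`.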

Closes elastic#20719
jpountz added a commit that referenced this issue Oct 10, 2016
…0736)

Mappings treat dots in field names as sub objects, for instance

```
{
  "a.b": "c"
}
```

generates the same dynamic mappings as

```
{
  "a": {
    "b": "c"
  }
}
```

Source filtering should be consistent with this behaviour so that an include
list containing `a` should include fields whose name is `a.b`.

To make this change easier, source filtering was refactored to use automata.
The ability to treat dots in field names as sub objects is provided by the
`makeMatchDotsInFieldNames` method of `XContentMapValues`.

Closes #20719
jpountz added a commit that referenced this issue Oct 10, 2016
…0736)

Mappings treat dots in field names as sub objects, for instance

```
{
  "a.b": "c"
}
```

generates the same dynamic mappings as

```
{
  "a": {
    "b": "c"
  }
}
```

Source filtering should be consistent with this behaviour so that an include
list containing `a` should include fields whose name is `a.b`.

To make this change easier, source filtering was refactored to use automata.
The ability to treat dots in field names as sub objects is provided by the
`makeMatchDotsInFieldNames` method of `XContentMapValues`.

Closes #20719