Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source filtering should treat dots in field names as sub objects. #20736

Merged

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented Oct 4, 2016

Mappings treat dots in field names as sub objects, for instance

{
  "a.b": "c"
}

generates the same dynamic mappings as

{
  "a": {
    "b": "c"
  }
}

Source filtering should be consistent with this behaviour so that an include
list containing a should include fields whose name is a.b.

To make this change easier, source filtering was refactored to use automata.
The ability to treat dots in field names as sub objects is provided by the
makeMatchDotsInFieldNames method of XContentMapValues.

Closes #20719

Mappings treat dots in field names as sub objects, for instance

```
{
  "a.b": "c"
}
```

generates the same dynamic mappings as

```
{
  "a": {
    "b": "c"
  }
}
```

Source filtering should be consistent with this behaviour so that an include
list containing `a` should include fields whose name is `a.b`.

To make this change easier, source filtering was refactored to use automata.
The ability to treat dots in field names as sub objects is provided by the
`makeMatchDotsInFieldNames` method of `XContentMapValues`.

Closes elastic#20719
@jbaiera
Copy link
Member

jbaiera commented Oct 5, 2016

+1 LGTM, Thanks @jpountz

@jpountz
Copy link
Contributor Author

jpountz commented Oct 5, 2016

@mikemccand would you ming having a quick look at the usage of the Automaton API before it gets merged?

@nik9000
Copy link
Member

nik9000 commented Oct 6, 2016

If Mike doesn't get to it i can look again in the morning. I looked earlier
at it and didn't have any strong thoughts though.

On Oct 5, 2016 6:26 PM, "Adrien Grand" [email protected] wrote:

@mikemccand https://github.com/mikemccand would you ming having a quick
look at the usage of the Automaton API before it gets merged?


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#20736 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AANLosvwqm1p8Ri61jHxnY4k7b3NVa8cks5qxCQlgaJpZM4KNs9q
.

filtered = XContentMapValues.filter(map, new String[0], new String[] {"a"});
expected = new HashMap<>(map);
expected.keySet().retainAll(Collections.singleton("d"));
assertEquals(expected, filtered);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test with strings longer than one char? Sadly, you probably also want on that has a character that is two java char wide as well. I'd be good to assert everything works right when you fail half way through the word - say you try and match cat against the pattern cbs. The c works, the a would put you in the -1 state, etc.

@jpountz
Copy link
Contributor Author

jpountz commented Oct 7, 2016

@nik9000 I pushed more tests as requested.

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I was worried about the contract of step. I figured you'd got it right but that didn't leave any defense against it changing out from under us.

@jpountz jpountz merged commit 8ab7ca5 into elastic:master Oct 10, 2016
@jpountz jpountz deleted the fix/source_include_dots_in_field_names branch October 10, 2016 07:32
jpountz added a commit that referenced this pull request Oct 10, 2016
…0736)

Mappings treat dots in field names as sub objects, for instance

```
{
  "a.b": "c"
}
```

generates the same dynamic mappings as

```
{
  "a": {
    "b": "c"
  }
}
```

Source filtering should be consistent with this behaviour so that an include
list containing `a` should include fields whose name is `a.b`.

To make this change easier, source filtering was refactored to use automata.
The ability to treat dots in field names as sub objects is provided by the
`makeMatchDotsInFieldNames` method of `XContentMapValues`.

Closes #20719
Copy link
Contributor

@mikemccand mikemccand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woops, sorry I missed my calling here ;)

I left a few small comments.

// we want all sub properties to match as soon as an object matches

return filter(map,
include, include.getInitialState(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial state is always 0 w/ Lucene's automata ... I'll fix Lucene to make this clearer :)

CharacterRunAutomaton exclude = new CharacterRunAutomaton(excludeA);

// NOTE: We cannot use Operations.minus because of the special case that
// we want all sub properties to match as soon as an object matches
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if you could use Operations.minus, but then union that result with another automaton that accepts any string matched by the include automaton followed by . (literal) followed by any suffix? Then a single automaton could do all the matching I think?

* then {@code a.b} will be kept in the filtered map.
*/
public static Map<String, Object> filter(Map<String, ?> map, String[] includes, String[] excludes) {
CharacterRunAutomaton matchAllAutomaton = new CharacterRunAutomaton(Automata.makeAnyString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this invoked once per returned hit? We should maybe (separate change) compile the automaton/a up front, and re-use that across all the hits?

javanna added a commit that referenced this pull request May 13, 2020
When using source filtering exclusions, empty arrays are not preserved in documents, and no empty arrays are returned if arrays are empty after applying exclusions. We have special treatment to make sure that we preserve empty objects, but the behaviour for arrays is different.

It looks like this regression was introduced by #22593, shortly after we refactored source filtering to use automata (#20736).

Note that this change affects what the search API returns when using source exclusions, as well as what gets indexed when using source exclusions for the _source field.

Closes #23796
javanna added a commit to javanna/elasticsearch that referenced this pull request May 13, 2020
When using source filtering exclusions, empty arrays are not preserved in documents, and no empty arrays are returned if arrays are empty after applying exclusions. We have special treatment to make sure that we preserve empty objects, but the behaviour for arrays is different.

It looks like this regression was introduced by elastic#22593, shortly after we refactored source filtering to use automata (elastic#20736).

Note that this change affects what the search API returns when using source exclusions, as well as what gets indexed when using source exclusions for the _source field.

Closes elastic#23796
javanna added a commit that referenced this pull request May 13, 2020
When using source filtering exclusions, empty arrays are not preserved in documents, and no empty arrays are returned if arrays are empty after applying exclusions. We have special treatment to make sure that we preserve empty objects, but the behaviour for arrays is different.

It looks like this regression was introduced by #22593, shortly after we refactored source filtering to use automata (#20736).

Note that this change affects what the search API returns when using source exclusions, as well as what gets indexed when using source exclusions for the _source field.

Closes #23796
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants