-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Source filtering should treat dots in field names as sub objects. #20736
Source filtering should treat dots in field names as sub objects. #20736
Conversation
Mappings treat dots in field names as sub objects, for instance ``` { "a.b": "c" } ``` generates the same dynamic mappings as ``` { "a": { "b": "c" } } ``` Source filtering should be consistent with this behaviour so that an include list containing `a` should include fields whose name is `a.b`. To make this change easier, source filtering was refactored to use automata. The ability to treat dots in field names as sub objects is provided by the `makeMatchDotsInFieldNames` method of `XContentMapValues`. Closes elastic#20719
+1 LGTM, Thanks @jpountz |
@mikemccand would you ming having a quick look at the usage of the Automaton API before it gets merged? |
If Mike doesn't get to it i can look again in the morning. I looked earlier On Oct 5, 2016 6:26 PM, "Adrien Grand" [email protected] wrote:
|
filtered = XContentMapValues.filter(map, new String[0], new String[] {"a"}); | ||
expected = new HashMap<>(map); | ||
expected.keySet().retainAll(Collections.singleton("d")); | ||
assertEquals(expected, filtered); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a test with strings longer than one char? Sadly, you probably also want on that has a character that is two java char
wide as well. I'd be good to assert everything works right when you fail half way through the word - say you try and match cat
against the pattern cbs
. The c
works, the a
would put you in the -1
state, etc.
@nik9000 I pushed more tests as requested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I was worried about the contract of step. I figured you'd got it right but that didn't leave any defense against it changing out from under us.
…0736) Mappings treat dots in field names as sub objects, for instance ``` { "a.b": "c" } ``` generates the same dynamic mappings as ``` { "a": { "b": "c" } } ``` Source filtering should be consistent with this behaviour so that an include list containing `a` should include fields whose name is `a.b`. To make this change easier, source filtering was refactored to use automata. The ability to treat dots in field names as sub objects is provided by the `makeMatchDotsInFieldNames` method of `XContentMapValues`. Closes #20719
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woops, sorry I missed my calling here ;)
I left a few small comments.
// we want all sub properties to match as soon as an object matches | ||
|
||
return filter(map, | ||
include, include.getInitialState(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The initial state is always 0 w/ Lucene's automata ... I'll fix Lucene to make this clearer :)
CharacterRunAutomaton exclude = new CharacterRunAutomaton(excludeA); | ||
|
||
// NOTE: We cannot use Operations.minus because of the special case that | ||
// we want all sub properties to match as soon as an object matches |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if you could use Operations.minus
, but then union that result with another automaton that accepts any string matched by the include automaton followed by .
(literal) followed by any suffix? Then a single automaton could do all the matching I think?
* then {@code a.b} will be kept in the filtered map. | ||
*/ | ||
public static Map<String, Object> filter(Map<String, ?> map, String[] includes, String[] excludes) { | ||
CharacterRunAutomaton matchAllAutomaton = new CharacterRunAutomaton(Automata.makeAnyString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this invoked once per returned hit? We should maybe (separate change) compile the automaton/a up front, and re-use that across all the hits?
When using source filtering exclusions, empty arrays are not preserved in documents, and no empty arrays are returned if arrays are empty after applying exclusions. We have special treatment to make sure that we preserve empty objects, but the behaviour for arrays is different. It looks like this regression was introduced by #22593, shortly after we refactored source filtering to use automata (#20736). Note that this change affects what the search API returns when using source exclusions, as well as what gets indexed when using source exclusions for the _source field. Closes #23796
When using source filtering exclusions, empty arrays are not preserved in documents, and no empty arrays are returned if arrays are empty after applying exclusions. We have special treatment to make sure that we preserve empty objects, but the behaviour for arrays is different. It looks like this regression was introduced by elastic#22593, shortly after we refactored source filtering to use automata (elastic#20736). Note that this change affects what the search API returns when using source exclusions, as well as what gets indexed when using source exclusions for the _source field. Closes elastic#23796
When using source filtering exclusions, empty arrays are not preserved in documents, and no empty arrays are returned if arrays are empty after applying exclusions. We have special treatment to make sure that we preserve empty objects, but the behaviour for arrays is different. It looks like this regression was introduced by #22593, shortly after we refactored source filtering to use automata (#20736). Note that this change affects what the search API returns when using source exclusions, as well as what gets indexed when using source exclusions for the _source field. Closes #23796
Mappings treat dots in field names as sub objects, for instance
generates the same dynamic mappings as
Source filtering should be consistent with this behaviour so that an include
list containing
a
should include fields whose name isa.b
.To make this change easier, source filtering was refactored to use automata.
The ability to treat dots in field names as sub objects is provided by the
makeMatchDotsInFieldNames
method ofXContentMapValues
.Closes #20719