Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mask wildcard query special characters on keyword queries (#53127) #53512

Merged
merged 4 commits into from
Mar 24, 2020

Conversation

cbuescher
Copy link
Member

Wildcard queries on keyword fields get normalized, however this normalization
step should exclude the two special characters * and ? in order to keep the
wildcard query itself intact.

Backport of #53127

Changes wrt. the commit on master: also changed TypeFieldType to extend ConstantFieldType, but we still need to support matching "_type" values other than "_doc" on 7.x. Also there were tests checking that range queries on types work that are not there anymore on master, but this still seems to work on 7.x so I left the range implementation in TypeFieldType but had to adapt it slightly after changing the supertype.

)

Wildcard queries on keyword fields get normalized, however this normalization
step should exclude the two special characters * and ? in order to keep the
wildcard query itself intact.

Closes elastic#46300
@cbuescher cbuescher added :Search/Search Search-related issues that do not fall into other categories backport v7.7.0 v7.6.2 labels Mar 12, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Search)

WildcardQueryBuilder builder = QueryBuilders.wildcardQuery("_type", "doc*");
builder.doToQuery(createShardContext());
assertWarnings(QueryShardContext.TYPES_DEPRECATION_MESSAGE);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this test for wildcard queries on "_type" fields. These don't return anything useful at the moment already, however before the change in this PR we still got an empty response while now we get an error because "_type" is not indexed. I don't know if we can tolerate this, otherwise I need to add a very simple wildcard implementation to TypeFieldType again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not following, _type should still be indexed in 7x? It's only in 8x that it will be gone as a field.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_type should still be indexed in 7x

Thats true, the test above only checks for the deprecation message though. Currently if you do a wildcard query on a "_type" field, it returns no hits (even for exact values like "value": "_doc"). Thats why I think nobody should really do that kind of query on a "_type" field. With the changes in this PR we would error in those cases though. If you want I can try to work around that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Erroring is better than silent weirdness, let's leave it as is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant to write "except for exact values like "_doc" above. So that works on 7.x, but I hardly believe anybody relies on that? If you change your mind after this clarification let me know.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… starting to think better safe than sorry, I think its a small addition that will keep the existing (weird) behaviour
I’ll look into that for a bit, leave it if it gets too ugly

@cbuescher
Copy link
Member Author

@jimczi @romseygeek this is the backport from #53127, no real changes to the implementation in StringFieldType that do the actual wildcard-related work, but I had to revert the changes for the TypeFieldType, going back to it extending StringFieldType because otherwise I had several errors with nested fields etc... that were hard to work around. I removed on test I commented on above, not sure if thats acceptable, otherwise I can add a simple wildcard impl. to TypeFieldType that at least won't throw an error. Wdyt?

@romseygeek
Copy link
Contributor

but I had to revert the changes for the TypeFieldType, going back to it extending StringFieldType

Yes, in 7x the _type field can have multiple values because it is overloaded for use by nested fields - it's only in 8x that it's a constant.

@markharwood
Copy link
Contributor

What's the latest with this PR? I'm dependent on this to backport #53851 for my wildcard field.
In master I refactored the StringFieldType wildcard-preservation logic so that it was a static function I could also call from WildcardFieldType. Right now 7.x has no wildcard preservation logic at all. I can always remove the dependency on StringFieldType and copy the logic into WildcardFieldMapper if it looks like this is stalled.

@cbuescher
Copy link
Member Author

@romseygeek @markharwood sorry for dropping the ball here, I left a comment around the removed tests, explaining why I think throwing an error there now wouldn't matter that much. Let me know if you would like me to change anything around that.

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @cbuescher

@cbuescher cbuescher merged commit 1c1730f into elastic:7.x Mar 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport :Search/Search Search-related issues that do not fall into other categories v7.7.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants