Fix catalog plan for not query #139
Complex topic, but a different plan for "not" queries is definitely missing here. That said, "not" queries are not always slow, but as far as I understand this is for the test only, to push "not" to the end.
Disclaimer: I am not competent at all to judge on this PR, so do not expect a review from me :). The only thing I can tell is that I am not a super fan of mixing key types (in this case string and tuples). Maybe changing the key from Feel free to discard my comment.
Good point ... but I've followed the same implementation (tuple with name and query) already used for
    if isinstance(query.get(name), dict) and "not" in query[name]
]
if notkeys:
    key = [name for name in key if name not in notkeys]
@mamico If I'm reading https://github.com/zopefoundation/Products.ZCatalog/blob/master/src/Products/PluginIndexes/unindex.py#L433 correctly, I think it is possible to query a single index with both a "normal" (non-inverted) and "not" query at the same time; e.g. searchResults(indexname={"query": "x", "not": "y"}).
The assumption made by this line of code is that an index is either performing a normal query or a not query, but not both, and I don't think that's a valid assumption in this edge case.
Maybe that's fine though. In that edge case we'll end up with ("indexname", "not") as the key -- so it'll be grouped together with queries that are only doing a "not" query, but at least it'll be grouped together with another less common case for the purpose of catalog planning rather than being grouped with other queries that don't intersect with the "not" results.
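To make the edge case concrete, here is a minimal sketch that reuses the condition from the diff above. The helper name has_not_operator and the index name "indexname" are only illustrative and are not part of the PR.

```python
def has_not_operator(query, name):
    """Same condition as in the diff: dict-valued query containing a "not" key."""
    value = query.get(name)
    return isinstance(value, dict) and "not" in value


# A plain query, a pure "not" query, and the combined edge case.
plain = {"indexname": "x"}
only_not = {"indexname": {"not": "y"}}
combined = {"indexname": {"query": "x", "not": "y"}}

# The combined form is classified exactly like the pure "not" form,
# so both would end up under the ("indexname", "not") plan key.
results = {
    "plain": has_not_operator(plain, "indexname"),
    "only_not": has_not_operator(only_not, "indexname"),
    "combined": has_not_operator(combined, "indexname"),
}
```

With this check the combined query cannot be told apart from a pure "not" query, which is the grouping described above.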
Maybe a catchall key for all operators could be something like:
    operatorkeys = [
        name for name in key
        if isinstance(query.get(name), dict)
    ]
    if operatorkeys:
        key = [name for name in key if name not in operatorkeys]
        key.extend([(name, tuple(sorted(query[name].keys()))) for name in operatorkeys])
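As a rough illustration of what this catchall would produce, the sketch below wraps the suggestion in a standalone function. The function name operator_key and the example index names are assumptions made for the example, and the final sorting step of the real plan key is left out.

```python
def operator_key(query):
    """Hypothetical wrapper around the catchall suggestion (no final sorting)."""
    key = list(query.keys())
    # Every dict-valued query counts as an "operator" query here.
    operatorkeys = [name for name in key if isinstance(query.get(name), dict)]
    if operatorkeys:
        key = [name for name in key if name not in operatorkeys]
        key.extend(
            (name, tuple(sorted(query[name].keys()))) for name in operatorkeys
        )
    return key


simple = operator_key({"UID": "1"})
negated = operator_key({"UID": {"not": "1"}})
ranged = operator_key({"modified": {"query": 5, "range": "min"}, "UID": "1"})
```

Note that under this scheme any dict-valued query gets its own key shape, including one that only passes "query" plus an option such as "range", not just the "not" case.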
@mamico Interesting idea.
But since indexes are pluggable and each type of index can have arbitrary operators, I'm a little hesitant about making it this generic, since there are probably situations where the operator has a significant impact on the runtime of the query and other situations where it does not. Better to only handle "not" at the moment, I think, so that we're only changing behavior of a specific case that we understand.
It also occurs to me that maybe determining the key for the queryplan should be delegated to the index implementation -- except that it's not necessarily trivial to find the index(es?) that corresponds to a particular name in the query.
Another case that is not handled here is if there is a query using the "not" operator on an index that is being treated as a value index.
> Another case that is not handled here is if there is a query using the "not" operator on an index that is being treated as a value index.
Nope, this is handled, because those are strings (with "not" and the value included), not dicts, after this: https://github.com/zopefoundation/Products.ZCatalog/blob/master/src/Products/ZCatalog/plan.py#L246
Ah, yes, I see what you mean.
Okay, so now that I've thought this through: I think there is more refinement that can be done on the code, but I also think it will not cause problems as currently implemented, and it helps with a specific case. So I'm okay with merging as is.
The "not" queries produce the same key as the normal queries in the catalog plan. This is a problem because the "not" queries could generally be slower.
Before this PR, the queries {'UID': '1', 'path': '/a'} and {'UID': {'not': '1'}, 'path': '/a'} share the same plan, with the key ('UID', 'path'). After this PR, they will have two distinct plans, with the keys ('UID', 'path') and (('UID', 'not'), 'path').
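The effect on the keys can be sketched as follows. This is a simplified, hypothetical stand-in for the plan's key function, not the actual Products.ZCatalog code: the name make_plan_key is invented for the example, value indexes are ignored, and ordering the entries by index name is an assumption chosen to avoid comparing strings with tuples.

```python
def make_plan_key(query):
    """Simplified sketch: build a plan key that separates "not" queries."""
    key = []
    for name, value in query.items():
        if isinstance(value, dict) and "not" in value:
            # "not" queries get a (name, "not") tuple instead of the bare name.
            key.append((name, "not"))
        else:
            key.append(name)
    # Assumption: order by index name, since str and tuple entries
    # cannot be compared with each other directly in Python 3.
    return tuple(sorted(key, key=lambda k: k[0] if isinstance(k, tuple) else k))


plain_key = make_plan_key({"UID": "1", "path": "/a"})
not_key = make_plan_key({"UID": {"not": "1"}, "path": "/a"})
```

Under these assumptions the two queries no longer share a key, so the timings collected for one would not influence the plan of the other.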