Fix catalog plan for not query #139

Merged
merged 2 commits into from
Jul 26, 2022

mamico
Member

@mamico mamico commented Jul 22, 2022

Not queries produce the same key as normal queries in the catalog plan. This is a problem because not queries can generally be slower.

Before this PR, the queries {'UID': '1', 'path': '/a'} and {'UID': {'not': '1'}, 'path': '/a'} share the same plan, with the key ('UID', 'path'). After this PR, they have two distinct plans, with the keys ('UID', 'path') and (('UID', 'not'), 'path').
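
A minimal sketch of the key change described above, not the actual plan.py code. The function name make_key_sketch, the split_not flag, and the final sort by index name are assumptions made for illustration; only the two resulting keys come from the description.

```python
def make_key_sketch(query, split_not=True):
    """Build a hashable plan key from a catalog query dict (illustrative only)."""
    key = list(query.keys())
    if split_not:
        # Index names whose query is a dict containing the "not" operator.
        notkeys = [
            name for name in key
            if isinstance(query.get(name), dict) and "not" in query[name]
        ]
        if notkeys:
            key = [name for name in key if name not in notkeys]
            key.extend([(name, "not") for name in notkeys])
    # Assumption: order entries by index name so str and tuple entries
    # can be sorted together.
    return tuple(sorted(key, key=lambda k: k if isinstance(k, str) else k[0]))


normal = {"UID": "1", "path": "/a"}
negated = {"UID": {"not": "1"}, "path": "/a"}

# Without the split, both queries collapse onto the same key.
old_keys = (make_key_sketch(normal, split_not=False),
            make_key_sketch(negated, split_not=False))
# With the split, the not query gets its own key.
new_keys = (make_key_sketch(normal), make_key_sketch(negated))
```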

@mamico mamico requested review from davisagli, jensens and ale-rt July 22, 2022 07:46
Member

@jensens jensens left a comment

Complex topic, but a different plan for not queries is definitely missing here. That said, not queries are not always slow, but as far as I understand this is for the test only, to push not to the end.

@ale-rt
Member

ale-rt commented Jul 22, 2022

Disclaimer: I am not competent at all to judge on this PR, so do not expect a review from me :).

The only thing I can tell is that I am not a super fan of mixing key types (in this case string and tuples).
It might backfire for whatever reason.

Maybe changing the key from ("UID", "not") to something like "!UID", "UID:not", "UID|not" or whatever might be better.
I would not pick the $NAME_not option because I am pretty sure in some portal catalog in this world there is an index whose name ends with _not ;)

Feel free to discard my comment.
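
One concrete way mixed key types can bite, sketched in plain Python rather than with the catalog code: in Python 3 a list containing both strings and tuples cannot be sorted directly. The "UID:not" spelling below is just one of the hypothetical string encodings suggested above, not something the PR implements.

```python
mixed_key = ["path", ("UID", "not")]

try:
    sorted(mixed_key)
    sortable = True
except TypeError:
    # Python 3 refuses to order a str against a tuple.
    sortable = False

# Hypothetical alternative: encode the operator into the string itself,
# so every entry has the same type and sorts without special handling.
string_key = sorted(["path", "UID:not"])
```

Hashing is not affected either way: both the mixed tuple and the all-string tuple are valid dictionary keys.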

@mamico
Member Author

mamico commented Jul 22, 2022

The only thing I can tell is that I am not a super fan of mixing key types (in this case string and tuples). It might backfire for whatever reason.

Good point, but I followed the same approach (a tuple of name and query) that is already used for valueindexes; see https://github.com/zopefoundation/Products.ZCatalog/blob/master/src/Products/ZCatalog/plan.py#L246

            if isinstance(query.get(name), dict) and "not" in query[name]
        ]
        if notkeys:
            key = [name for name in key if name not in notkeys]
Member

@mamico If I'm reading https://github.com/zopefoundation/Products.ZCatalog/blob/master/src/Products/PluginIndexes/unindex.py#L433 correctly, I think it is possible to query a single index with both a "normal" (non-inverted) and "not" query at the same time; e.g. searchResults(indexname={"query": "x", "not": "y"}).

The assumption made by this line of code is that an index is either performing a normal query or a not query, but not both, and I don't think that's a valid assumption in this edge case.

Member

Maybe that's fine though. In that edge case we'll end up with ("indexname", "not") as the key, so it'll be grouped together with queries that are only doing a not query. At least that groups it with another less common case for the purpose of catalog planning, rather than with other queries that don't intersect with the not results.
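
A small sketch of that edge case, using only the condition from the reviewed line. The helper name has_not is an assumption for illustration.

```python
def has_not(query, name):
    # Same test as the reviewed line: a dict query that contains "not".
    return isinstance(query.get(name), dict) and "not" in query[name]


only_not = {"indexname": {"not": "y"}}
combined = {"indexname": {"query": "x", "not": "y"}}
plain = {"indexname": "x"}

# The combined query is classified like a pure not query.
flags = [has_not(q, "indexname") for q in (only_not, combined, plain)]
```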

Member Author

Maybe, a catchall key for all operators could be something like

        operatorkeys = [
            name for name in key
            if isinstance(query.get(name), dict)
        ]
        if operatorkeys:
            key = [name for name in key if name not in operatorkeys]
            key.extend([(name, tuple(sorted(query[name].keys()))) for name in operatorkeys])
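
For reference, a self-contained version of the catch-all idea above that can be run outside the catalog. The function name operator_key is an assumption, and the result is left as an unsorted list, since the final ordering step is not part of the snippet.

```python
def operator_key(query):
    """Tag every dict-style query with the sorted tuple of its operators."""
    key = list(query.keys())
    operatorkeys = [name for name in key if isinstance(query.get(name), dict)]
    if operatorkeys:
        key = [name for name in key if name not in operatorkeys]
        key.extend(
            (name, tuple(sorted(query[name].keys()))) for name in operatorkeys
        )
    return key


result = operator_key({"UID": {"query": "x", "not": "y"}, "path": "/a"})
```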

Member

@mamico Interesting idea.

But since indexes are pluggable and each type of index can have arbitrary operators, I'm a little hesitant about making it this generic, since there are probably situations where the operator has a significant impact on the runtime of the query and other situations where it does not. Better to only handle not at the moment, I think, so that we're only changing behavior of a specific case that we understand.

It also occurs to me that maybe determining the key for the queryplan should be delegated to the index implementation -- except that it's not necessarily trivial to find the index(es?) that corresponds to a particular name in the query.

Member

Another case that is not handled here is if there is a query using the not operator on an index that is being treated as a value index.

Member Author

Another case that is not handled here is if there is a query using the not operator on an index that is being treated as a value index.

Nope, this is handled. After this line, those entries are strings (with the not and the value included), not dicts: https://github.com/zopefoundation/Products.ZCatalog/blob/master/src/Products/ZCatalog/plan.py#L246
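
A rough sketch of why the value-index case does not reach the not check. It assumes that value indexes contribute a (name, repr(query value)) tuple to the key, as described earlier in this thread; the function name and the portal_type index name are hypothetical.

```python
def key_with_valueindexes(query, valueindexes):
    key = [name for name in query if name not in valueindexes]
    # Assumption: value indexes are keyed by name plus the repr of their query,
    # so the "not" and its value are already baked into a string.
    key.extend((name, repr(query[name])) for name in query if name in valueindexes)
    # The tuple entries are not names in the query, so query.get() returns
    # None for them and they never count as dict-style not queries.
    notkeys = [
        name for name in key
        if isinstance(query.get(name), dict) and "not" in query[name]
    ]
    key = [name for name in key if name not in notkeys]
    key.extend((name, "not") for name in notkeys)
    return key


result = key_with_valueindexes(
    {"portal_type": {"not": "File"}, "path": "/a"},
    valueindexes={"portal_type"},
)
```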

Member

Ah, yes, I see what you mean.

Member

@davisagli davisagli left a comment

Okay, so now that I've thought this through: I think there is more refinement that can be done on the code, but I also think it will not cause problems as currently implemented, and it helps with a specific case. So I'm okay with merging as is.

@davisagli davisagli merged commit f9dce82 into master Jul 26, 2022
@davisagli davisagli deleted the mamico/plan_not_query branch July 26, 2022 04:56
4 participants