Fix catalog plan for not query #139

Merged
merged 2 commits into from
Jul 26, 2022
3 changes: 3 additions & 0 deletions CHANGES.rst
@@ -7,6 +7,9 @@ Changelog
- Improve performance stability. Fix catalog plan for unused index in a query.
  (`#138 <https://github.com/zopefoundation/Products.ZCatalog/pull/138>`_)

- Improve performance stability. Fix catalog plan for not query.
  (`#139 <https://github.com/zopefoundation/Products.ZCatalog/pull/139>`_)


6.2 (2022-04-08)
----------------
8 changes: 7 additions & 1 deletion src/Products/ZCatalog/plan.py
@@ -244,7 +244,13 @@ def make_key(self, query):
                # repr() is an easy way to do this without imposing
                # restrictions on the types of values.
                key.append((name, repr(v)))

        notkeys = [
            name for name in key
            if isinstance(query.get(name), dict) and "not" in query[name]
        ]
        if notkeys:
            key = [name for name in key if name not in notkeys]
Member

@mamico If I'm reading https://github.com/zopefoundation/Products.ZCatalog/blob/master/src/Products/PluginIndexes/unindex.py#L433 correctly, I think it is possible to query a single index with both a "normal" (non-inverted) and "not" query at the same time; e.g. searchResults(indexname={"query": "x", "not": "y"}).

The assumption made by this line of code is that an index is either performing a normal query or a not query, but not both, and I don't think that's a valid assumption in this edge case.

Member

Maybe that's fine though. In that edge case we'll end up with ("indexname", "not") as the key -- so it'll be grouped together with queries that are only doing a not query, but at least it'll be grouped together with another less common case for the purpose of catalog planning rather than being grouped with other queries that don't intersect with the not results.
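
To make the grouping described above concrete, here is a minimal standalone sketch of the patched key logic. It is a simplified stand-in, not the actual `make_key` method: value indexes are left out, and the function name and sample queries are assumptions for illustration only.

```python
def make_key(query):
    """Simplified stand-in for the patched key logic (value indexes omitted)."""
    key = list(query.keys())
    # Index names whose query is a dict containing a "not" operator.
    notkeys = [
        name for name in key
        if isinstance(query.get(name), dict) and "not" in query[name]
    ]
    if notkeys:
        # Replace the plain name with a (name, "not") tuple.
        key = [name for name in key if name not in notkeys]
        key.extend([(name, "not") for name in notkeys])
    # Sort plain names and tuples separately, since Python 3 cannot
    # compare a str with a tuple.
    str_keys = sorted(k for k in key if not isinstance(k, tuple))
    tuple_keys = sorted(k for k in key if isinstance(k, tuple))
    return tuple(str_keys) + tuple(tuple_keys)


# Hypothetical queries against a single index called "indexname".
only_not = make_key({"indexname": {"not": "y"}})
combined = make_key({"indexname": {"query": "x", "not": "y"}})
plain = make_key({"indexname": "x"})
```

In this sketch `only_not` and `combined` produce the same key, so a mixed query shares a plan with pure `not` queries, while `plain` keeps its own key.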

Member Author

Maybe a catch-all key for all operators could be something like:

        operatorkeys = [
            name for name in key
            if isinstance(query.get(name), dict)
        ]
        if operatorkeys:
            key = [name for name in key if name not in operatorkeys]
            key.extend([(name, tuple(sorted(query[name].keys()))) for name in operatorkeys])
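
To see which plan keys this catch-all proposal would produce, the following standalone sketch wraps the same idea in a function. The function name, the final sorting step, and the sample queries are assumptions for illustration; this is not code from the PR.

```python
def make_operator_key(query):
    """Sketch of the catch-all proposal: key on the sorted operator names."""
    key = list(query.keys())
    # Any index queried with a dict is treated as an "operator" query.
    operatorkeys = [name for name in key if isinstance(query.get(name), dict)]
    if operatorkeys:
        key = [name for name in key if name not in operatorkeys]
        key.extend(
            [(name, tuple(sorted(query[name].keys()))) for name in operatorkeys]
        )
    # Sort plain names and tuples separately (Python 3 cannot mix them).
    str_keys = sorted(k for k in key if not isinstance(k, tuple))
    tuple_keys = sorted(k for k in key if isinstance(k, tuple))
    return tuple(str_keys) + tuple(tuple_keys)


# Hypothetical queries; the operator names are only examples.
key_a = make_operator_key({"num1": {"not": 2}, "num2": 3})
key_b = make_operator_key({"num1": {"query": 1, "not": 2}, "num2": 3})
key_c = make_operator_key({"num1": {"query": [1, 5], "range": "min:max"}, "num2": 3})
```

Each distinct combination of operators gets its own plan key here, so `key_a`, `key_b` and `key_c` all differ.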

Member

@mamico Interesting idea.

But since indexes are pluggable and each type of index can have arbitrary operators, I'm a little hesitant about making it this generic, since there are probably situations where the operator has a significant impact on the runtime of the query and other situations where it does not. Better to only handle not at the moment, I think, so that we're only changing behavior of a specific case that we understand.

It also occurs to me that maybe determining the key for the queryplan should be delegated to the index implementation -- except that it's not necessarily trivial to find the index(es?) that corresponds to a particular name in the query.

Member

Another case that is not handled here is if there is a query using the not operator on an index that is being treated as a value index.

Member Author

> Another case that is not handled here is if there is a query using the not operator on an index that is being treated as a value index.

No, this case is handled. After this line, those entries hold a string (the repr, with not and the value included) rather than a dict: https://github.com/zopefoundation/Products.ZCatalog/blob/master/src/Products/ZCatalog/plan.py#L246
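
A small sketch of why the value-index case falls through the new check. It assumes, based on the `key.append((name, repr(v)))` line in the diff, that a value index is already represented in the key as a `(name, repr(value))` tuple by the time `notkeys` is computed; the query and index names are hypothetical.

```python
query = {"num1": {"not": 2}, "num2": 3}

# Assumption: "num1" is treated as a value index, so its key entry is
# already a (name, repr(value)) tuple rather than the bare index name.
key = ["num2", ("num1", repr(query["num1"]))]

# Same check as in the patch. Looking up the tuple entry in the query
# returns None, so it is never classified as a "not" key.
notkeys = [
    name for name in key
    if isinstance(query.get(name), dict) and "not" in query[name]
]
```

`notkeys` stays empty, and the `not` operator still distinguishes the plan key because it is part of the repr string in the tuple.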

Member

Ah, yes, I see what you mean.

            key.extend([(name, "not") for name in notkeys])
        # Workaround: Python 2.x accepted different types as sort key
        # for the sorted builtin. Python 3 only sorts on identical types.
        tuple_keys = set(key) - set(
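
The workaround comment in the hunk above refers to a Python 3 restriction that matters once `(name, "not")` tuples sit next to plain index names. The sketch below illustrates it. The statement is cut off in the diff, so the split-and-sort steps shown here are an assumption about how the key is assembled, and the sample key is hypothetical.

```python
# A key that mixes plain index names with a (name, "not") tuple.
key = ["num2", ("num1", "not"), "date"]

# Python 3 refuses to order str against tuple.
try:
    sorted(key)
    mixed_sort_ok = True
except TypeError:
    mixed_sort_ok = False

# Assumed workaround: sort the two kinds separately, then concatenate.
tuple_keys = set(key) - set(x for x in key if not isinstance(x, tuple))
str_keys = sorted(set(key) - tuple_keys)
plan_key = tuple(str_keys) + tuple(sorted(tuple_keys))
```

Plain names come first in the resulting key, followed by the tuple entries, which keeps the key deterministic regardless of the order of the query dict.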
43 changes: 43 additions & 0 deletions src/Products/ZCatalog/tests/test_plan.py
@@ -313,6 +313,49 @@ def query_index(self, record, resultset=None):
            cat.getCatalogPlan(query2).plan(), ["numbers", "num", "date"]
        )

    def test_not_query(self):
        # a not query is generally slower; force this behavior for testing
        class SlowNotFieldIndex(FieldIndex):
            def query_index(self, record, resultset=None):
                if getattr(record, 'not', None):
                    time.sleep(0.1)
                return super(SlowNotFieldIndex, self).query_index(
                    record, resultset)

        zcat = ZCatalog("catalog")
        cat = zcat._catalog
        cat.addIndex(
            'num1', SlowNotFieldIndex('num1', extra={"indexed_attrs": "num"}))
        cat.addIndex(
            'num2', SlowNotFieldIndex('num2', extra={"indexed_attrs": "num"}))
        for i in range(100):
            obj = Dummy(i)
            zcat.catalog_object(obj, str(i))

        query1 = {"num1": {"not": 2}, "num2": 3}
        query2 = {"num1": 2, "num2": {'not': 5}}

        # without a plan, indexes are ordered alphabetically by default
        for query in [query1, query2]:
            self.assertEqual(zcat._catalog.getCatalogPlan(query).plan(), None)
            self.assertEqual(
                cat._sorted_search_indexes(query),
                ["num1", "num2"]
            )

        self.assertEqual([b.getPath() for b in zcat.search(query1)], ['3'])
        self.assertEqual([b.getPath() for b in zcat.search(query2)], ['2'])
        # although the fields are the same, the plans are different: the
        # slower `not` query puts its field second in the plan
        self.assertEqual(cat.getCatalogPlan(query1).plan(), ["num2", "num1"])
        self.assertEqual(cat.getCatalogPlan(query2).plan(), ["num1", "num2"])

        # searching again doesn't change the order
        self.assertEqual([b.getPath() for b in zcat.search(query1)], ['3'])
        self.assertEqual([b.getPath() for b in zcat.search(query2)], ['2'])
        self.assertEqual(cat.getCatalogPlan(query1).plan(), ["num2", "num1"])
        self.assertEqual(cat.getCatalogPlan(query2).plan(), ["num1", "num2"])

    def test_plan_empty(self):
        plan = self._makeOne()
        self.assertEqual(plan.plan(), None)