Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify OR/Union type IndexMerge code paths to expand the applicable scenarios of existing capabilities #58361

Closed
time-and-fate opened this issue Dec 17, 2024 · 4 comments
Assignees
Labels
sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement.

Comments

@time-and-fate
Copy link
Member

time-and-fate commented Dec 17, 2024

Background & Problem

During the development of several previous features, the logic of generating OR/Union type IndexMerge paths has diverged into several different code paths:

  1. The code path evolved from the original IndexMerge code
    • It only handles non-MV indexes.
    • When there are multiple indexes to choose from, it can consider the required Sort property and choose the correct index to satisfy that.
    • Now the entry is generateIndexMerge4NormalIndex().
  2. The code path introduced to support the MV index
    • It's specially for handling the MV index.
    • It can generate IndexMerge paths that involve only one MV index.
    • For filters like ... AND (... OR ...) AND ..., it can consider the filters connected by AND so the user doesn't need to expand the expression manually to match the index.
    • Now the entry is generateIndexMerge4MVIndex()
  3. The code path later introduced to support more complex patterns involving the MV index.
    • It's mostly similar to 2, but it can generate IndexMerge paths involving multiple MV and non-MV indexes.
    • Now the entry is generateIndexMerge4ComposedIndex().
Links to some previous features

Apparently, it causes code repetitions. If you go over the code, you'll find there are many similar code snippets and there are also many structural similarities.

Besides, for some later enhancements, we introduced them to only 1 or 2 code paths. They can't benefit the other code paths.

For example, for the non-MV index, sometimes the user needs to expand the ... AND (... OR ...) filter manually to match indexes so that the optimizer can choose an IndexMerge plan.

create table t(a int, b int, c int, d int,
index iab(a,b),
index iac(a,c),
index iad(a,d));
explain select /*+ use_index_merge(t) */ * from t where a = 1 and (b = 2 or c = 3 or d = 4);
show warnings;
explain select /*+ use_index_merge(t) */ * from t where (a = 1 and b = 2) or (a = 1 and c = 3) or (a = 1 and d = 4);
> explain select /*+ use_index_merge(t) */ * from t where a = 1 and (b = 2 or c = 3 or d = 4);
+-------------------------------+---------+-----------+--------------------------+-----------------------------------------------------------+
| id                            | estRows | task      | access object            | operator info                                             |
+-------------------------------+---------+-----------+--------------------------+-----------------------------------------------------------+
| IndexLookUp_8                 | 0.03    | root      |                          |                                                           |
| ├─IndexRangeScan_5(Build)     | 10.00   | cop[tikv] | table:t, index:iab(a, b) | range:[1,1], keep order:false, stats:pseudo               |
| └─Selection_7(Probe)          | 0.03    | cop[tikv] |                          | or(eq(test.t.b, 2), or(eq(test.t.c, 3), eq(test.t.d, 4))) |
|   └─TableRowIDScan_6          | 10.00   | cop[tikv] | table:t                  | keep order:false, stats:pseudo                            |
+-------------------------------+---------+-----------+--------------------------+-----------------------------------------------------------+

> show warnings;
+---------+------+----------------------------+
| Level   | Code | Message                    |
+---------+------+----------------------------+
| Warning | 1105 | IndexMerge is inapplicable |
+---------+------+----------------------------+

> explain select /*+ use_index_merge(t) */ * from t where (a = 1 and b = 2) or (a = 1 and c = 3) or (a = 1 and d = 4);
+-------------------------------+---------+-----------+--------------------------+-------------------------------------------------+
| id                            | estRows | task      | access object            | operator info                                   |
+-------------------------------+---------+-----------+--------------------------+-------------------------------------------------+
| IndexMerge_9                  | 8.00    | root      |                          | type: union                                     |
| ├─IndexRangeScan_5(Build)     | 0.10    | cop[tikv] | table:t, index:iab(a, b) | range:[1 2,1 2], keep order:false, stats:pseudo |
| ├─IndexRangeScan_6(Build)     | 0.10    | cop[tikv] | table:t, index:iac(a, c) | range:[1 3,1 3], keep order:false, stats:pseudo |
| ├─IndexRangeScan_7(Build)     | 0.10    | cop[tikv] | table:t, index:iad(a, d) | range:[1 4,1 4], keep order:false, stats:pseudo |
| └─TableRowIDScan_8(Probe)     | 8.00    | cop[tikv] | table:t                  | keep order:false, stats:pseudo                  |
+-------------------------------+---------+-----------+--------------------------+-------------------------------------------------+

Another example:
The non-MV index logic can consider and try to satisfy the required Sort property, but the MV index logic can't. So sometimes it will need an extra Sort. The performance difference will be more obvious when there's also LIMIT in the SQL.

create table t ( a int, b int, c int, j1 json, j2 json,
key mvi1((cast(j1 as unsigned array)), a),
key mvi2((cast(j1 as unsigned array)), b),
key mvi3((cast(j2 as unsigned array)), a),
key mvi4((cast(j2 as unsigned array)), b) );
explain select * from t where 1 member of (j1) or 2 member of (j2) order by a;
explain select * from t where 1 member of (j1) or 2 member of (j2) order by b;
> explain select * from t where 1 member of (j1) or 2 member of (j2) order by a;
+----------------------------------+---------+-----------+------------------------------------------------------+---------------------------------------------------------+
| id                               | estRows | task      | access object                                        | operator info                                           |
+----------------------------------+---------+-----------+------------------------------------------------------+---------------------------------------------------------+
| Projection_20                    | 19.99   | root      |                                                      | test3.t.a, test3.t.b, test3.t.c, test3.t.j1, test3.t.j2 |
| └─IndexMerge_19                  | 19.99   | root      |                                                      | type: union                                             |
|   ├─IndexRangeScan_16(Build)     | 10.00   | cop[tikv] | table:t, index:mvi1(cast(`j1` as unsigned array), a) | range:[1,1], keep order:true, stats:pseudo              |
|   ├─IndexRangeScan_17(Build)     | 10.00   | cop[tikv] | table:t, index:mvi3(cast(`j2` as unsigned array), a) | range:[2,2], keep order:true, stats:pseudo              |
|   └─TableRowIDScan_18(Probe)     | 19.99   | cop[tikv] | table:t                                              | keep order:false, stats:pseudo                          |
+----------------------------------+---------+-----------+------------------------------------------------------+---------------------------------------------------------+

> explain select * from t where 1 member of (j1) or 2 member of (j2) order by b;
+----------------------------------+---------+-----------+------------------------------------------------------+---------------------------------------------+
| id                               | estRows | task      | access object                                        | operator info                               |
+----------------------------------+---------+-----------+------------------------------------------------------+---------------------------------------------+
| Sort_7                           | 19.99   | root      |                                                      | test3.t.b                                   |
| └─IndexMerge_15                  | 19.99   | root      |                                                      | type: union                                 |
|   ├─IndexRangeScan_12(Build)     | 10.00   | cop[tikv] | table:t, index:mvi1(cast(`j1` as unsigned array), a) | range:[1,1], keep order:false, stats:pseudo |
|   ├─IndexRangeScan_13(Build)     | 10.00   | cop[tikv] | table:t, index:mvi3(cast(`j2` as unsigned array), a) | range:[2,2], keep order:false, stats:pseudo |
|   └─TableRowIDScan_14(Probe)     | 19.99   | cop[tikv] | table:t                                              | keep order:false, stats:pseudo              |
+----------------------------------+---------+-----------+------------------------------------------------------+---------------------------------------------+

Design

Recently, we got a feature request that needs us to apply the enhancement in #51778 to non-MV indexes.
This time, we want to unify the 3 code paths and make the fundamental capabilities available for all scenarios, and of course, reduce the code repetition.

Current implementation

A key data structure here is the AccessPath. When an AccessPath represents an IndexMerge path, there are two cases:

  1. When AccessPath.PartialIndexPaths []*AccessPath is used, each element in this slice directly corresponds to a "partial path" in the final plan. Code paths 2 and 3 use this.
  2. When AccessPath.PartialAlternativeIndexPaths [][]*AccessPath is used, all possible alternatives for a "partial path" are stored. Then it can try to choose the one that satisfies the Sort property in findBestTask() (the required Sort property is only available at that time). Code path 1 uses this.

Using the AccessPath as the dividing line, we can roughly divide the overall process of generating OR type IndexMerge path into two steps:

  1. In generateIndexMergePath(), it will match the filters against candidate indexes, try to find possible IndexMerge paths, and store them in the AccessPath.
    • Code path 1 uses an old logic, which remains mostly unchanged since the first implementation of IndexMerge. It tries to find an OR list in the input filters and only uses it to build an IndexMerge path. In the ... AND (... OR ...) AND ... example above, it won't try to use other filters connected by AND.
    • After Enable optimizer to generate index merge path for mv index + OR list nested in AND list #51778, code paths 2 and 3 use a new implementation that tries to collect usable filters from the top level AND list.
  2. In findBestTask():
    • For the AccessPath.PartialIndexPaths case, it just directly builds into the final plan.
    • For the AccessPath.PartialAlternativeIndexPaths case, there is an extra step (in matchPropForIndexMergeAlternatives()) that chooses the alternatives that satisfy the Sort property, then builds into a normal AccessPath with AccessPath.PartialIndexPaths set.

Changes

  • All "alternative" in the code should be a []*AccessPath
    • A difference between the non-MV index and the MV index is that a usable filter on the MV index may build into several paths. For example, json_overlaps(j, '[1,2]') on index(cast j to unsigned array) will become two paths.
    • AccessPath.PartialAlternativeIndexPaths is the most obvious one. It will be changed from [][]*AccessPath to [][][]*AccessPath.
  • In findBestTask() -> matchPropForIndexMergeAlternatives(), it will need to be able to handle MV indexes.
    • When matching the Sort property, we should check a slice of *AccessPath.
    • Some logic in an earlier stage in the current implementation will be moved here.
  • In "step 1" in the current implementation above, the new implementation used by code paths 2 and 3 will become the unified code path.
    • To achieve this, it needs to store all alternatives in the AccessPath, i.e., switching from AccessPath.PartialIndexPaths to AccessPath.PartialAlternativeIndexPaths.
    • Accordingly, some logic here in the current implementation that decides on one of the alternatives will be moved to the later stage in matchPropForIndexMergeAlternatives().
    • Since it already has the ability to handle non-MV indexes, we don't need to do extra work to merge the code path 1 into this.
  • Some other minor considerations:
    • In the current implementation, there are some rules that apply to IndexMerge plans that only involve non-MV indexes. For example, we never generate an IndexMerge plan where every partial path uses the same index; if it can generate a range scan on a single index, it will not automatically use index merge. These rules should not apply to MV indexes, and we also want to keep the existing behavior unchanged. So we will add a check to determine "if this plan only involves non-MV indexes" for these rules.
    • Although it will be the same logic most of the time, for some places, if the plan contains MV-indexes, the implementation will be slightly different. For example, it will use CalcTotalSelectivityForMVIdxPath() for estimation, but the non-MV indexes case uses Selectivity().
  • As the last step, the 3 code paths should be merged into 1. And there will also be some code/comments cleanup.
@AilinKid
Copy link
Contributor

AilinKid commented Dec 18, 2024

very appreciate @time-and-fate could summarize a detailed list about what/how/why of current code base is from several previous history commits which require familiarity. The refactoring means much more than the a feature request itself which needs us to apply the enhancement in #51778 to non-MV indexes.

@Rustin170506
Copy link
Member

Rustin170506 commented Dec 20, 2024

into two steps:

I guess it is three steps.

@time-and-fate
Copy link
Member Author

into two steps:

I guess it is three steps.

Haha, you are right. The AccessPath part is added later and I didn't notice that.
I rephrase that part and it should be correct now.

@time-and-fate
Copy link
Member Author

Split into 4 PRs:
#58332
#58397
#58448
#58396
Now they all merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

No branches or pull requests

3 participants