Unify OR/Union type IndexMerge code paths to expand the applicable scenarios of existing capabilities #58361

time-and-fate · 2024-12-17T17:55:00Z

Background & Problem

During the development of several previous features, the logic of generating OR/Union type IndexMerge paths has diverged into several different code paths:

The code path evolved from the original IndexMerge code
- It only handles non-MV indexes.
- When there are multiple indexes to choose from, it can consider the required Sort property and choose the correct index to satisfy that.
- Now the entry is generateIndexMerge4NormalIndex().
The code path introduced to support the MV index
- It's specially for handling the MV index.
- It can generate IndexMerge paths that involve only one MV index.
- For filters like ... AND (... OR ...) AND ..., it can consider the filters connected by AND so the user doesn't need to expand the expression manually to match the index.
- Now the entry is generateIndexMerge4MVIndex()
The code path later introduced to support more complex patterns involving the MV index.
- It's mostly similar to 2, but it can generate IndexMerge paths involving multiple MV and non-MV indexes.
- Now the entry is generateIndexMerge4ComposedIndex().

Links to some previous features

Initial support for the MV index (the code path 2): planner: support converting json_memberof/overlaps/contains to IndexMerge to access MVIndex #40191
Support using different MV indexes and non-MV indexes in an IndexMerge path (the code path 3): Implement index merge from composited CNF and DNF condition #47759
Support satisfying Sort property for non-MV indexes (enhancement on code path 1): planner: the optimizer cannot choose the best IndexMerge plan #48359
Support collecting the top level AND list for filters like `... AND (... OR ...) ...` for paths involving MV indexes (enhancement on code path 2 and 3): Enable optimizer to generate index merge path for mv index + OR list nested in AND list #51778

Apparently, it causes code repetitions. If you go over the code, you'll find there are many similar code snippets and there are also many structural similarities.

Besides, for some later enhancements, we introduced them to only 1 or 2 code paths. They can't benefit the other code paths.

For example, for the non-MV index, sometimes the user needs to expand the ... AND (... OR ...) filter manually to match indexes so that the optimizer can choose an IndexMerge plan.

create table t(a int, b int, c int, d int,
index iab(a,b),
index iac(a,c),
index iad(a,d));
explain select /*+ use_index_merge(t) */ * from t where a = 1 and (b = 2 or c = 3 or d = 4);
show warnings;
explain select /*+ use_index_merge(t) */ * from t where (a = 1 and b = 2) or (a = 1 and c = 3) or (a = 1 and d = 4);

> explain select /*+ use_index_merge(t) */ * from t where a = 1 and (b = 2 or c = 3 or d = 4);
+-------------------------------+---------+-----------+--------------------------+-----------------------------------------------------------+
| id                            | estRows | task      | access object            | operator info                                             |
+-------------------------------+---------+-----------+--------------------------+-----------------------------------------------------------+
| IndexLookUp_8                 | 0.03    | root      |                          |                                                           |
| ├─IndexRangeScan_5(Build)     | 10.00   | cop[tikv] | table:t, index:iab(a, b) | range:[1,1], keep order:false, stats:pseudo               |
| └─Selection_7(Probe)          | 0.03    | cop[tikv] |                          | or(eq(test.t.b, 2), or(eq(test.t.c, 3), eq(test.t.d, 4))) |
|   └─TableRowIDScan_6          | 10.00   | cop[tikv] | table:t                  | keep order:false, stats:pseudo                            |
+-------------------------------+---------+-----------+--------------------------+-----------------------------------------------------------+

> show warnings;
+---------+------+----------------------------+
| Level   | Code | Message                    |
+---------+------+----------------------------+
| Warning | 1105 | IndexMerge is inapplicable |
+---------+------+----------------------------+

> explain select /*+ use_index_merge(t) */ * from t where (a = 1 and b = 2) or (a = 1 and c = 3) or (a = 1 and d = 4);
+-------------------------------+---------+-----------+--------------------------+-------------------------------------------------+
| id                            | estRows | task      | access object            | operator info                                   |
+-------------------------------+---------+-----------+--------------------------+-------------------------------------------------+
| IndexMerge_9                  | 8.00    | root      |                          | type: union                                     |
| ├─IndexRangeScan_5(Build)     | 0.10    | cop[tikv] | table:t, index:iab(a, b) | range:[1 2,1 2], keep order:false, stats:pseudo |
| ├─IndexRangeScan_6(Build)     | 0.10    | cop[tikv] | table:t, index:iac(a, c) | range:[1 3,1 3], keep order:false, stats:pseudo |
| ├─IndexRangeScan_7(Build)     | 0.10    | cop[tikv] | table:t, index:iad(a, d) | range:[1 4,1 4], keep order:false, stats:pseudo |
| └─TableRowIDScan_8(Probe)     | 8.00    | cop[tikv] | table:t                  | keep order:false, stats:pseudo                  |
+-------------------------------+---------+-----------+--------------------------+-------------------------------------------------+

Another example:
The non-MV index logic can consider and try to satisfy the required Sort property, but the MV index logic can't. So sometimes it will need an extra Sort. The performance difference will be more obvious when there's also LIMIT in the SQL.

create table t ( a int, b int, c int, j1 json, j2 json,
key mvi1((cast(j1 as unsigned array)), a),
key mvi2((cast(j1 as unsigned array)), b),
key mvi3((cast(j2 as unsigned array)), a),
key mvi4((cast(j2 as unsigned array)), b) );
explain select * from t where 1 member of (j1) or 2 member of (j2) order by a;
explain select * from t where 1 member of (j1) or 2 member of (j2) order by b;

> explain select * from t where 1 member of (j1) or 2 member of (j2) order by a;
+----------------------------------+---------+-----------+------------------------------------------------------+---------------------------------------------------------+
| id                               | estRows | task      | access object                                        | operator info                                           |
+----------------------------------+---------+-----------+------------------------------------------------------+---------------------------------------------------------+
| Projection_20                    | 19.99   | root      |                                                      | test3.t.a, test3.t.b, test3.t.c, test3.t.j1, test3.t.j2 |
| └─IndexMerge_19                  | 19.99   | root      |                                                      | type: union                                             |
|   ├─IndexRangeScan_16(Build)     | 10.00   | cop[tikv] | table:t, index:mvi1(cast(`j1` as unsigned array), a) | range:[1,1], keep order:true, stats:pseudo              |
|   ├─IndexRangeScan_17(Build)     | 10.00   | cop[tikv] | table:t, index:mvi3(cast(`j2` as unsigned array), a) | range:[2,2], keep order:true, stats:pseudo              |
|   └─TableRowIDScan_18(Probe)     | 19.99   | cop[tikv] | table:t                                              | keep order:false, stats:pseudo                          |
+----------------------------------+---------+-----------+------------------------------------------------------+---------------------------------------------------------+

> explain select * from t where 1 member of (j1) or 2 member of (j2) order by b;
+----------------------------------+---------+-----------+------------------------------------------------------+---------------------------------------------+
| id                               | estRows | task      | access object                                        | operator info                               |
+----------------------------------+---------+-----------+------------------------------------------------------+---------------------------------------------+
| Sort_7                           | 19.99   | root      |                                                      | test3.t.b                                   |
| └─IndexMerge_15                  | 19.99   | root      |                                                      | type: union                                 |
|   ├─IndexRangeScan_12(Build)     | 10.00   | cop[tikv] | table:t, index:mvi1(cast(`j1` as unsigned array), a) | range:[1,1], keep order:false, stats:pseudo |
|   ├─IndexRangeScan_13(Build)     | 10.00   | cop[tikv] | table:t, index:mvi3(cast(`j2` as unsigned array), a) | range:[2,2], keep order:false, stats:pseudo |
|   └─TableRowIDScan_14(Probe)     | 19.99   | cop[tikv] | table:t                                              | keep order:false, stats:pseudo              |
+----------------------------------+---------+-----------+------------------------------------------------------+---------------------------------------------+

Design

Recently, we got a feature request that needs us to apply the enhancement in #51778 to non-MV indexes.
This time, we want to unify the 3 code paths and make the fundamental capabilities available for all scenarios, and of course, reduce the code repetition.

Current implementation

A key data structure here is the AccessPath. When an AccessPath represents an IndexMerge path, there are two cases:

When AccessPath.PartialIndexPaths []*AccessPath is used, each element in this slice directly corresponds to a "partial path" in the final plan. Code paths 2 and 3 use this.
When AccessPath.PartialAlternativeIndexPaths [][]*AccessPath is used, all possible alternatives for a "partial path" are stored. Then it can try to choose the one that satisfies the Sort property in findBestTask() (the required Sort property is only available at that time). Code path 1 uses this.

Using the AccessPath as the dividing line, we can roughly divide the overall process of generating OR type IndexMerge path into two steps:

In generateIndexMergePath(), it will match the filters against candidate indexes, try to find possible IndexMerge paths, and store them in the AccessPath.
- Code path 1 uses an old logic, which remains mostly unchanged since the first implementation of IndexMerge. It tries to find an OR list in the input filters and only uses it to build an IndexMerge path. In the ... AND (... OR ...) AND ... example above, it won't try to use other filters connected by AND.
- After Enable optimizer to generate index merge path for mv index + OR list nested in AND list #51778, code paths 2 and 3 use a new implementation that tries to collect usable filters from the top level AND list.
In findBestTask():
- For the AccessPath.PartialIndexPaths case, it just directly builds into the final plan.
- For the AccessPath.PartialAlternativeIndexPaths case, there is an extra step (in matchPropForIndexMergeAlternatives()) that chooses the alternatives that satisfy the Sort property, then builds into a normal AccessPath with AccessPath.PartialIndexPaths set.

Changes

All "alternative" in the code should be a []*AccessPath
- A difference between the non-MV index and the MV index is that a usable filter on the MV index may build into several paths. For example, json_overlaps(j, '[1,2]') on index(cast j to unsigned array) will become two paths.
- AccessPath.PartialAlternativeIndexPaths is the most obvious one. It will be changed from [][]*AccessPath to [][][]*AccessPath.
In findBestTask() -> matchPropForIndexMergeAlternatives(), it will need to be able to handle MV indexes.
- When matching the Sort property, we should check a slice of *AccessPath.
- Some logic in an earlier stage in the current implementation will be moved here.
In "step 1" in the current implementation above, the new implementation used by code paths 2 and 3 will become the unified code path.
- To achieve this, it needs to store all alternatives in the AccessPath, i.e., switching from AccessPath.PartialIndexPaths to AccessPath.PartialAlternativeIndexPaths.
- Accordingly, some logic here in the current implementation that decides on one of the alternatives will be moved to the later stage in matchPropForIndexMergeAlternatives().
- Since it already has the ability to handle non-MV indexes, we don't need to do extra work to merge the code path 1 into this.
Some other minor considerations:
- In the current implementation, there are some rules that apply to IndexMerge plans that only involve non-MV indexes. For example, we never generate an IndexMerge plan where every partial path uses the same index; if it can generate a range scan on a single index, it will not automatically use index merge. These rules should not apply to MV indexes, and we also want to keep the existing behavior unchanged. So we will add a check to determine "if this plan only involves non-MV indexes" for these rules.
- Although it will be the same logic most of the time, for some places, if the plan contains MV-indexes, the implementation will be slightly different. For example, it will use CalcTotalSelectivityForMVIdxPath() for estimation, but the non-MV indexes case uses Selectivity().
As the last step, the 3 code paths should be merged into 1. And there will also be some code/comments cleanup.

The text was updated successfully, but these errors were encountered:

AilinKid · 2024-12-18T06:35:27Z

very appreciate @time-and-fate could summarize a detailed list about what/how/why of current code base is from several previous history commits which require familiarity. The refactoring means much more than the a feature request itself which needs us to apply the enhancement in #51778 to non-MV indexes.

…e logic (#58332) ref #58361

Rustin170506 · 2024-12-20T06:42:22Z

into two steps:

I guess it is three steps.

time-and-fate · 2024-12-20T08:05:04Z

into two steps:

I guess it is three steps.

Haha, you are right. The AccessPath part is added later and I didn't notice that.
I rephrase that part and it should be correct now.

…MV indexes (#58397) ref #58361

…xMerge OR path with MV index (#58448) ref #58361

ref #58361

time-and-fate · 2024-12-27T08:49:18Z

Split into 4 PRs:
#58332
#58397
#58448
#58396
Now they all merged.

time-and-fate added sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement. labels Dec 17, 2024

time-and-fate self-assigned this Dec 17, 2024

time-and-fate mentioned this issue Dec 17, 2024

planner: improve comments and clean up function for OR type IndexMerge logic #58332

Merged

13 tasks

time-and-fate mentioned this issue Dec 18, 2024

planner: [WIP] #58111

Closed

13 tasks

ti-chi-bot bot pushed a commit that referenced this issue Dec 18, 2024

planner: improve comments and clean up function for OR type IndexMerg…

ec0dabc

…e logic (#58332) ref #58361

This was referenced Dec 18, 2024

planner: unify OR type IndexMerge code paths #58396

Merged

planner: prepare AccessPath.PartialAlternativeIndexPaths to handle MV indexes #58397

Merged

ti-chi-bot bot pushed a commit that referenced this issue Dec 21, 2024

planner: prepare AccessPath.PartialAlternativeIndexPaths to handle …

d0ea9e5

…MV indexes (#58397) ref #58361

time-and-fate mentioned this issue Dec 22, 2024

planner: switch to AccessPath.PartialAlternativeIndexPaths for IndexMerge OR path with MV index #58448

Merged

13 tasks

ti-chi-bot bot pushed a commit that referenced this issue Dec 25, 2024

planner: switch to AccessPath.PartialAlternativeIndexPaths for Inde…

94cf40a

…xMerge OR path with MV index (#58448) ref #58361

ti-chi-bot bot pushed a commit that referenced this issue Dec 27, 2024

planner: unify OR type IndexMerge code paths (#58396)

e44c60c

ref #58361

time-and-fate closed this as completed Dec 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unify OR/Union type IndexMerge code paths to expand the applicable scenarios of existing capabilities #58361

Unify OR/Union type IndexMerge code paths to expand the applicable scenarios of existing capabilities #58361

time-and-fate commented Dec 17, 2024 •

edited

Loading

AilinKid commented Dec 18, 2024 •

edited by time-and-fate

Loading

Rustin170506 commented Dec 20, 2024 •

edited

Loading

time-and-fate commented Dec 20, 2024

time-and-fate commented Dec 27, 2024

Unify OR/Union type IndexMerge code paths to expand the applicable scenarios of existing capabilities #58361

Unify OR/Union type IndexMerge code paths to expand the applicable scenarios of existing capabilities #58361

Comments

time-and-fate commented Dec 17, 2024 • edited Loading

Background & Problem

Design

Current implementation

Changes

AilinKid commented Dec 18, 2024 • edited by time-and-fate Loading

Rustin170506 commented Dec 20, 2024 • edited Loading

time-and-fate commented Dec 20, 2024

time-and-fate commented Dec 27, 2024

time-and-fate commented Dec 17, 2024 •

edited

Loading

AilinKid commented Dec 18, 2024 •

edited by time-and-fate

Loading

Rustin170506 commented Dec 20, 2024 •

edited

Loading