Support pre-split index regions before creating index #57552

tangenta · 2024-11-20T08:33:58Z

Feature Request

Is your feature request related to a problem? Please describe:

CREATE TABLE `test` (
    `a` bigint NOT NULL,
    `b` bigint NOT NULL,
    `c` bigint DEFAULT NULL,
    PRIMARY KEY (`a`, `b`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4 COLLATE = utf8mb4_bin;

CREATE INDEX `idx` ON `test` (`a`, `c`, `b`);

Right after the "CREATE INDEX" statement was sent, cluster latency increased significantly because one TiKV instance was overloaded by the index write hotspot. Latency dropped once PD split the hot regions and scheduled them to other TiKV instances.

The write hotspot can have any of the following causes:

Sequential Inserts: When new rows are inserted with sequential or monotonically increasing values for the indexed column, such as timestamps or auto-incrementing primary keys.
Skewed Data Distribution: When the data distribution is heavily skewed, causing a disproportionate number of writes to a specific range of index keys.
High Write Frequency: When there is a high frequency of write operations (inserts, updates, deletes) targeting the same index keys or a small range of keys.
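One way to check whether a latency spike like this comes from an index write hotspot is to query the hot-region view. The following is a minimal sketch that assumes the `test` table from the example above and the `information_schema.TIDB_HOT_REGIONS` table; whether an index that is still being built shows up there has not been verified here.

```sql
-- Sketch: list hot write regions that belong to the example table `test`.
-- Rows with a non-NULL index_name belong to an index rather than to row data.
SELECT db_name,
       table_name,
       index_name,
       region_id,
       flow_bytes
FROM information_schema.tidb_hot_regions
WHERE table_name = 'test'
  AND LOWER(type) = 'write'
ORDER BY flow_bytes DESC;
```

If most of the write flow lands on one or two index regions, the hotspot described above is the likely cause.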

Describe the feature you'd like:

Split the temporary index regions (or the index regions in "txn" mode) before the index state becomes "delete-only". Because TiDB has no information about the workload, it has to guess the lower and upper bounds of the incoming index values and split that range into several regions.

A possible extension is introducing the pre-split "index_option" to let users provide more information about the workload.

-- pre-split into 4 regions. The range is calculated automatically.
ALTER TABLE t ADD INDEX idx(col1, col2) PRE_SPLIT_REGIONS=4;
CREATE INDEX idx on t (col1, col2) PRE_SPLIT_REGIONS=4;

-- pre-split into 4 regions and specify lower and upper bound.
ALTER TABLE t ADD INDEX idx(col1, col2) PRE_SPLIT_REGIONS = (BETWEEN ('a', 10) AND ('z', 100) REGIONS 4);

-- pre-split on specified index keys.
ALTER TABLE t ADD INDEX idx(col1, col2) PRE_SPLIT_REGIONS = (BY ('a', 10), ('b', 20), ('c', 30));

PRE_SPLIT_REGIONS ... BETWEEN mirrors the behavior of the BETWEEN ... AND clause of SPLIT TABLE:

SPLIT TABLE ... INDEX ... BETWEEN (...) AND (...) REGIONS ...;

PRE_SPLIT_REGIONS ... BY mirrors the behavior of the BY clause of SPLIT TABLE:

SPLIT TABLE ... INDEX ... BY (...), (...), ...
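For comparison, the existing statements could be written out with the same bounds and keys as in the proposal above. This is only a sketch: it assumes that the index `idx` on `t (col1, col2)` already exists, that `col1` is a string column and `col2` an integer column (inferred from the literal values), and that the bounds fit the real data.

```sql
-- Split the existing index evenly into 4 regions between two bounds.
SPLIT TABLE t INDEX idx BETWEEN ('a', 10) AND ('z', 100) REGIONS 4;

-- Split the existing index at explicitly listed keys.
SPLIT TABLE t INDEX idx BY ('a', 10), ('b', 20), ('c', 30);

-- Inspect the resulting index regions.
SHOW TABLE t INDEX idx REGIONS;
```

These statements only work once the index exists, which is the gap the proposed index option is meant to close.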

TiDB already supports PRE_SPLIT_REGIONS as a table attribute in CREATE TABLE statements, but it has no attribute similar to PRE_SPLIT_REGIONS ... BETWEEN. This is because users control when write traffic is switched to a new table, so they can run SPLIT TABLE before the switch. Adding an index is a different use case.
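For reference, the table-level workflow looks roughly like this. The table name `t_example`, its columns, and the split bounds are illustrative assumptions; the CREATE TABLE form follows the TiDB documentation, where PRE_SPLIT_REGIONS = n is combined with SHARD_ROW_ID_BITS and requests 2^n row-data regions.

```sql
-- Pre-split the row data of a new table into 2^2 = 4 regions at creation time.
CREATE TABLE t_example (
    a INT,
    b INT,
    INDEX idx1 (a)
) SHARD_ROW_ID_BITS = 4 PRE_SPLIT_REGIONS = 2;

-- Before switching write traffic to the table, split the index manually.
-- The bounds 0 and 100000 are placeholders for the expected range of `a`.
SPLIT TABLE t_example INDEX idx1 BETWEEN (0) AND (100000) REGIONS 4;
```

The manual split is possible here only because the table and its index exist before any traffic arrives.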

Describe alternatives you've considered:

Use pd-ctl / TiDB HTTP API, but they don't support splitting a region.
Use SPLIT TABLE, but it cannot split a region that does not exist yet. We would have to block the DDL so that the index state remains at "delete-only" while splitting.

Teachability, Documentation, Adoption, Migration Strategy:

tangenta added the type/feature-request Categorizes issue or PR as related to a new feature. label Nov 20, 2024

tangenta self-assigned this Nov 20, 2024

tangenta mentioned this issue Nov 20, 2024

ddl: support pre-split index regions before creating index #57553

Merged

13 tasks

ti-chi-bot bot closed this as completed in #57553 Dec 18, 2024

ti-chi-bot bot closed this as completed in 177a03c Dec 18, 2024

tangenta mentioned this issue Dec 19, 2024

parser: support pre-split global index add special comment support for pre_split index option #58408

Merged

13 tasks

ti-chi-bot bot pushed a commit that referenced this issue Dec 23, 2024

parser: support pre-split global index add special comment support for pre_split index option (#58408) ref #57551, ref #57552

3735ed5

tangenta mentioned this issue Dec 23, 2024

TiCDC should support restoring pre-split index option pingcap/tiflow#11927

Closed

ti-chi-bot mentioned this issue Dec 26, 2024

ddl: support pre-split index regions before creating index (#57553) #58541

Open

13 tasks

tangenta added a commit to ti-chi-bot/tidb that referenced this issue Dec 30, 2024

parser: support pre-split global index add special comment support for pre_split index option (pingcap#58408) ref pingcap#57551, ref pingcap#57552

30eba16
