Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ddl: support add global index operation on partition table #18402

Merged
merged 23 commits into from
Aug 21, 2020

Conversation

ldeng-ustc
Copy link
Contributor

@ldeng-ustc ldeng-ustc commented Jul 7, 2020

What problem does this PR solve?

Problem Summary:

Support adding global indices. It is a part of #18032 .

What is changed and how it works?

Proposal: #18982

What's Changed:

  • New feature: Support add global index.

When global index is enabled, constraint in partition table index will be removed. If config.EnableGlobalIndex is set, user can add global index by adding an index which value columns do not include all partition columns.

such as:

create table t (a int, b int) partition by range(b) (partition p0 values less than (10));
alter table t add unique idx (a);

Before this PR:

ERROR 1503 (HY000): A UNIQUE INDEX must include all columns in the table's partitioning function

After (set EnableGlobalIndex):

Query OK, 0 rows affected (0.05 sec)

How it Works:

model.IndexInfo was modified to add Global field in paser PR #917.

Decide whether index is global before ddl job is enqueued.

In global index, tableID (i.e. TableInfo.ID) instead of physicalTableID was used to create index prefix, meanwhile, partitionID is add into value layout.

tablecodec.GenIndexValueNew support encode global index value.

Finally, add a testcase to test whether data is correctly writed into kv.

Index Value Layout

Now, we have 6 index value layouts, and use 3 different function to decode it. This not only makes it hard to add new features for index (we should add at least two layouts and one function), but also makes the code and comments less readable and confusing (index.Create has an extreme long comment, and is still getting longer).

In fact, we can just divide the encoding into two layout: new and old. Old encoding value has no specific part, and length always <= 9. On the other hand, new encoding value has >= 10 length, and include some segments for specific features (e.g. restoreData for new collation, common handle, and also, partitionID for global index).

So, I refactor tablecodec.GenIndexValue and tablecodec.DecodeIndexKV for extensibilty, and rewrite comment of index.Create in a more readable way. After that, we can easily add new features to index value: simply add a segment to value.

Here is the new description of index value layout. Note that I just change the decription, so this encoding is exactly same as origin (expect additional global index segment). In other words, there is no compatibility issue.

// Create creates a new entry in the kvIndex data.
// If the index is unique and there is an existing entry with the same key,
// Create will return the existing entry's handle as the first return value, ErrKeyExists as the second return value.
// Value layout:
//		+--New Encoding (with restore data, or common handle, or index is global)
//		|
//		|  Layout: TailLen | Options      | Padding      | [IntHandle] | [UntouchedFlag]
//		|  Length:   1     | len(options) | len(padding) |    8        |     1
//		|
//		|  TailLen:       len(padding) + len(IntHandle) + len(UntouchedFlag)
//		|  Options:       Encode some value for new features, such as common handle, new collations or global index.
//		|                 See below for more information.
//		|  Padding:       Ensure length of value always >= 10. (or >= 11 if UntouchedFlag exists.)
//		|  IntHandle:     Only exists when table use int handles and index is unique.
//		|  UntouchedFlag: Only exists when index is untouched.
//		|
//		|  Layout of Options:
//		|
//		|     Segment:             Common Handle                 |     Global Index      | New Collation
//		|     Layout:  CHandle Flag | CHandle Len | CHandle      | PidFlag | PartitionID | restoreData
//		|     Length:     1         | 2           | len(CHandle) |    1    |    8        | len(restoreData)
//		|
//		|     Common Handle Segment: Exists when unique index used common handles.
//		|     Global Index Segment:  Exists when index is global.
//		|     New Collation Segment: Exists when new collation is used and index contains non-binary string.
//		|
//		+--Old Encoding (without restore data, integer handle, local)
//
//		   Layout: [Handle] | [UntouchedFlag]
//		   Length:   8      |     1
//
//		   Handle:        Only exists in unique index.
//		   UntouchedFlag: Only exists when index is untouched.
//
//		   If neither Handle nor UntouchedFlag exists, value will be one single byte '0' (i.e. []byte{'0'}).
//		   Length of value <= 9, use to distinguish from the new encoding.
//

Related changes

  • PR to update pingcap/parser: #917

Check List

Tests

  • Unit test

Side effects

  • None.

Release note

  • No release note.

@ti-srebot ti-srebot added the contribution This PR is from a community contributor. label Jul 7, 2020
@github-actions github-actions bot added the sig/sql-infra SIG: SQL Infra label Jul 7, 2020
ddl/index.go Outdated Show resolved Hide resolved
@tiancaiamao tiancaiamao changed the title ddl: support add global index ddl: support add global index operation on partition table Jul 9, 2020
ddl/db_test.go Outdated Show resolved Hide resolved
ddl/db_test.go Outdated Show resolved Hide resolved
ddl/db_test.go Show resolved Hide resolved
ddl/db_test.go Outdated Show resolved Hide resolved
ddl/ddl_api.go Outdated Show resolved Hide resolved
ddl/partition.go Show resolved Hide resolved
table/tables/index.go Outdated Show resolved Hide resolved
@@ -971,8 +971,8 @@ func GenIndexKey(sc *stmtctx.StatementContext, tblInfo *model.TableInfo, idxInfo
}

// GenIndexValue creates encoded index value and returns the result
func GenIndexValue(sc *stmtctx.StatementContext, tblInfo *model.TableInfo, idxInfo *model.IndexInfo,
containNonBinaryString bool, distinct bool, untouched bool, indexedValues []types.Datum, h kv.Handle) ([]byte, error) {
func GenIndexValue(sc *stmtctx.StatementContext, tblInfo *model.TableInfo, idxInfo *model.IndexInfo, containNonBinaryString bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are multiple reference of this function in other packages, it's better to introduce a new function such as GenGlobalIndexValue instead of changing the API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API does not change, I just split a long single line into two.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You add a partitionID int64, right?

ddl/db_test.go Outdated
c.Assert(rowValueDatums[cols[1].ID], DeepEquals, types.NewDatum(1))

// check row 2
key, _, err = tablecodec.GenIndexKey(sc, tblInfo, indexInfo, tblInfo.ID, []types.Datum{types.NewDatum(2)}, kv.IntHandle(1), nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check row1 and row2 code are basically the same, you could define a local function to reduce repeat code.

@@ -971,8 +971,8 @@ func GenIndexKey(sc *stmtctx.StatementContext, tblInfo *model.TableInfo, idxInfo
}

// GenIndexValue creates encoded index value and returns the result
func GenIndexValue(sc *stmtctx.StatementContext, tblInfo *model.TableInfo, idxInfo *model.IndexInfo,
containNonBinaryString bool, distinct bool, untouched bool, indexedValues []types.Datum, h kv.Handle) ([]byte, error) {
func GenIndexValue(sc *stmtctx.StatementContext, tblInfo *model.TableInfo, idxInfo *model.IndexInfo, containNonBinaryString bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You add a partitionID int64, right?

@tiancaiamao
Copy link
Contributor

I'm not sure the quite sure the influence of alter table add primary key on the global index encoding.
Rest LGTM

@tiancaiamao
Copy link
Contributor

PTAL @wjhuang2016 @crazycs520

@zimulala zimulala added the require-LGT3 Indicates that the PR requires three LGTM. label Jul 10, 2020
// | | |
// | | | The length >= 19 always because of padding.
// | | |
// ... (Truncated because of length.)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why truncated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New layout of local index is just add 8 bytes partitionID at the end, rest part is same as local index . I think 2 examples is enough. If write whole layout here, it seems too long.

@@ -1169,7 +1169,7 @@ func (s *testIntegrationSuite5) TestBackwardCompatibility(c *C) {
TableID: tbl.Meta().ID,
Type: model.ActionAddIndex,
BinlogInfo: &model.HistoryInfo{},
Args: []interface{}{unique, indexName, indexPartSpecifications, indexOption},
Args: []interface{}{unique, false, indexName, indexPartSpecifications, indexOption},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may cause a problem when rolling upgrades. A new TiDB receives old args sent by old TiDB.

Copy link
Contributor Author

@ldeng-ustc ldeng-ustc Jul 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better way to pass one more arg to worker? I have no idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put it in the last arg

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add new field in model.IndexInfo is backward compatible, i.e. a newer TiDB can handle older TiDB.
When the new version find the Global field is missing (by old TiDB), it set the value to false.

A old TiDB receive newer TiDB may be something wrong, because Global may be true and the old server do not know how to handle it, so we must gurantee there is no DDL during rolling update.

Copy link
Member

@wjhuang2016 wjhuang2016 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fire a proposal first. We need to understand and check the whole design before reviewing it.

@coocood
Copy link
Member

coocood commented Jul 10, 2020

Appending a new element to the end of the value is not extensible.
We may need to add more elements in the future.
The better way is to add a partition ID flag, then a partition id, the tail data is unchanged.

@tiancaiamao tiancaiamao removed the require-LGT3 Indicates that the PR requires three LGTM. label Jul 10, 2020
@ldeng-ustc
Copy link
Contributor Author

I write the proposal in a new PR.
PTAL @tiancaiamao @wjhuang2016 @coocood

@lysu
Copy link
Contributor

lysu commented Aug 17, 2020

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Aug 17, 2020
@lysu
Copy link
Contributor

lysu commented Aug 17, 2020

maybe we want to support create global in create table stmt in following PR~?

for example:

 create table test_t1 (a int, b int, unique index p_a (a)) partition by hash (b) partitions 3;

indexColumns, err := buildIndexColumns(append(tblInfo.Columns, hiddenCols...), indexPartSpecifications)
if err != nil {
return errors.Trace(err)
}

global := false
if unique && tblInfo.GetPartitionInfo() != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny requirement question:

do we need support non-unique global index?

cc: @tiancaiamao @zz-jason @coocood

Copy link
Contributor

@lysu lysu Aug 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it's useful to support "using a non-unique index without include partition columns", maybe there is a real-world example:

create table order (
  oid_id int,
  user_id  int,
  seller_id int,
  primary key(oid, user_id),
  index uid (user_id),
) partition by hash(user_id) partition 1000

this table will conventional in the "user end side" to let the order-system query by user_id.

but the seller also want query the orders belongs itself in seller system and delivery order, and query by seller_id and it didn't know partition_id(also seller_id isn't unique in order table), and maybe we need a global non-unique index --- index sid (seller_id) to help seller request can be prunning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. In some case global non-unique index is useful. Maybe we should support keywords Local and Global just like Oracle. @zhaox1n had submitted a PR about it, but it is closed now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in this PR.
Support global unique index is complex enough, we should do it step by step.

@tiancaiamao
Copy link
Contributor

LGTM

PTAL @coocood @wjhuang2016

@tiancaiamao tiancaiamao added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Aug 18, 2020
@tiancaiamao
Copy link
Contributor

/merge

@ti-srebot
Copy link
Contributor

@tiancaiamao Oops! This PR requires at least 2 LGTMs to merge. The current number of LGTM is 1.

@tiancaiamao
Copy link
Contributor

LGTM

@tiancaiamao
Copy link
Contributor

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Aug 21, 2020
@ti-srebot
Copy link
Contributor

/run-all-tests

@ti-srebot
Copy link
Contributor

@ldeng-ustc merge failed.

@tiancaiamao
Copy link
Contributor

/run-unit-test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/config contribution This PR is from a community contributor. sig/sql-infra SIG: SQL Infra status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants