-
Description
MySQL supports invisible indexes; that is, indexes that are not used by the optimizer. This is a useful feature when you want to drop an index in a safe way. Invisible indexes make it possible to test the effect of removing an index on query performance, without making a destructive change that must be undone where the index turns out to be required. Dropping and re-adding an index can be expensive for a large table, whereas making it invisible and visible are fast, in-place operations.
-
Recommended Skills: Golang, SQL optimizer, TiDB DDL
-
Issue: pingcap/tidb#9246
-
Description
The current MVMap used in TiDB is not efficient for hash join or hash aggregate.
Implement an efficient hash table, which should:
- Have a better probe performance
- Have a not bad fill factor
- Provide efficient hash functions for all data types that TiDB has
For example, Horton Tables: Fast Hash Tables for In-Memory Data-Intensive Computing.
-
Recommended Skills: Golang, SQL execution engine
-
Issue: pingcap/tidb#9247
-
Description
In TiDB, the current expression evaluation interface is row-oriented. It is not efficient. To evaluate a batch of inputs, 1024 rows for example, 1024 function calls are invoked.
We could expand the expression evaluation interface to add a column-oriented method, evaluate the builtin functions column by column. This will improve the performance a lot.
-
Recommended Skills: Golang, SQL execution engine
-
Issue: pingcap/tidb#9245
-
Description
Most aggregation queries contain both group-by and join operators, and spend a significant amount of time evaluating these two expensive operators. Merging them into one operator (the groupjoin) significantly speeds up query execution. Some TPC-H queries can also benefit from the
groupjoin
operator. For example, Q13, Q3, Q21. See Accelerating Queries with Group-By and Join by Groupjoin.pdf, VLDB, 2011. -
Recommended Skills: Golang, SQL optimizer, SQL execution engine
-
Issue: pingcap/tidb#7469
-
Description
Group pruning is a transformation rule, which is always beneficial. It removes the groups not needed in the outer query blocks. For example, consider the following query:
TiDB(localhost:4000) > desc select max(b), sum(sum_c) from (select a, b, sum(c) as sum_c from t group by a, b) tmp group by a; +--------------------------+----------+------+------------------------------------------------------------------------------------------+ | id | count | task | operator info | +--------------------------+----------+------+------------------------------------------------------------------------------------------+ | HashAgg_9 | 8000.00 | root | group by:test.t.a, funcs:max(test.t.b), sum(sum_c) | | └─HashAgg_14 | 8000.00 | root | group by:col_3, col_4, funcs:sum(col_0), firstrow(col_1), firstrow(col_2) | | └─TableReader_15 | 8000.00 | root | data:HashAgg_10 | | └─HashAgg_10 | 8000.00 | cop | group by:test.t.a, test.t.b, funcs:sum(test.t.c), firstrow(test.t.a), firstrow(test.t.b) | | └─TableScan_13 | 10000.00 | cop | table:t, range:[-inf,+inf], keep order:false, stats:pseudo | +--------------------------+----------+------+------------------------------------------------------------------------------------------+ 5 rows in set (0.01 sec)
Can be transformed into something like this:
TiDB(localhost:4000) > desc select max(b), sum(c) from t group by a; +-----------------------+----------+------+------------------------------------------------------------+ | id | count | task | operator info | +-----------------------+----------+------+------------------------------------------------------------+ | HashAgg_9 | 8000.00 | root | group by:col_2, funcs:max(col_0), sum(col_1) | | └─TableReader_10 | 8000.00 | root | data:HashAgg_5 | | └─HashAgg_5 | 8000.00 | cop | group by:test.t.a, funcs:max(test.t.b), sum(test.t.c) | | └─TableScan_8 | 10000.00 | cop | table:t, range:[-inf,+inf], keep order:false, stats:pseudo | +-----------------------+----------+------+------------------------------------------------------------+ 4 rows in set (0.00 sec)
-
Recommended Skills: Golang, SQL optimizer
-
Issue: pingcap/tidb#7700
-
Description
Now we deploy and operate DM using Ansible, we have to do a lot of things manually. For example,
-
To add a DM-worker instance for a new MySQL instance, we have to edit
Ansible inventory
and run anansible-playbook
. If not so lucky, we may need to stop the task, edit the task configuration file, and then start some tasks to let the new DM-worker instance join. -
DM-master and DM-worker don't store tasks, thus we have to start the whole task after some DM-workers restart.
We can think of a vision - the user only operates the task configuration, maybe by web or a command line tool, and everything else is automated.
Project example: TiDB-operator
-
-
Recommended Skill: Golang
-
Issue: pingcap/dm#43
-
Description
Now what information does DM have to show to users?
-
dmctl
query-status
: queries basic information of the task, including some complex and unclear error messagesshow-ddl-lock
....query-error
...
-
Grafana: many nonsense monitor graphs
What are the disadvantages of the above methods? Lack of contextual information leads to incomprehensible or inferential problems.
We need a way to show the system or task running status details in a natural way, like a straightforward way to show the speed of data flow, key events and where to happen.
-
-
Recommended Skills: Golang, OpenTracing
-
Issue: pingcap/dm#44