Column pruning improves performance in several ways:
Reduced memory usage, since less column data needs to be loaded.
Reduced network overhead: TiDB depends on storage systems reached over the network (TiKV, TiFlash), so fewer columns means less data transferred.
Reduced CPU usage: the workload of some operators depends on the size of their input columns. For example, a PhysicalExchange node partitions all of its input columns, so fewer columns means less work.
However, there are still several column pruning issues in TiDB:
Unused columns in the union operator may not be pruned
Currently, when the parent of a LogicalUnionAll operator does not use any of the LogicalUnionAll operator's output columns (e.g. select count(*) from (xx union xx)), these unused columns are not pruned.
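The union case can be illustrated with a small sketch. This is a toy model, not TiDB's planner code: `Column`, `Union`, and `pruneUnion` are hypothetical names, and the sketch assumes UNION ALL semantics, where dropping columns does not change the number of rows. When the parent uses no column at all, it keeps a single placeholder column instead of the whole schema.

```go
package main

// Column is a toy stand-in for a planner column (not TiDB's expression.Column).
type Column struct{ Name string }

// Union is a toy stand-in for a union-all operator: one output schema and one
// positionally aligned column list per child.
type Union struct {
	Schema   []Column
	Children [][]Column
}

// pruneUnion keeps only the output positions marked in used; used must have
// one entry per schema column. When the parent uses no column at all
// (e.g. count(*)), it keeps the first position as a placeholder so that each
// child still produces rows that can be counted.
func pruneUnion(u Union, used []bool) Union {
	keep := make([]int, 0, len(u.Schema))
	for i, isUsed := range used {
		if isUsed {
			keep = append(keep, i)
		}
	}
	if len(keep) == 0 && len(u.Schema) > 0 {
		// Assumption: a single column is enough to carry the row count.
		keep = append(keep, 0)
	}
	out := Union{Children: make([][]Column, len(u.Children))}
	for _, i := range keep {
		out.Schema = append(out.Schema, u.Schema[i])
	}
	// Prune every child at the same positions so the children stay aligned.
	for c, child := range u.Children {
		for _, i := range keep {
			out.Children[c] = append(out.Children[c], child[i])
		}
	}
	return out
}
```

Calling `pruneUnion` with an all-false `used` slice on a three-column union leaves one column per child, rather than all three.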
Columns used only in filter expressions are not pruned
When column pruning runs, the filter is still embedded in the DataSource operator. As a result, the DataSource operator's output schema currently contains both the columns needed by its parent and the columns used by the filter. Columns that are used only by the filter can be pruned from the output.
For MPP mode, adding a new projection above the DataSource operator is enough.
For TiKV mode, the projection also needs to be pushed down to TiKV to reduce network overhead.
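As a rough illustration of the idea, the sketch below separates the columns a scan must read from the columns it needs to emit. It is a toy model over column names: `splitColumns` is a hypothetical helper, not a TiDB function. The scan reads the union of parent and filter columns, and a projection on top emits only what the parent needs.

```go
package main

// splitColumns returns the columns the scan must read (parent columns plus
// filter columns, de-duplicated) and the columns a projection placed above
// the scan should emit (parent columns only).
func splitColumns(parentCols, filterCols []string) (scanCols, projectCols []string) {
	seen := make(map[string]bool)
	for _, c := range parentCols {
		if !seen[c] {
			seen[c] = true
			scanCols = append(scanCols, c)
			projectCols = append(projectCols, c)
		}
	}
	// Filter-only columns are read by the scan but dropped by the projection.
	for _, c := range filterCols {
		if !seen[c] {
			seen[c] = true
			scanCols = append(scanCols, c)
		}
	}
	return scanCols, projectCols
}
```

For a query like `select a from t where b > 1`, `splitColumns([]string{"a"}, []string{"b"})` gives a scan over `a` and `b` and a projection that emits only `a`. Where that projection runs (above the DataSource for MPP, pushed down to storage for TiKV) decides whether the network transfer shrinks as well.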
The MPP physical join operator discards the column pruning results
While MPP tasks are being built, the physical join operator's schema is reset to its full semantic output.
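The difference between the two schemas can be sketched as follows. This is a toy model over column names, assuming an inner join whose full output is every left column followed by every right column; `fullJoinSchema` and `prunedJoinSchema` are hypothetical helpers, not TiDB functions.

```go
package main

// fullJoinSchema is the unpruned output of the toy inner join: every left
// column followed by every right column.
func fullJoinSchema(left, right []string) []string {
	out := make([]string, 0, len(left)+len(right))
	out = append(out, left...)
	return append(out, right...)
}

// prunedJoinSchema keeps only the columns the parent needs, preserving the
// order of the full join output.
func prunedJoinSchema(left, right []string, needed map[string]bool) []string {
	var out []string
	for _, c := range fullJoinSchema(left, right) {
		if needed[c] {
			out = append(out, c)
		}
	}
	return out
}
```

With `left = {"t1.a", "t1.b"}`, `right = {"t2.a", "t2.c"}`, and a parent that needs only `t1.b`, the full schema has four columns while the pruned schema has one. Resetting to the full schema means the join has to construct the other three columns again.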
Enhancement
Code reference for the union case: tidb/pkg/planner/core/rule_column_pruning.go, lines 303 to 309 at commit b96f081.
Code reference for the MPP join schema reset: tidb/pkg/planner/core/task.go, line 521 at commit b96f081.
We should preserve the column pruning results so that MPP join can improve performance further (see tiflash#8296, "Join only construct joined columns that is needed by its parent operator").