Column prune related improvements #52133

Open
yibin87 opened this issue Mar 27, 2024 · 0 comments
Labels
type/enhancement The issue or PR belongs to an enhancement.

Comments

Contributor

yibin87 commented Mar 27, 2024

Enhancement

Column pruning improves performance in several ways:

  • Reduce memory usage, since less column data needs to be loaded.
  • Reduce network overhead: TiDB depends on separate storage systems (TiKV and TiFlash), so fewer columns mean less data is transferred.
  • Reduce CPU usage: the workload of some operators depends on the size of their input columns. For example, a PhysicalExchange node partitions all of its input columns, so fewer columns mean less work.
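To make the last point concrete, here is a toy model of hash partitioning over a columnar batch. It is only a sketch and not TiDB or TiFlash code: `chunk` and `hashPartition` are hypothetical names, and the "cell copies" counter is just a stand-in for per-column work. It shows that the partitioning cost grows with the number of input columns, so a pruned input is cheaper to exchange.

```go
package main

import "fmt"

// chunk is a toy columnar batch (hypothetical, not a TiDB type):
// cols[c][r] holds the value of column c in row r.
type chunk struct {
	cols [][]int64
}

// hashPartition scatters the rows of in into n partitions by the value of
// column keyCol. It returns the partitions and the number of cells copied.
// Assumes n > 0 and 0 <= keyCol < len(in.cols) when in has columns.
func hashPartition(in chunk, keyCol, n int) ([]chunk, int) {
	parts := make([]chunk, n)
	for i := range parts {
		parts[i].cols = make([][]int64, len(in.cols))
	}
	if len(in.cols) == 0 {
		return parts, 0
	}
	copies := 0
	for r, key := range in.cols[keyCol] {
		p := int(uint64(key) % uint64(n))
		// Every column of the row is copied, whether or not a parent needs it.
		for c := range in.cols {
			parts[p].cols[c] = append(parts[p].cols[c], in.cols[c][r])
			copies++
		}
	}
	return parts, copies
}

func main() {
	wide := chunk{cols: [][]int64{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10, 11, 12}}}
	pruned := chunk{cols: [][]int64{{1, 2, 3}, {4, 5, 6}}}

	_, wideCopies := hashPartition(wide, 0, 2)
	_, prunedCopies := hashPartition(pruned, 0, 2)
	fmt.Printf("wide: %d cell copies, pruned: %d cell copies\n", wideCopies, prunedCopies)
	// prints "wide: 12 cell copies, pruned: 6 cell copies"
}
```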

However, there are still several column pruning issues in TiDB:

  1. Useless columns in union operator may not be pruned
    Currently, when the parent of a LogicalUnionAll operator doesn't use any of the LogicalUnionAll operator's output columns (e.g. select count(*) from (xx union xx)), these useless columns are not pruned, because every column is marked as used:
    if !hasBeenUsed {
        // The parent uses none of the output columns, so all of them are kept.
        parentUsedCols = make([]*expression.Column, len(p.schema.Columns))
        copy(parentUsedCols, p.schema.Columns)
        for i := range used {
            used[i] = true
        }
    }
  2. Columns used only in filter expressions won't be pruned
    When the column pruning optimization runs, the filter is still part of the DataSource operator. As a result, the DataSource operator's output schema currently contains both the columns needed by its parent and the columns used by the filter. Columns that are used only by the filter can be pruned from the output.
    For MPP mode, adding a new projection above the DataSource operator is enough.
    For TiKV mode, we also need to push the projection down to TiKV to reduce network overhead.
  3. MPP physical join operator discards the column pruning results
    While MPP tasks are being built, the physical join operator's schema is reset to its full semantic output:
    p.schema = BuildPhysicalJoinSchema(p.JoinType, p)

    We should preserve the pruned schema so that MPP join can improve performance further (see "Join only construct joined columns that is needed by its parent operator", tiflash#8296).
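The first two items can be sketched with a small, self-contained model. This is not TiDB's planner code: `pruneUnionColumns` and `splitDataSourceColumns` are hypothetical helpers over plain column names, and keeping a single column when the parent uses none is an assumption made here so that the row count survives for count(*).

```go
package main

import "fmt"

// pruneUnionColumns (hypothetical helper) returns the union output columns to
// keep. If the parent uses none of them, it keeps only the first column
// instead of marking every column as used. Keeping one column is an
// assumption of this sketch, so that row counts are still produced.
func pruneUnionColumns(schema []string, parentUsed map[string]bool) []string {
	kept := make([]string, 0, len(schema))
	for _, col := range schema {
		if parentUsed[col] {
			kept = append(kept, col)
		}
	}
	if len(kept) == 0 && len(schema) > 0 {
		kept = append(kept, schema[0])
	}
	return kept
}

// splitDataSourceColumns (hypothetical helper) separates what must be read
// from what must be output: scan holds columns needed by the parent or the
// filter, output holds only the columns the parent needs. A projection to
// output above the scan drops the filter-only columns.
func splitDataSourceColumns(schema []string, parentUsed, filterUsed map[string]bool) (scan, output []string) {
	for _, col := range schema {
		if parentUsed[col] {
			scan = append(scan, col)
			output = append(output, col)
		} else if filterUsed[col] {
			scan = append(scan, col)
		}
	}
	return scan, output
}

func main() {
	schema := []string{"a", "b", "c"}

	// select count(*) from (xx union xx): the parent uses no union column.
	fmt.Println(pruneUnionColumns(schema, map[string]bool{})) // prints "[a]"

	// select a from t where b > 1: b is needed only by the filter.
	scan, output := splitDataSourceColumns(schema,
		map[string]bool{"a": true}, map[string]bool{"b": true})
	fmt.Println(scan, output) // prints "[a b] [a]"
}
```

In this model the projection in item 2 corresponds to emitting `output` instead of `scan`; where that projection runs (above the DataSource for MPP, or pushed down for TiKV) is what the issue proposes to decide.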