feat(query): support parallel final aggregator #8577

Merged: 14 commits merged into databendlabs:main on Nov 13, 2022

zhang2014 (Member)

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

feat(query): support parallel final aggregator

Fixes #issue
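
The summary does not describe the mechanism, so the following is only a minimal sketch of one common way to parallelize a final aggregation step, not the code in this PR. It assumes partial results are scattered into buckets by key hash so that each bucket can be merged on its own thread without locking. The names `Partial`, `bucket_of`, and `parallel_final_sum` are hypothetical, and `std::thread::scope` stands in for whatever thread pool the PR actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::thread;

/// Hypothetical partial aggregation state: (did, sid) -> running sum(uid).
type Partial = HashMap<(String, String), u64>;

/// Map a group key to a bucket index (assumed hash partitioning scheme).
fn bucket_of(key: &(String, String), buckets: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() as usize) % buckets
}

/// Merge partial states into the final result, one thread per bucket.
fn parallel_final_sum(partials: Vec<Partial>, buckets: usize) -> Partial {
    assert!(buckets > 0, "at least one bucket is required");

    // Step 1: scatter every (key, sum) pair into the bucket chosen by its hash.
    // Equal keys always land in the same bucket, so buckets never share keys.
    let mut scattered: Vec<Vec<((String, String), u64)>> = vec![Vec::new(); buckets];
    for partial in partials {
        for (key, sum) in partial {
            let b = bucket_of(&key, buckets);
            scattered[b].push((key, sum));
        }
    }

    // Step 2: merge each bucket independently on its own scoped thread.
    let merged: Vec<Partial> = thread::scope(|s| {
        let handles: Vec<_> = scattered
            .into_iter()
            .map(|entries| {
                s.spawn(move || {
                    let mut table = Partial::new();
                    for (key, sum) in entries {
                        *table.entry(key).or_insert(0) += sum;
                    }
                    table
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("merge worker panicked"))
            .collect()
    });

    // Step 3: the per-bucket tables are disjoint, so concatenating them is enough.
    merged.into_iter().flatten().collect()
}
```

Because the buckets hold disjoint key sets, the last step is a plain concatenation rather than another merge, which is what lets the per-bucket work run without coordination.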


mergify bot added the pr-feature label (this PR introduces a new feature to the codebase) on Nov 1, 2022
# Conflicts:
#	src/common/hashtable/src/twolevel_hashtable.rs
#	src/common/hashtable/src/unsized_hashtable.rs
#	src/query/datablocks/src/data_block.rs
#	src/query/service/src/pipelines/processors/transforms/aggregator/aggregator_final.rs
#	src/query/service/src/pipelines/processors/transforms/aggregator/aggregator_partial.rs
#	src/query/service/src/pipelines/processors/transforms/group_by/aggregator_polymorphic_keys.rs
#	src/query/service/src/pipelines/processors/transforms/group_by/aggregator_state.rs
zhang2014 marked this pull request as ready for review on November 13, 2022 03:05
# Conflicts:
#	src/query/service/src/pipelines/processors/transforms/aggregator/aggregator_twolevel.rs
Review thread on src/common/base/Cargo.toml (outdated, resolved)
Xuanwo (Member) commented Nov 13, 2022

ERROR the following files don't have a valid license header: 
src/common/base/src/base/thread_pool.rs
src/common/base/tests/it/thread_pool.rs
src/query/service/src/pipelines/processors/transforms/aggregator/aggregate_info.rs
src/query/service/src/pipelines/processors/transforms/aggregator/aggregator_final_parallel.rs 

BohuTANG (Member) commented Nov 13, 2022

Test results on a 32C32G machine (32 cores, 32 GB RAM):

Total row count of the table:

+----------+
| count()  |
+----------+
| 64000000 |
+----------+

SQL:
select sum(uid) from log group by did,sid ignore_result;
The group by keys (did, sid) have 1,000,000 unique combinations; both did and sid are VARCHAR.

Main branch (nightly v0.8.109):
mysql> select sum(uid) from log group by did,sid ignore_result;
Empty set (20.88 sec)
Read 64000000 rows, 5.74 GiB in 20.659 sec., 3.1 million rows/sec., 284.73 MiB/sec.

This PR (parallel final aggregator):
mysql> select sum(uid) from log group by did,sid ignore_result;
Empty set (5.64 sec)
Read 64000000 rows, 5.74 GiB in 5.625 sec., 11.38 million rows/sec., 1.02 GiB/sec.

4X faster! 🚀
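
As a quick check of the figure above, the reported timings can be turned into a ratio. This is a small illustrative sketch; the helper name `speedup` is hypothetical and only the timings come from the results above. Wall-clock time (20.88 sec vs. 5.64 sec) gives roughly 3.7x, and the read time (20.659 sec vs. 5.625 sec) gives nearly the same ratio, so "4X" is a rounded figure.

```rust
/// Ratio of the baseline duration to the optimized duration.
/// A value above 1.0 means the optimized run was faster.
fn speedup(baseline_secs: f64, optimized_secs: f64) -> f64 {
    baseline_secs / optimized_secs
}

/// Speedup from the client-reported wall-clock times quoted above.
fn wall_clock_speedup() -> f64 {
    speedup(20.88, 5.64)
}

/// Speedup from the server-reported read times quoted above.
fn read_time_speedup() -> f64 {
    speedup(20.659, 5.625)
}
```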

mergify bot merged commit 5340309 into databendlabs:main on Nov 13, 2022