Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(group_by): support two-level hashmap #5075

Merged
merged 5 commits into from
May 7, 2022

Conversation

fkuner
Copy link
Contributor

@fkuner fkuner commented Apr 28, 2022

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Two level hashmap

Changelog

  • New Feature
  • Improvement
  • Performance Improvement

Related Issues

Fixes #4602

@vercel
Copy link

vercel bot commented Apr 28, 2022

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Updated
databend ⬜️ Ignored (Inspect) May 7, 2022 at 5:00AM (UTC)

@mergify
Copy link
Contributor

mergify bot commented Apr 28, 2022

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@mergify mergify bot added pr-feature this PR introduces a new feature to the codebase pr-improvement labels Apr 28, 2022
@fkuner
Copy link
Contributor Author

fkuner commented Apr 28, 2022

@sundy-li

@PsiACE PsiACE requested review from zhang2014 and sundy-li April 28, 2022 04:57
@sundy-li sundy-li changed the title feat(group_by): two level hasnmap test feat(group_by): two level hashmap test Apr 28, 2022
@fkuner fkuner force-pushed the approvement-two-level-hashtable branch from e5d8ed6 to bf151db Compare April 28, 2022 14:29
@fkuner fkuner force-pushed the approvement-two-level-hashtable branch from 808f386 to 79d53f5 Compare May 6, 2022 05:26
@fkuner fkuner force-pushed the approvement-two-level-hashtable branch from 79d53f5 to a28cc37 Compare May 6, 2022 05:27
@fkuner
Copy link
Contributor Author

fkuner commented May 6, 2022

/help

@databend-bot
Copy link
Member

/assignme -- assign the issue to you
/review @[username] -- take a reviewer for you
/help -- show help

@fkuner fkuner marked this pull request as ready for review May 6, 2022 05:31
@fkuner fkuner requested a review from BohuTANG as a code owner May 6, 2022 05:31
@fkuner fkuner marked this pull request as draft May 6, 2022 05:34
@fkuner fkuner changed the title feat(group_by): two level hashmap test feat(group_by): support two-level hashmap May 6, 2022
@fkuner fkuner marked this pull request as ready for review May 6, 2022 06:59
@fkuner fkuner marked this pull request as draft May 6, 2022 13:40
@fkuner fkuner force-pushed the approvement-two-level-hashtable branch 2 times, most recently from 575b653 to 627f667 Compare May 7, 2022 02:07
@fkuner fkuner force-pushed the approvement-two-level-hashtable branch from 627f667 to cb5b567 Compare May 7, 2022 03:49
@fkuner fkuner marked this pull request as ready for review May 7, 2022 05:01
@BohuTANG
Copy link
Member

BohuTANG commented May 7, 2022

How about the group by performance gained with this PR?

@fkuner
Copy link
Contributor Author

fkuner commented May 7, 2022

How about the group by performance gained with this PR?

It denpends the cardinal number. In low cardinal situlation, single level map is better than two-level hashmap. In higher cardinal situation, the two-level hashmap is better than single-level hashmap because it decreased the cost of resize which gained abount 6%~8%. But When the cardinal number is higher, the single-level hashmap is better again. This problem happens in clickhouse also when I test the performance of single-level and two-level hashmap.

The pr gains low performance because we only done the first step. We need to open parallel merge in the next step. After do this, the performance will increase 2~3 times. I have discussed this implementation with @sundy-li and @zhang2014. Parallel merge needs to expand the pipeline dynamically which is a little bit hard for me. But we will do it soon because it can
bring obvious benifits really.

cc @BohuTANG @sundy-li @zhang2014

@BohuTANG BohuTANG merged commit 02b7384 into databendlabs:main May 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community-take need-review pr-feature this PR introduces a new feature to the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

two-level hashmap optimization
5 participants