-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor: Refactor memory size estimation for HashTable #10748
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see it's good abstraction.
My only concern is, it seems that the original goal was kind of ideal, which just pass the hashtable and get the estimate size. But now before using this function, user have to calculate the fixed_size themself, and also the calculation is not always the same.
Though the abstraction is not pretty enough, I think given the current code this PR is still an improvement. :)
Yes, I totally agree with you on this. However, as discussed in the issue, it still might be worth it due to the improved documentation. |
Agree! I left a comment which may improve but it's not necessary. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @marvinlanhenke -- this looks like a great step forward in readability 🙏
Thank you @yyy1000 for your review
I left some suggestions / comments. Let me know what you think
@alamb I changed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @marvinlanhenke
Thanks again @marvinlanhenke |
* refactor: extract estimate_memory_size * refactor: cap at usize::MAX * refactor: use estimate_memory_size * chore: add examples * refactor: return Result<usize>; add testcase * fix: docs * fix: remove unneccessary checked_div * fix: remove additional and_then
Which issue does this PR close?
Closes #8764.
Rationale for this change
As stated in the issue here the goal is to consolidate estimation logic into a single function. This allows for better documentation as well as better maintainability.
Also there was an issue with the existing implementation that could still overflow if the
checked_mul
was capped at usize::MAX. I tried to fix this by checking for overflows and returning usize::MAX as a 'max cap'.What changes are included in this PR?
count_distinct/native.rs
andhash_join.rs
Are these changes tested?
Yes.
Are there any user-facing changes?
No.