refactor: support bitmap for u8/16 and i8/16 in approx_distinct #8462

Closed
wants to merge 3 commits into from

Weijun-H
Member

@Weijun-H Weijun-H commented Dec 7, 2023

Which issue does this PR close?

Follow #1841
Closes #1109

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the physical-expr Physical Expressions label Dec 7, 2023
@Weijun-H Weijun-H changed the title refactor: support bitmap for u8/16 and i8/16 in approx_distinct refactor: support bitmap for u8/16 and i8/16 in approx_distinct Dec 7, 2023
}

fn size(&self) -> usize {
    self.bitmap.serialized_size()
Member Author

Not sure whether this is a proper way to measure the size of a roaring bitmap

@korowa
Contributor

korowa commented Dec 27, 2023

Thank you @Weijun-H!

This PR, though, looks like an implementation of a CountDistinct accumulator, and the code doesn't seem to perform any approximation -- wouldn't it be better to use these changes as the default CountDistinct implementation for the specified data types?
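
To illustrate the point: a bitmap over a small integer domain yields an exact distinct count, not an estimate. The sketch below is a hypothetical, minimal accumulator for u8/i8. The name `U8DistinctBitmap` and its methods are invented for illustration and are not taken from this PR or from DataFusion, and it uses a plain 256-bit array rather than a roaring bitmap.

```rust
/// Hypothetical accumulator (not from the PR): one bit per possible `u8` value.
#[derive(Default)]
pub struct U8DistinctBitmap {
    /// 4 x 64 bits = 256 bits, one per value in 0..=255.
    words: [u64; 4],
}

impl U8DistinctBitmap {
    /// Mark `v` as seen by setting its bit.
    pub fn insert(&mut self, v: u8) {
        self.words[(v >> 6) as usize] |= 1u64 << (v & 63);
    }

    /// `i8` values map onto the same 256 slots via a bit-preserving cast.
    pub fn insert_i8(&mut self, v: i8) {
        self.insert(v as u8);
    }

    /// Combine two partial states with a bitwise OR.
    pub fn merge(&mut self, other: &Self) {
        for (a, b) in self.words.iter_mut().zip(other.words.iter()) {
            *a |= *b;
        }
    }

    /// Exact number of distinct values seen so far.
    pub fn count(&self) -> u64 {
        self.words.iter().map(|w| u64::from(w.count_ones())).sum()
    }
}
```

Because every possible value has its own bit, `count()` can never over- or under-count, which is why this behaves like CountDistinct rather than an approximation.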

@alamb
Contributor

alamb commented Jan 27, 2024

@korowa
Contributor

korowa commented Jan 28, 2024

I wonder what we should do with this PR now we have

While I'm still not sure if this PR fits ApproxDistinct functionality, I suppose it might be a viable replacement for HashSets in regular CountDistinct -- so it is at least worth checking / benchmarking within #1823 (paying special attention to the memory consumption of the bitmap-based accumulator)
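
A rough way to frame the memory question, as a sketch only: a dense bitmap over the full u16 domain costs a fixed 8 KiB no matter how many values are seen, while a `HashSet<u16>` grows with the number of distinct values. The names below (`U16Bitmap`, `hashset_size_lower_bound`) are hypothetical and not from the PR. The HashSet figure is deliberately a lower bound that ignores the table's control bytes and padding, and a roaring bitmap would behave differently from this dense layout.

```rust
use std::collections::HashSet;
use std::mem::size_of;

/// Hypothetical dense bitmap over the whole `u16` domain:
/// 65,536 bits = 1,024 x u64 = 8 KiB of heap.
pub struct U16Bitmap {
    words: Box<[u64]>,
}

impl U16Bitmap {
    pub fn new() -> Self {
        Self {
            words: vec![0u64; 1024].into_boxed_slice(),
        }
    }

    /// Mark `v` as seen by setting its bit.
    pub fn insert(&mut self, v: u16) {
        self.words[(v >> 6) as usize] |= 1u64 << (v & 63);
    }

    /// Exact number of distinct values seen so far.
    pub fn count(&self) -> u64 {
        self.words.iter().map(|w| u64::from(w.count_ones())).sum()
    }

    /// In-memory footprint in bytes: the struct itself plus its heap buffer.
    pub fn size(&self) -> usize {
        size_of::<Self>() + self.words.len() * size_of::<u64>()
    }
}

/// Assumed lower bound on a `HashSet<u16>` footprint: counts only the
/// element slots and ignores hash-table metadata.
pub fn hashset_size_lower_bound(set: &HashSet<u16>) -> usize {
    size_of::<HashSet<u16>>() + set.capacity() * size_of::<u16>()
}
```

Under these assumptions the dense bitmap is the more expensive choice for groups with only a handful of distinct values and the cheaper one as the set approaches the full domain, so a benchmark would need to cover both low- and high-cardinality groups.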

@alamb
Contributor

alamb commented Jan 29, 2024

Thanks @Weijun-H and @korowa -- I'll mark this PR as a draft now and if someone finds time to do the benchmarks we can reopen it with that in consideration

@alamb alamb marked this pull request as draft January 29, 2024 11:57

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale PR has not had any activity for some time label Apr 17, 2024
@github-actions github-actions bot closed this Apr 24, 2024
Labels
physical-expr Physical Expressions Stale PR has not had any activity for some time
Projects
None yet
Development

Successfully merging this pull request may close these issues.

approx_distinct should be leveraging bitmap for counting u8/16 and i8/16
3 participants