Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specialized Cursor for StringArray and BinaryArray #5964

Merged
merged 4 commits into from
Apr 12, 2023

Conversation

tustvold
Copy link
Contributor

Which issue does this PR close?

Part of #5882

Rationale for this change

merge sorted utf8 low cardinality
                        time:   [5.0056 ms 5.0200 ms 5.0368 ms]
                        change: [-24.330% -24.082% -23.834%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

sort merge utf8 low cardinality
                        time:   [5.7681 ms 5.7938 ms 5.8190 ms]
                        change: [-21.867% -21.399% -20.931%] (p = 0.00 < 0.05)
                        Performance has improved.

merge sorted utf8 high cardinality
                        time:   [7.8982 ms 7.9206 ms 7.9468 ms]
                        change: [-14.207% -13.879% -13.546%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

sort merge utf8 high cardinality
                        time:   [9.1147 ms 9.1690 ms 9.2234 ms]
                        change: [-13.848% -13.085% -12.344%] (p = 0.00 < 0.05)
                        Performance has improved.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Apr 11, 2023
type Values = Self;

fn values(&self) -> Self::Values {
self.clone()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally planned to make it so that this deconstructed to the underlying buffers, in order to avoid redundant codegen for strings and binary arrays. Unfortunately this needed apache/arrow-rs#4048

In practice the additional codegen is not likely to matter, we can always revisit if it becomes a problem

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we at least leave a reference to apache/arrow-rs#4048 as a comment with a hint about how to reduce the codegen once apache/arrow-rs#4048 is releaed?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- thank you @tustvold

I fixed clippy on this branch so it should be good to merge

type Values = Self;

fn values(&self) -> Self::Values {
self.clone()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we at least leave a reference to apache/arrow-rs#4048 as a comment with a hint about how to reduce the codegen once apache/arrow-rs#4048 is releaed?

@tustvold tustvold merged commit e17572c into apache:main Apr 12, 2023
korowa pushed a commit to korowa/arrow-datafusion that referenced this pull request Apr 13, 2023
* Generify

* Specialized cursor for StringArray and BinaryArray

* fix clippy

* Review feedback

---------

Co-authored-by: Andrew Lamb <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants