Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster BinaryArray to StringArray conversion (~67%) #3168

Merged
merged 4 commits into from
Nov 24, 2022

Conversation

tustvold
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Nov 23, 2022
@tustvold tustvold force-pushed the faster-byte-array-string-conversion branch 3 times, most recently from 9c133c2 to ae3ba7b Compare November 23, 2022 14:31
@tustvold tustvold force-pushed the faster-byte-array-string-conversion branch from ae3ba7b to 5598db0 Compare November 23, 2022 14:32
@tustvold tustvold requested a review from HaoYang670 November 23, 2022 15:33
let builder = v.into_data().into_builder().data_type(Self::DATA_TYPE);
Self::from(builder.build().unwrap())
// SAFETY:
// Validated UTF-8 above
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
How much performance improvement can we get?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

byte_array_to_string_array 20000
                        time:   [13.252 µs 13.254 µs 13.257 µs]
                        change: [-67.094% -67.073% -67.049%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Faster ByteArray to StringArray conversion

You mean from BinaryArray to StringArray?

@tustvold tustvold changed the title Faster ByteArray to StringArray conversion Faster BinaryArray to StringArray conversion Nov 24, 2022
let builder = v.into_data().into_builder().data_type(Self::DATA_TYPE);
Self::from(builder.build().unwrap())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it basically remove full validation cost?

Comment on lines 224 to 226
for offset in offsets.iter() {
assert!(validated.is_char_boundary(offset.as_usize()))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to exclude last offset?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, in case the string is sliced

@tustvold tustvold merged commit 8ba7842 into apache:master Nov 24, 2022
@tustvold tustvold changed the title Faster BinaryArray to StringArray conversion Faster BinaryArray to StringArray conversion (~67%) Nov 24, 2022
@ursabot
Copy link

ursabot commented Nov 24, 2022

Benchmark runs are scheduled for baseline = 1d22fe3 and contender = 8ba7842. 8ba7842 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants