Specialize filter kernel for binary arrays (#2969) #2971

tustvold · 2022-10-28T20:05:06Z

Which issue does this PR close?

Part of #2969

Rationale for this change

This should make filtering byte arrays significantly faster; we previously lacked a specialized implementation for them.

What changes are included in this PR?

Are there any user-facing changes?

alamb

🐎 very nice @tustvold

Just to be clear, I think this adds the ability to filter Binary and LargeBinary arrays, and we already had specialized kernels for Utf8 and LargeUtf8.
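For readers unfamiliar with the layout, here is a rough, standalone sketch of what filtering a variable-length byte array involves. It is not the code from this PR: `ByteArraySketch` and `filter_bytes_sketch` are hypothetical names, the offsets are fixed to `i32` (as in Binary and Utf8; the Large variants use `i64`), and null masks are ignored entirely. It only illustrates why one routine can serve both string and binary arrays: the work is done on offsets and raw bytes, not on the element type.

```rust
/// Hypothetical stand-in for a variable-length byte array (not an arrow-rs type).
/// `offsets` has one more entry than there are slots, and slot `i` is
/// `values[offsets[i]..offsets[i + 1]]`.
#[derive(Debug, PartialEq)]
pub struct ByteArraySketch {
    pub offsets: Vec<i32>,
    pub values: Vec<u8>,
}

impl ByteArraySketch {
    /// Build the offsets and values buffers from a list of byte slices.
    pub fn from_slices(items: &[&[u8]]) -> Self {
        let mut offsets = vec![0i32];
        let mut values = Vec::new();
        for item in items {
            values.extend_from_slice(item);
            offsets.push(values.len() as i32);
        }
        Self { offsets, values }
    }

    /// Number of slots in the array.
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Bytes stored in slot `i`.
    pub fn value(&self, i: usize) -> &[u8] {
        &self.values[self.offsets[i] as usize..self.offsets[i + 1] as usize]
    }
}

/// Keep only the slots whose mask entry is `true`.
/// Each contiguous run of selected slots is copied with a single
/// `extend_from_slice`, and the output offsets are rebuilt from slot lengths.
pub fn filter_bytes_sketch(array: &ByteArraySketch, mask: &[bool]) -> ByteArraySketch {
    assert_eq!(array.len(), mask.len(), "mask must cover every slot");
    let mut offsets = vec![0i32];
    let mut values = Vec::new();
    let mut i = 0;
    while i < mask.len() {
        if !mask[i] {
            i += 1;
            continue;
        }
        // Find the end of this run of selected slots: [start, i).
        let start = i;
        while i < mask.len() && mask[i] {
            i += 1;
        }
        // Copy the bytes of the whole run in one go.
        let byte_start = array.offsets[start] as usize;
        let byte_end = array.offsets[i] as usize;
        values.extend_from_slice(&array.values[byte_start..byte_end]);
        // Append one output offset per selected slot.
        for slot in start..i {
            let len = array.offsets[slot + 1] - array.offsets[slot];
            let last = *offsets.last().unwrap();
            offsets.push(last + len);
        }
    }
    ByteArraySketch { offsets, values }
}
```

Because the sketch never inspects the bytes themselves, nothing in it depends on the data being valid UTF-8, which is the sense in which a string-only kernel can be generalized to binary arrays.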

alamb · 2022-11-01T15:54:46Z

arrow-select/src/filter.rs

@@ -626,17 +626,17 @@ where
 ///
 /// Note: NULLs with a non-zero slot length in `array` will have the corresponding
 /// data copied across. This allows handling the null mask separately from the data
-fn filter_string<OffsetSize>(
-    array: &GenericStringArray<OffsetSize>,
+fn filter_bytes<T>(


The comment a few lines above is now out of date:

/// filter implementation for string arrays

Should be:

/// filter implementation for byte arrays

viirya · 2022-11-01T17:00:24Z

arrow-select/src/filter.rs

@@ -626,17 +626,17 @@ where
 ///


Not shown in the diff, but the comment is out of date:

/// `filter` implementation for string arrays

ursabot · 2022-11-01T19:03:26Z

Benchmark runs are scheduled for baseline = c7f97c2 and contender = 62e878e. 62e878e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Generalize filter byte array (apache#2969)

3c15bd8

github-actions bot added the arrow (Changes to the arrow crate) label Oct 28, 2022

Fix doc

fed1f4e

tustvold changed the title ~~Generalize filter byte array (#2969)~~ Specialize filter kernel for byte arrays (#2969) Oct 29, 2022

tustvold requested a review from viirya November 1, 2022 10:16

alamb changed the title ~~Specialize filter kernel for byte arrays (#2969)~~ Specialize filter kernel for binary arrays (#2969) Nov 1, 2022

alamb approved these changes Nov 1, 2022


viirya approved these changes Nov 1, 2022


viirya reviewed Nov 1, 2022


Update comment

b7795e8

Dandandan approved these changes Nov 1, 2022


tustvold merged commit 62e878e into apache:master Nov 1, 2022
