Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use interleave in BatchBuilder (~10% faster merge) #5894

Merged
merged 3 commits into from
Apr 7, 2023

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Apr 6, 2023

Which issue does this PR close?

Closes #.

Rationale for this change

Simplifies BatchBuilder by using the upstream interleave kernel.

As an added bonus this yields some non-trivial performance speed ups.

merge sorted i64        time:   [8.5328 ms 8.5418 ms 8.5511 ms]
                        change: [-10.592% -10.481% -10.361%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

sort merge i64          time:   [8.7169 ms 8.7280 ms 8.7391 ms]
                        change: [-10.569% -10.429% -10.284%] (p = 0.00 < 0.05)
                        Performance has improved.

merge sorted f64        time:   [8.4695 ms 8.4762 ms 8.4838 ms]
                        change: [-10.751% -10.641% -10.524%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe

sort merge f64          time:   [8.6876 ms 8.6986 ms 8.7099 ms]
                        change: [-10.503% -10.350% -10.183%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

merge sorted utf8 low cardinality
                        time:   [7.2147 ms 7.2184 ms 7.2224 ms]
                        change: [-1.0837% -0.9800% -0.8862%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

sort merge utf8 low cardinality
                        time:   [7.8602 ms 7.8836 ms 7.9077 ms]
                        change: [-1.1147% -0.7057% -0.2931%] (p = 0.00 < 0.05)
                        Change within noise threshold.

merge sorted utf8 high cardinality
                        time:   [9.5599 ms 9.5747 ms 9.5925 ms]
                        change: [-11.571% -11.374% -11.192%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

sort merge utf8 high cardinality
                        time:   [10.700 ms 10.753 ms 10.807 ms]
                        change: [-11.555% -10.940% -10.308%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

merge sorted utf8 tuple time:   [15.013 ms 15.029 ms 15.048 ms]
                        change: [-16.753% -16.600% -16.449%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

sort merge utf8 tuple   time:   [18.151 ms 18.260 ms 18.367 ms]
                        change: [-15.472% -14.825% -14.148%] (p = 0.00 < 0.05)
                        Performance has improved.

merge sorted utf8 dictionary
                        time:   [6.1689 ms 6.1744 ms 6.1818 ms]
                        change: [-6.2663% -6.1546% -6.0351%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

sort merge utf8 dictionary
                        time:   [6.3495 ms 6.3580 ms 6.3665 ms]
                        change: [-6.1132% -5.9187% -5.7376%] (p = 0.00 < 0.05)
                        Performance has improved.

merge sorted utf8 dictionary tuple
                        time:   [8.6838 ms 8.6894 ms 8.6951 ms]
                        change: [-6.4096% -6.3163% -6.2247%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

sort merge utf8 dictionary tuple
                        time:   [9.9218 ms 10.006 ms 10.090 ms]
                        change: [-6.6577% -5.5463% -4.4579%] (p = 0.00 < 0.05)
                        Performance has improved.

merge sorted mixed dictionary tuple
                        time:   [14.137 ms 14.150 ms 14.164 ms]
                        change: [-7.6257% -7.5182% -7.4068%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

sort merge mixed dictionary tuple
                        time:   [15.553 ms 15.635 ms 15.717 ms]
                        change: [-7.4586% -6.7567% -6.0622%] (p = 0.00 < 0.05)
                        Performance has improved.

merge sorted mixed tuple
                        time:   [14.151 ms 14.164 ms 14.180 ms]
                        change: [-19.019% -18.906% -18.786%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

sort merge mixed tuple  time:   [15.717 ms 15.762 ms 15.806 ms]
                        change: [-19.333% -18.977% -18.625%] (p = 0.00 < 0.05)
                        Performance has improved.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Apr 6, 2023
@alamb alamb mentioned this pull request Apr 6, 2023
6 tasks
@tustvold tustvold force-pushed the batch-builder-interleave branch from 408ee93 to 308cf3f Compare April 7, 2023 09:34
@tustvold tustvold marked this pull request as ready for review April 7, 2023 09:35
// emit final batch of rows
array_data.extend(buffer_idx, start_row_idx, end_row_idx);
make_array(array_data.freeze())
Ok(interleave(&arrays, &self.indices)?)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me -- thank you @tustvold

cc @jaylmiller

I also took the liberty of fixing clippy and merging from main

@tustvold tustvold mentioned this pull request Apr 7, 2023
@tustvold tustvold merged commit 2a2e147 into apache:main Apr 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants