perf: Add fast paths for series.arg_sort and dataframe.sort #19872

siddharth-vi · 2024-11-19T18:28:05Z

Add fast paths for series.arg_sort and dataframe.sort (single column) in the following cases:

Key column has no nulls and is sorted in the required order
Key column has no nulls and is sorted in the opposite order
Key column has nulls, is sorted in the required order, and has nulls in the correct place
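The first and third cases can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function name `arg_sort_fast_path` and its parameters are hypothetical, and the sketch scans the data to detect sortedness, whereas the diff in this PR reads an existing `is_sorted_flag` instead.

```python
def arg_sort_fast_path(values, descending=False, nulls_last=False):
    """Return the identity permutation if `values` is already in the
    requested order with nulls (None) in the requested place, else None.

    Hypothetical helper for illustration; not the Polars implementation.
    """
    n = len(values)
    null_count = sum(v is None for v in values)

    # Split the column into the region where nulls are expected
    # and the region that must hold only non-null values.
    if nulls_last:
        non_null, nulls = values[: n - null_count], values[n - null_count:]
    else:
        nulls, non_null = values[:null_count], values[null_count:]

    # Nulls in the wrong place: no fast path.
    if any(v is not None for v in nulls):
        return None

    # Check that the non-null region is sorted in the requested direction.
    pairs = zip(non_null, non_null[1:])
    if descending:
        in_order = all(a >= b for a, b in pairs)
    else:
        in_order = all(a <= b for a, b in pairs)

    # Already sorted: arg_sort is simply 0..len.
    return list(range(n)) if in_order else None
```

When the function returns None, a caller would fall back to the general sorting path.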

For the second case, if maintain_order is false, we could technically just return the reverse of 0..len. However, the current behavior is to do a stable sort even then, preserving the relative order of equal elements, and some unrelated test cases rely on this. To keep that behavior, I have implemented a linear-time stable reverse sort. The existing implementation is also linear on already sorted data, but it is slower because it creates a copy of the data.
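A stable reverse of this kind can be sketched in Python as below. This is a minimal illustration of the idea, not the Rust code in this PR; the name `stable_reverse_arg_sort` is hypothetical, and the input is assumed to be already sorted with no nulls. The sketch walks the runs of equal values from the back and emits each run in its original left-to-right order.

```python
def stable_reverse_arg_sort(values):
    """Arg-sort already-sorted `values` into the opposite order in O(n),
    keeping equal elements in their original relative order.

    Hypothetical helper for illustration; assumes `values` is sorted.
    """
    out = []
    end = len(values)
    while end > 0:
        # Find the start of the run of equal values that ends at `end`.
        start = end - 1
        while start > 0 and values[start - 1] == values[end - 1]:
            start -= 1
        # Emit the run in original order, so ties stay stable.
        out.extend(range(start, end))
        end = start
    return out


print(stable_reverse_arg_sort([1, 1, 2, 3, 3, 3]))  # [3, 4, 5, 2, 0, 1]
```

A plain reversal of 0..len would give [5, 4, 3, 2, 1, 0] for the same input, which orders the values correctly but flips the ties.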

Closes #19364

Benchmarks

dataframe.sort

Code

# Run in IPython or Jupyter: %timeit is an IPython magic.
import numpy as np
import polars as pl

n = 1_000_000
df = pl.DataFrame({
    "a": np.random.rand(n),
    "b": np.random.rand(n),
})

# -- One column
print("A unsorted")
df2 = df.select("a")
%timeit df2.sort("a")

print("A sorted")
df2 = df.select("a").sort("a")
%timeit df2.sort("a")

# -- Two columns
print("A unsorted, B unsorted")
df2 = df.clone()
%timeit df2.sort("a")

print("A sorted, B unsorted")
df2 = df.sort("a")
%timeit df2.sort("a")

print("A sorted reverse, B unsorted")
df2 = df.sort("a", descending=True)
%timeit df2.sort("a")

# -- Key column with a single null appended
df_nulls = pl.DataFrame({
    "a": np.random.rand(n).tolist() + [None],
    "b": np.random.rand(n).tolist() + [None],
})
print("A sorted, with null")
df2_nulls = df_nulls.sort("a")
%timeit df2_nulls.sort("a")

Case	Time before	Time after
A unsorted	32.7 ms ± 651 μs	32 ms ± 1.33 ms
A sorted	11.6 ms ± 52.6 μs	6.08 μs ± 24.2 ns
A sorted reverse	17.6 ms ± 1.43 ms	4.75 ms ± 244 μs
A sorted, with null	14.1 ms ± 560 μs	6.14 μs ± 43.5 ns

series.arg_sort

Code

# Run in IPython or Jupyter: %timeit is an IPython magic.
import numpy as np
import polars as pl

n = 1_000_000
df = pl.DataFrame({
    "a": np.random.rand(n),
    "b": np.random.rand(n),
})

# Unsorted key
s = df["a"]
%timeit s.arg_sort()

# Key already sorted ascending: same order, then opposite order
s = df.sort("a")["a"]
%timeit s.arg_sort()
%timeit s.arg_sort(descending=True)

# Sorted key with a single null
df_nulls = pl.DataFrame({
    "a": np.random.rand(n).tolist() + [None],
    "b": np.random.rand(n).tolist() + [None],
})
s_nulls = df_nulls.sort("a")["a"]
%timeit s_nulls.arg_sort()

Case	Time before	Time after
A unsorted	27.2 ms ± 715 μs	27.5 ms ± 472 μs
A sorted	9.85 ms ± 51.2 μs	71.2 μs ± 362 ns
A sorted reverse	12.7 ms ± 298 μs	1.99 ms ± 9.96 μs
A sorted, with null	11 ms ± 67.7 μs	69.8 μs ± 744 ns

Testing

We need to ensure that taking a fast path does not change the final result. I have modified some pre-existing tests to also exercise the fast paths, and I have added a test that checks the correctness of the sorted output.
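The kind of check described here can be sketched as a comparison between a fast-path result and a general stable sort. The snippet below is an illustration in plain Python, not one of the tests added in this PR; `reference_arg_sort` and `check_identity_fast_path` are hypothetical names, and the fast path modeled is only the "already in the required order" case, whose result is 0..len.

```python
import random


def reference_arg_sort(values, descending=False):
    # General path: a stable sort of the indices by value.
    # Python's sorted() keeps ties in original order even with reverse=True.
    return sorted(range(len(values)), key=values.__getitem__, reverse=descending)


def check_identity_fast_path(trials=100, seed=0):
    """Return True if 0..len matches the reference arg_sort on every
    randomly generated input that is already in the required order."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(0, 50)
        # A small value range forces many duplicates, which is where
        # stability bugs would show up.
        data = [rng.randint(0, 5) for _ in range(n)]
        for descending in (False, True):
            presorted = sorted(data, reverse=descending)
            fast = list(range(n))
            if fast != reference_arg_sort(presorted, descending):
                return False
    return True
```

Generating inputs with many repeated values matters because a fast path that handles ties differently from the general path would still pass on all-distinct data.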

codecov · 2024-11-23T19:43:18Z

Codecov Report

Attention: Patch coverage is 89.78495% with 19 lines in your changes missing coverage. Please review.

Project coverage is 79.61%. Comparing base (c92612a) to head (c38636c).
Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
...polars-core/src/chunked_array/ops/sort/arg_sort.rs	86.52%	19 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #19872      +/-   ##
==========================================
- Coverage   79.62%   79.61%   -0.01%     
==========================================
  Files        1564     1564              
  Lines      217989   218171     +182     
  Branches     2477     2477              
==========================================
+ Hits       173564   173704     +140     
- Misses      43857    43899      +42     
  Partials      568      568


siddharth-vi · 2024-11-23T21:55:56Z

crates/polars-core/src/frame/mod.rs

+        #[allow(non_upper_case_globals)]
+        const is_not_categorical_enum: bool = true;
+
+        if by_column.len() == 1 && is_not_categorical_enum {


Disabled fast path for categorical columns due to #19900

crates/polars-core/src/chunked_array/ops/sort/mod.rs

ritchie46

Thanks, I have left some comments.

We should split the bug fix from the optimizations; they should not be in a single PR.

crates/polars-core/src/chunked_array/ops/sort/arg_sort.rs

ritchie46 · 2024-11-24T12:48:51Z

crates/polars-core/src/chunked_array/ops/sort/arg_sort.rs

+    // 2) If array is reverse sorted -> we do a stable reverse.
+    if is_sorted_flag != IsSorted::Not {
+        let len_final = if let Some((limit, _desc)) = options.limit {
+            let limit = limit as usize;


I think we can simplify here somewhat after we replace with std::cmp::min

Have rewritten this, let me know if it looks fine now.

ritchie46 · 2024-11-24T12:51:03Z

crates/polars-core/src/chunked_array/ops/sort/arg_sort.rs

+{
+    let mut current_start: IdxSize = 0;
+    let mut current_end: IdxSize = 1;
+    let mut flattened_iter = iters.into_iter().flatten();


We need to use iter, as into_iter boxes.

I also don't like flattened iterators. If we can write this by explicitly looping over the chunks, we should prefer that.

Have rewritten this, let me know if it looks fine now.

ritchie46 · 2024-11-24T12:51:38Z

crates/polars-core/src/chunked_array/ops/sort/arg_sort.rs

+            rev_idx.reverse();
+            rev_idx
+        },
+        None => rev_idx,


What does this return? An empty Vec?

Yes, for arrays of length zero we return an empty Vec (from Vec::with_capacity(0)).

siddharth-vi · 2024-11-26T14:05:38Z

I have created a separate PR for the bug fix: #20004

siddharth-vi · 2024-12-06T11:14:32Z

@ritchie46 I have made the requested changes, please have a look.

ritchie46 · 2024-12-07T08:56:31Z

Thanks a lot @siddharth-vi. Great improvements. Left one comment, but I think we can do that in a later PR.

ritchie46 · 2024-12-07T08:48:01Z

crates/polars-core/src/chunked_array/ops/sort/mod.rs

@@ -159,6 +159,33 @@ macro_rules! sort_with_fast_path {
    }}
 }

+macro_rules! arg_sort_fast_path {


No blocker for this PR, but I'd rather see this in a generic function. Could be a follow-up.

siddharth-vi requested review from ritchie46, c-peters, alexander-beedie, MarcoGorelli, reswqa and orlp as code owners November 19, 2024 18:28

github-actions bot added the performance, python, and rust labels Nov 19, 2024

siddharth-vi marked this pull request as draft November 19, 2024 21:52

siddharth-vi marked this pull request as ready for review November 23, 2024 21:54


ritchie46 requested changes Nov 24, 2024


siddharth-vi changed the title from "perf: Add fast paths for series.arg_sort and dataframe.sort + bug fix in existing fast path" to "perf: Add fast paths for series.arg_sort and dataframe.sort" Nov 26, 2024

siddharth-vi requested a review from ritchie46 November 26, 2024 13:44

siddharth-vi mentioned this pull request Nov 26, 2024

fix: Bug fix in existing fast path for sorted series #20004

Merged

siddharth-vi force-pushed the arg_sort_fast_path2 branch from 9a52e39 to c9cc500 Compare December 6, 2024 13:43

siddharthv and others added 9 commits December 7, 2024 09:54

First commit

edb956d

commit

d0b059c

Add tests

40e07fa

commit

f259378

Fix clippy error

e59e9e4

limit

0c97c53

limit

0484596

changes

61ec922

rebase changes

ee8c2a4

siddharth-vi and others added 3 commits December 7, 2024 09:54

Remove print

f5191ce

rebase fix + fmt

e99cbda

remove double alloc

c38636c

ritchie46 force-pushed the arg_sort_fast_path2 branch from b6827d2 to c38636c Compare December 7, 2024 08:54

ritchie46 approved these changes Dec 7, 2024


ritchie46 merged commit a6ca94d into pola-rs:main Dec 7, 2024
25 checks passed
