-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move SortKeyCursor
and RowIndex
into modules, add sort_key_cursor
test
#2645
Conversation
use std::cmp::Ordering; | ||
use std::sync::Arc; | ||
|
||
/// A `SortKeyCursor` is created from a `RecordBatch`, and a set of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is moved, and I added stream_id()
and batch_idx()
accessors
@@ -0,0 +1,42 @@ | |||
/// A `RowIndex` identifies a specific row in a logical stream. | |||
/// | |||
/// Each stream is identified by an `stream_idx` and is formed from a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is moved into a new module, and I added some comments / ascii art to help my future self not have to read the code the next time
pub mod sort; | ||
pub mod sort_preserving_merge; | ||
|
||
/// A `SortKeyCursor` is created from a `RecordBatch`, and a set of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is just moved
@@ -0,0 +1,135 @@ | |||
//! Contains tests for SortKeyCursor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are new -- I plan to expand them over time
use datafusion_physical_expr::expressions::col; | ||
|
||
/// Compares [`RowIndex`]es with a vector of strings, the result of | ||
/// pretty formatting the [`RowIndex`]es. This is a macro so errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean unwrap() will give you a stack trace... I'm not a massive fan of macros as they are hard to read, and slow to compile...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to function in 0754ee5
|
||
/// Runs the two cursors to completion, sorting them, and | ||
/// returning the sorted order of rows that would have produced | ||
fn run(cursor1: &mut SortKeyCursor, cursor2: &mut SortKeyCursor) -> Vec<RowIndex> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code doesn't seem to be being used by SortPreservingMerge, is this intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment, yes -- I want to unit test SoryKeyCursor
. SortPreservingMerge already has reasonable test coverage in my mind
let mut cursor2 = | ||
SortKeyCursor::new(2, 0, &batch2, &sort_key, Arc::clone(&sort_options)).unwrap(); | ||
|
||
let expected = vec![ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you were feeling fancy you could test stability, i.e. ties are broken by the lower stream idx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, will do
Which issue does this PR close?
re : #2427
Rationale for this change
SortKeyCursor::compare
is the bottleneck for merge performance (details on #2427 (comment))As I prepare to improve the performance of this code (and make it more complicated), I want it more isolated and independent so I can:
What changes are included in this PR?
SortKeyCursor
into its own moduleRowIndex
into its own modulesort_key_cursor
testAre there any user-facing changes?
No
Does this PR break compatibility with Ballista?
no