-
Notifications
You must be signed in to change notification settings - Fork 245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor I/O exec nodes #136
Conversation
2b52d6d
to
d5c56ad
Compare
@@ -46,6 +46,24 @@ TEST_CASE("Test Write Int32 array") { | |||
} | |||
} | |||
|
|||
TEST_CASE("Read not full batch") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also need to test that it's not scanning the whole column and returning 6-8? or is that already tested elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. This one is tested on the decoder, not the "columnar reader“ / ExecNode
. It might need mock arrow::RandomAccessFile to verify. Should we do it in a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i’m ok with that. Can you make a github issue for that then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #138
cpp/src/lance/io/exec/project.cc
Outdated
ARROW_ASSIGN_OR_RAISE(auto scan_schema, schema.Project(scan_options->filter)); | ||
ARROW_ASSIGN_OR_RAISE(auto scan_node, | ||
Scan::Make(reader, scan_schema, scan_options->batch_size)); | ||
ARROW_ASSIGN_OR_RAISE(child, Filter::Make(scan_options->filter, std::move(scan_node))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make sense to use different variable names here instead of reassigning to child? reads a bit confusing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one, child
is like the head of a linked list. The reason writing in this way is that, there is an optional Limit in between Take and Filter.
Otherwise, the code might look like
filter_node.= Filter::Make()
if (limit) {
limit_node = Limit::Make(filter_node);
take_node = Take::Make(limit_node);
} else {
take_node = Take::Make(filter_node);
}
Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i kind of feel like that makes the logic clearer because it’s more explicit and the cost is only one extra line. I don’t feel super strongly about it so it’s not a blocker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, i can update this. I lean to think the current way is more mathmatically general, i.e., ”inserting one ExecNode to the head of chain", instead of making each ::Make
a special case tho.
But I am fine with either way.
cpp/src/lance/io/exec/project.cc
Outdated
/// Build a chain of: | ||
/// | ||
/// Take -> (optionally) Limit -> Filter -> Scan | ||
ARROW_ASSIGN_OR_RAISE(auto scan_schema, schema.Project(scan_options->filter)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's call this a filter_scan
instead of just a plain scan
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean filter_scan_schema
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, filter_scan_schema
instead of scan_schema
and was there also a filter_scan_node
instead of scan_node
? i forgot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
fbd1013
to
a8513c2
Compare
Rebased to main. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple of nits remaining
Take
node that do random reads based on the indices returned from theFilter
node.