-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support scan filter for ORC decimal reader #11067
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for meta-velox canceled.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also need to add some tests to E2EFilterTest
for decimal types
// Fill decimals before applying filter. | ||
fillDecimals(); | ||
|
||
const auto rawNulls = nullsInReadRange_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extract this logic to a static function or trait class so that it can be later reused in parquet as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parquet decimal uses SelectiveIntegerColumnReader::processFilter
so it might don't need to reuse this logic. How do you think? Thanks.
if constexpr (std::is_same_v<DataT, int64_t>) { | ||
processFilter(filter, rows, rawNulls); | ||
} else { | ||
VELOX_UNSUPPORTED("Unsupported filter: {}.", (int)filterKind); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be the case for schema evolution. Just throw VELOX_FAIL
here or VELOX_NYI
for these and add the requested type and file type information to the error message.
@Yuhta I find the E2EFilterTest relies on DWRF writer, but for now bigint-decimal will be written as int64_t, and hugeint-decimal is not supported. Do we need to support them first? velox/velox/dwio/dwrf/writer/ColumnWriter.cpp Lines 2039 to 2041 in 3a1a60a
velox/velox/dwio/dwrf/writer/ColumnWriter.cpp Line 2090 in 3a1a60a
|
@rui-mo Yes it would be ideal to support writing decimals first. Otherwise the tests are very limited and we have no confidence the new code would work correctly. |
2937bac
to
e1b7ef8
Compare
e1b7ef8
to
3117603
Compare
3117603
to
edfb582
Compare
edfb582
to
86a8f95
Compare
86a8f95
to
1a86c34
Compare
1a86c34
to
6750596
Compare
6750596
to
6e1ad03
Compare
const common::Filter* filter, | ||
const RowSet& rows, | ||
const uint64_t* rawNulls) { | ||
returnReaderNulls_ = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add VELOX_CHECK(filter) to assert the function is always called with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the below check, thanks.
VELOX_CHECK_NOT_NULL(filter, "Filter must not be null.");
6e1ad03
to
ed04a8e
Compare
'leafCallToSubfieldFilter' converts decimal filter as Subfield filter, but
'SelectiveDecimalColumnReader::read' rejects scan spec filter. This PR supports
scan filter for ORC decimal reader.