-
Notifications
You must be signed in to change notification settings - Fork 411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
*: support only deserialize necessary rows #9678
*: support only deserialize necessary rows #9678
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@@ -79,13 +80,50 @@ void DataTypeDecimal<T>::deserializeBinaryBulk( | |||
IColumn & column, | |||
ReadBuffer & istr, | |||
size_t limit, | |||
double /*avg_value_size_hint*/) const | |||
double /*avg_value_size_hint*/, | |||
const IColumn::Filter * filter) const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add unit test about deserializeBinaryBulk(..., filter)
to ensure the correctness for DataTypeDecimal
/DataTypeEnum
/DataTypeNumberBase
and DataTypeString
Signed-off-by: Lloyd-Pottiger <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>
228880d
to
46b4b61
Compare
if (block) | ||
{ | ||
block.setStartOffset(read_rows); | ||
read_rows += filter.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use read_rows += block.rows()
is more reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, block.rows = passed_count < filter.size()
DMFileReaderPool::instance().set(*this, cd.id, start_pack_id, pack_count, column); | ||
// Delete column from local cache since it is not used anymore. | ||
data_sharing_col_data_cache->delColumn(cd.id, next_pack_id); | ||
return column; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to apply column->filter(filter)
here?
Block block = read(&block_filter); | ||
size_t passed_count = countBytesInFilter(block_filter); | ||
for (size_t i = 0; i < block.columns(); ++i) | ||
{ | ||
std::vector<size_t> positions; | ||
positions.reserve(passed_count); | ||
for (size_t p = offset; p < offset + rows; ++p) | ||
{ | ||
if (filter[p]) | ||
positions.push_back(p - offset); | ||
} | ||
for (size_t i = 0; i < block.columns(); ++i) | ||
{ | ||
columns[i]->insertDisjunctFrom(*block.getByPosition(i).column, positions); | ||
} | ||
auto col = block.getByPosition(i).column; | ||
// Some columns may only deserialize the passed rows. | ||
if (col->size() != passed_count) | ||
col = col->filter(block_filter, passed_count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd better ensure all the columns return by read(IColumn::Filter * filter)
has the same number of rows. But not handle it in this for-loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have addressed in #9687, since we will rewrite this soon, so just keep it in this PR.
What problem does this PR solve?
Issue Number: ref #9699
Problem Summary:
What is changed and how it works?
Check List
Tests
Side effects
Documentation
Release note