*: support only deserialize necessary rows #9678

Closed

Conversation

Lloyd-Pottiger
Contributor

@Lloyd-Pottiger Lloyd-Pottiger commented Nov 28, 2024

What problem does this PR solve?

Issue Number: ref #9699

Problem Summary:

What is changed and how it works?


Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot ti-chi-bot bot added the release-note-none Denotes a PR that doesn't merit a release note. label Nov 28, 2024
Contributor

ti-chi-bot bot commented Nov 28, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from lloyd-pottiger, ensuring that each of them provides their approval before proceeding. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Nov 28, 2024
@@ -79,13 +80,50 @@ void DataTypeDecimal<T>::deserializeBinaryBulk(
     IColumn & column,
     ReadBuffer & istr,
     size_t limit,
-    double /*avg_value_size_hint*/) const
+    double /*avg_value_size_hint*/,
+    const IColumn::Filter * filter) const
Contributor

Add unit tests for deserializeBinaryBulk(..., filter) to ensure correctness for DataTypeDecimal, DataTypeEnum, DataTypeNumberBase, and DataTypeString.
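
For illustration only, here is a minimal sketch of the behavior such a test would pin down: a bulk deserializer that consumes every source row but materializes only the rows that pass the filter. It is not the TiFlash implementation; `Filter`, `deserializeFixedWidth`, and `serializeFixedWidth` are hypothetical stand-ins built on `std::vector`, and the real `IColumn`/`ReadBuffer` types are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for IColumn::Filter: one byte per row, non-zero means "keep".
using Filter = std::vector<std::uint8_t>;

// Decodes up to `limit` fixed-width values from `data` and appends to `out` only
// the rows whose filter byte is non-zero. A null filter keeps every row.
// Returns the number of source rows consumed (not the number appended).
template <typename T>
std::size_t deserializeFixedWidth(
    std::vector<T> & out,
    const std::vector<char> & data,
    std::size_t limit,
    const Filter * filter)
{
    const std::size_t available = data.size() / sizeof(T);
    const std::size_t rows = limit < available ? limit : available;
    // The filter, when present, must cover every row we are about to consume.
    assert(filter == nullptr || filter->size() >= rows);

    for (std::size_t i = 0; i < rows; ++i)
    {
        if (filter != nullptr && (*filter)[i] == 0)
            continue; // filtered-out rows are skipped and never materialized
        T value;
        std::memcpy(&value, data.data() + i * sizeof(T), sizeof(T));
        out.push_back(value);
    }
    return rows;
}

// Test helper (also hypothetical): writes the values into a raw byte buffer.
template <typename T>
std::vector<char> serializeFixedWidth(const std::vector<T> & values)
{
    std::vector<char> data(values.size() * sizeof(T));
    if (!values.empty())
        std::memcpy(data.data(), values.data(), data.size());
    return data;
}
```

A test along these lines would serialize a known vector, deserialize it once with a filter and once with `nullptr`, and check that the filtered result equals the unfiltered result with the same filter applied afterwards.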

Signed-off-by: Lloyd-Pottiger <[email protected]>
if (block)
{
block.setStartOffset(read_rows);
read_rows += filter.size();
Contributor

Would read_rows += block.rows() be more reasonable?

Contributor Author

No. block.rows() equals passed_count, which can be less than filter.size().
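
To make the point concrete, a small sketch (with hypothetical names, not the PR's code) of why the start offset has to advance by `filter.size()`: the offset tracks how many source rows have been consumed, while the returned block only holds the rows that passed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for IColumn::Filter: one byte per row, non-zero means "keep".
using Filter = std::vector<std::uint8_t>;

// Number of rows that pass the filter, i.e. the row count of the returned block.
std::size_t countPassed(const Filter & filter)
{
    std::size_t passed = 0;
    for (std::uint8_t keep : filter)
        passed += (keep != 0);
    return passed;
}

// Start offset of each consecutive read, measured in source rows.
std::vector<std::size_t> startOffsets(const std::vector<Filter> & filters)
{
    std::vector<std::size_t> offsets;
    std::size_t read_rows = 0;
    for (const Filter & filter : filters)
    {
        offsets.push_back(read_rows);
        // Advance by the rows consumed from the source, not by the rows returned.
        read_rows += filter.size();
    }
    return offsets;
}
```

With filters `{1, 0, 1}` and `{0, 0, 1, 1}`, the second read starts at source row 3, although the first block only returned 2 rows. Advancing by the passed count would place the second block at row 2 instead.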

DMFileReaderPool::instance().set(*this, cd.id, start_pack_id, pack_count, column);
// Delete column from local cache since it is not used anymore.
data_sharing_col_data_cache->delColumn(cd.id, next_pack_id);
return column;
Contributor

Do we need to apply column->filter(filter) here?

Comment on lines +237 to +244
+ Block block = read(&block_filter);
+ size_t passed_count = countBytesInFilter(block_filter);
+ for (size_t i = 0; i < block.columns(); ++i)
+ {
- std::vector<size_t> positions;
- positions.reserve(passed_count);
- for (size_t p = offset; p < offset + rows; ++p)
- {
-     if (filter[p])
-         positions.push_back(p - offset);
- }
- for (size_t i = 0; i < block.columns(); ++i)
- {
-     columns[i]->insertDisjunctFrom(*block.getByPosition(i).column, positions);
- }
+     auto col = block.getByPosition(i).column;
+     // Some columns may only deserialize the passed rows.
+     if (col->size() != passed_count)
+         col = col->filter(block_filter, passed_count);
Contributor

We'd better ensure that all the columns returned by read(IColumn::Filter * filter) have the same number of rows, rather than handling it in this for-loop.

Contributor Author

I have addressed this in #9687. Since we will rewrite this soon, let's just keep it as is in this PR.
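
For readers following the thread, a rough sketch of the row-count normalization under discussion, using hypothetical `std::vector`-based stand-ins rather than the real `IColumn` API. It assumes a column returned by a filtered read is either already reduced to the passed rows or still holds one value per source row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for IColumn::Filter and a materialized column.
using Filter = std::vector<std::uint8_t>;
using Column = std::vector<int>;

// Keeps only the rows whose filter byte is non-zero.
Column applyFilter(const Column & col, const Filter & filter)
{
    assert(col.size() == filter.size());
    Column result;
    for (std::size_t i = 0; i < col.size(); ++i)
        if (filter[i] != 0)
            result.push_back(col[i]);
    return result;
}

// Brings a column to exactly `passed_count` rows. A column that was already
// deserialized with the filter is returned unchanged; a full-size column is
// filtered here. When every row passes, both cases coincide and nothing changes.
Column normalizeRows(const Column & col, const Filter & filter, std::size_t passed_count)
{
    if (col.size() == passed_count)
        return col;
    return applyFilter(col, filter);
}
```

Doing this once inside the reader, as suggested above, would let every caller rely on all columns of a block having `passed_count` rows.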

@JaySon-Huang JaySon-Huang mentioned this pull request Dec 6, 2024
12 tasks
@Lloyd-Pottiger Lloyd-Pottiger deleted the read-with-filter-deserial branch December 10, 2024 07:37
2 participants