Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cases of filter-only column scans A filtered column may or may not #2807

Closed
wants to merge 1 commit into from

Conversation

oerling
Copy link
Contributor

@oerling oerling commented Oct 11, 2022

Fix cases of filter-only column scan with is (not) null

A filtered column may or may not produce a value. This is controlled
by the ExtractValue template parameter to visitors. If this is
dwio::common::DropValues, a buffer for retrieving the value is not
allocated.

When common parts of the reader were moved to dwio::common, some of
the DropValues mentions in dwio::dwrf were not renamed. Also, there
was a forward declaration of DropValues in dwio::dwrf, causing the unrenamed locations to
continue to compile.

In Parquet, when reading multipage RowSets, numValuesBeforePage needs
to be the number of passing rows of the RowSet from previous pages,
instead of the number of values, as no values are being collected.

Adds coverage for filter only columns to E2EFilterTest. One in 5
filtered columns will not be projected out. Accordingly, the column
will appear as length 0 in the result. Deals with these in ownership
checking and result comparison.

@netlify
Copy link

netlify bot commented Oct 11, 2022

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 8c16113
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6346ea68922bf4000a9aacde

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 11, 2022
@facebook-github-bot
Copy link
Contributor

@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

A filtered column may or may not produce a value. This is controlled
by the ExtractValue template parameter to visitors. If this is
dwio::common::DropValues, a buffer for retrieving the value is not
allocated.

When common parts of the reader were moved to dwio::common, some of
the DropValues mentions in dwio::dwrf were not renamed. Also, there
was a forward declaration of DropValues in dwio::dwrf, causing the unrenamed locations  to
continue to compile.

In Parquet, when reading multipage RowSets, numValuesBeforePage needs
to be the number of passing rows of the RowSet from previous pages,
instead of the number of values, as no values are being collected.

Adds coverage for filter only columns to E2EFilterTest. One in 5
filtered columns will not be projected out. Accordingly, the column
will appear as length 0 in the result. Deals with these in ownership
checking and result comparison.
@oerling oerling force-pushed the fix-drop-values-pr branch from a1e69c3 to 8c16113 Compare October 12, 2022 16:25
@facebook-github-bot
Copy link
Contributor

@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants