Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ISSUE-607]Add map ids info for each PartitionLocation to enable filtering for m… #619

Conversation

FMX
Copy link
Contributor

@FMX FMX commented Sep 19, 2022

…ap range reading

[FEATURE]Add map ids info for each PartitionLocation to enable filtering for m…

What changes were proposed in this pull request?

Why are the changes needed?

What are the items that need reviewer attention?

Related issues.

closes #607

Related pull requests.

How was this patch tested?

/cc @related-reviewer

/assign @main-reviewer

@FMX FMX force-pushed the 607-featureoptimize-add-map-ids-info-for-each-partitionlocation-to-enable-filtering-for-map-range-reading branch from fc6c340 to bbe0210 Compare September 19, 2022 11:29
@FMX FMX marked this pull request as ready for review September 19, 2022 12:45
@FMX FMX requested a review from waitinfuture September 19, 2022 13:01
@waitinfuture
Copy link
Contributor

We should add metrics about the size of the RoaringBitmap for very large partitions and post here.

return true;
}
for (int i = startMapIndex; i < endMapIndex; i++) {
hasMapId = hasMapId | location.getMapIdBitMap().contains(i);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete hasMapId and rewrite to

if (location.getMapIdBitMap().contains(i)) {
    return true;
}

locationCount);
while (!currentReader.hasNext() && fileIndex < locationCount - 1) {
while (!locationHasMapIdToRead(startMapIndex, endMapIndex, currentLocation)
&& fileIndex < locationCount - 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove fileIndex < locationCount - 1 from the condition

locationCount);
while (!currentReader.hasNext() && fileIndex < locationCount - 1) {
while (!locationHasMapIdToRead(startMapIndex, endMapIndex, currentLocation)
&& fileIndex < locationCount - 1) {
fileIndex++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (fileIndex == locationCount) {
   break;
}

if (currentReader.hasNext()) {
currentChunk = currentReader.next();
fileIndex++;
if (((fileIndex == (locationCount - 1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (fileIndex < locationCount) {
    return currentLocation;
} else {
    return null;
}

if (currentLocation == null) {
return;
}
currentReader = createReader(currentLocation);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why delete the following logic? For normal read we don't know if currentReader has chunks.

while (!currentReader.hasNext()...)

private void moveToNextReader() throws IOException {
if (currentReader != null) {
currentReader.close();
private boolean locationHasMapIdToRead(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to rename to skipLocation, and return true if we are guaranteed that the location has no data to read.

Copy link
Contributor

@waitinfuture waitinfuture left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@waitinfuture waitinfuture merged commit b4654d7 into main Sep 23, 2022
@FMX FMX deleted the 607-featureoptimize-add-map-ids-info-for-each-partitionlocation-to-enable-filtering-for-map-range-reading branch September 23, 2022 11:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE][OPTIMIZE] Add map ids info for each PartitionLocation to enable filtering for map range reading
2 participants