Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-2986: Fails the file writing when footer size exceeds int max value #2987

Merged
merged 3 commits into from
Aug 29, 2024

Conversation

ConeyLiu
Copy link
Contributor

@ConeyLiu ConeyLiu commented Aug 13, 2024

Rationale for this change

Closes #2986

What changes are included in this PR?

Prevent force casting long to int, and throw an exception when overflowing.

Are these changes tested?

Are there any user-facing changes?

No

Closes #2986

Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! Have you checked the performance penalty after adding the check?

}
}
}

private int toIntWithCheck(long size) {
private static int toIntWithCheck(long size) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, it looks weird to me that the exception is about Encoding.

if ((int) size != size) {
throw new ParquetEncodingException(
"Cannot write page larger than " + Integer.MAX_VALUE + " bytes: " + size);
}
return (int) size;
}

private static int footerSizeToInt(long size) {
if ((int) size != size) {
throw new ParquetFileWriteException(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about ParquetSizeOverflowException? Perhaps we can also consolidate toIntWithCheck and footerSizeToInt into a single method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to ParquetSizeOverflowException, and removed the footerSizeToInt. The toIntWithCheck is a common method, it would be better to move to the common module. There is already a common utils class Ints, however, it is deprecated. Maybe we can enable it since we can provide more information than Math.toIntExact.

@ConeyLiu
Copy link
Contributor Author

@wgtmac thanks for your review.

Have you checked the performance penalty after adding the check?

I have not checked. Is there any performance util class in this project? I can do it. This should not have some performance regression since the checks are only for each page or file.

/**
* RuntimeException occurs when size overflow.
*/
public class ParquetSizeOverflowException extends ParquetRuntimeException {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put it into the common module since it is a common class.

Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@wgtmac wgtmac merged commit fafd9b0 into apache:master Aug 29, 2024
9 checks passed
@wgtmac wgtmac added this to the 1.15.0 milestone Sep 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Prevent writing a file with a footer size exceeding the Int max value
2 participants