PARQUET-2431: Handle ByteBufferAllocator gracefully #1274

Merged on Feb 19, 2024 (2 commits)

gszadovszky
Contributor

Make sure you have checked all steps below.

Jira

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines
    from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Style

  • My contribution adheres to the code style guidelines and Spotless passes.
    • To apply the necessary changes, run mvn spotless:apply -Pvector-plugins

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and classes in the PR contain Javadoc that explains what they do

gszadovszky requested a review from wgtmac on February 13, 2024, 12:47
@gszadovszky
Contributor Author

@shangxinli, if you have some time, could you take a look?

@@ -729,7 +729,7 @@ private void initDataReader(Encoding dataEncoding, ByteBufferInputStream in, int

  if (CorruptDeltaByteArrays.requiresSequentialReads(writerVersion, dataEncoding)
      && previousReader != null
-     && previousReader instanceof RequiresPreviousReader) {
+     && dataColumn instanceof RequiresPreviousReader) {
Member

Just curious: how did you catch this?

Contributor Author

Just reading the code :)
This code handles a very old bug in how Parquet files were written. I don't think there are many files with this error out there.
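
To illustrate why the operand of the `instanceof` check matters, here is a minimal sketch. It assumes that the guarded block casts the new reader (`dataColumn`) and hands it the previous one; that block is not shown in the hunk above. All types below are hypothetical stand-ins, not the actual parquet-java classes.

```java
// Hypothetical stand-ins for illustration only; not the parquet-java types.
interface ValuesReader {}

interface RequiresPreviousReader {
  void setPreviousReader(ValuesReader previous);
}

class PlainReader implements ValuesReader {}

class DeltaReader implements ValuesReader, RequiresPreviousReader {
  ValuesReader previous;

  @Override
  public void setPreviousReader(ValuesReader previous) {
    this.previous = previous;
  }
}

class ReaderWiring {
  /** Links dataColumn to previousReader if dataColumn supports it; returns true when linked. */
  static boolean link(ValuesReader previousReader, ValuesReader dataColumn) {
    // The cast target is dataColumn, so dataColumn is the object that must be type-checked.
    // Checking previousReader instead could let a PlainReader reach the cast and fail
    // with a ClassCastException.
    if (previousReader != null && dataColumn instanceof RequiresPreviousReader) {
      ((RequiresPreviousReader) dataColumn).setPreviousReader(previousReader);
      return true;
    }
    return false;
  }
}
```

In this sketch, `ReaderWiring.link(new DeltaReader(), new PlainReader())` simply returns `false`, whereas a check on `previousReader` would have passed and then failed at the cast.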

@@ -43,15 +42,15 @@ class MultiBufferInputStream extends ByteBufferInputStream {
  private List<ByteBuffer> markBuffers = new ArrayList<>();

  MultiBufferInputStream(List<ByteBuffer> buffers) {
-   this.buffers = buffers;
+   List<ByteBuffer> buffersCopy = new ArrayList<>(buffers);
Member

It seems that we don't need to copy?

Contributor Author

TBH, this is not needed. Copying the list instead of using the one passed in from outside was more or less a reflex. But the whole point of these classes is performance, so I'll revert this one.
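
The trade-off discussed here can be sketched as follows. This is a generic illustration with a hypothetical helper class, not code from the PR: sharing the caller's list avoids an allocation but exposes the holder to later changes, while a shallow copy isolates the list structure at the cost of one extra `ArrayList`.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of parquet-java.
class BufferListHolders {
  /** Keeps the caller's list: no allocation, but later changes to that list are visible here. */
  static List<ByteBuffer> share(List<ByteBuffer> buffers) {
    return buffers;
  }

  /** Shallow copy: the list is independent, while the ByteBuffer objects are still shared. */
  static List<ByteBuffer> copy(List<ByteBuffer> buffers) {
    return new ArrayList<>(buffers);
  }
}
```

If the caller never modifies the list after handing it over, both variants behave the same, which is the case where skipping the copy is the cheaper choice.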

* A wrapper {@link ByteBufferAllocator} implementation that tracks whether all allocated buffers are released. It
* throws the related exception at {@link #close()} if any buffer remains un-released. It also clears the buffers at
* release so if they continued being used it'll generate errors.
* <p>To be used for testing purposes.
Member

Would it be better to log a warning in case a user accidentally uses this in production?

Contributor Author

How would we differentiate between test and production environments? I would not want warnings in the test environment either.
To use it in production, one would have to explicitly configure this ByteBufferAllocator instance. I would expect people to read a class's documentation before using it.
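
For readers unfamiliar with the idea, a leak-checking wrapper of the kind described in the Javadoc above might look roughly like this. It is a minimal sketch under stated assumptions: `SimpleAllocator`, `HeapAllocator`, and `LeakCheckingAllocator` are hypothetical names, the exception type is chosen for illustration, and setting the limit to zero is just one possible way to make continued use of a released buffer fail. It is not the class added by this PR.

```java
import java.nio.ByteBuffer;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical stand-in for an allocator interface; not the parquet-java API.
interface SimpleAllocator {
  ByteBuffer allocate(int size);

  void release(ByteBuffer buffer);
}

class HeapAllocator implements SimpleAllocator {
  @Override
  public ByteBuffer allocate(int size) {
    return ByteBuffer.allocate(size);
  }

  @Override
  public void release(ByteBuffer buffer) {
    // Heap buffers are reclaimed by the garbage collector; nothing to do.
  }
}

class LeakCheckingAllocator implements SimpleAllocator, AutoCloseable {
  private final SimpleAllocator delegate;
  // Identity-based map: ByteBuffer.equals compares content, which would conflate distinct buffers.
  private final Map<ByteBuffer, Boolean> live = new IdentityHashMap<>();

  LeakCheckingAllocator(SimpleAllocator delegate) {
    this.delegate = delegate;
  }

  @Override
  public ByteBuffer allocate(int size) {
    ByteBuffer buffer = delegate.allocate(size);
    live.put(buffer, Boolean.TRUE);
    return buffer;
  }

  @Override
  public void release(ByteBuffer buffer) {
    if (live.remove(buffer) == null) {
      throw new IllegalStateException("Buffer was not allocated here or was already released");
    }
    delegate.release(buffer);
    // Leave no readable or writable bytes so relative get/put on this reference fails.
    buffer.clear();
    buffer.limit(0);
  }

  @Override
  public void close() {
    if (!live.isEmpty()) {
      throw new IllegalStateException(live.size() + " buffer(s) were not released");
    }
  }
}
```

Used in a try-with-resources block in a test, such a wrapper would fail the test at `close()` whenever the code under test forgot to release a buffer.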

Member

wgtmac left a comment

+1

@gszadovszky
Contributor Author

Thank you, @wgtmac!

gszadovszky merged commit d839608 into apache:master on Feb 19, 2024
9 checks passed