-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PARQUET-2431: Handle ByteBufferAllocator gracefully #1274
Conversation
@shangxinli, if you may have some time, could you check it? |
@@ -729,7 +729,7 @@ private void initDataReader(Encoding dataEncoding, ByteBufferInputStream in, int | |||
|
|||
if (CorruptDeltaByteArrays.requiresSequentialReads(writerVersion, dataEncoding) | |||
&& previousReader != null | |||
&& previousReader instanceof RequiresPreviousReader) { | |||
&& dataColumn instanceof RequiresPreviousReader) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious: how did you catch this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just reading the code :)
This is about handling a very old bug at writing parquet files. I don't think we have too many files with this error out there.
@@ -43,15 +42,15 @@ class MultiBufferInputStream extends ByteBufferInputStream { | |||
private List<ByteBuffer> markBuffers = new ArrayList<>(); | |||
|
|||
MultiBufferInputStream(List<ByteBuffer> buffers) { | |||
this.buffers = buffers; | |||
List<ByteBuffer> buffersCopy = new ArrayList<>(buffers); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that we don't need to copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, this is not needed. It was kind of automatic to copy the list instead of using the one got from outside. But the whole point of these classes is performance. So I'll revert this one.
* A wrapper {@link ByteBufferAllocator} implementation that tracks whether all allocated buffers are released. It | ||
* throws the related exception at {@link #close()} if any buffer remains un-released. It also clears the buffers at | ||
* release so if they continued being used it'll generate errors. | ||
* <p>To be used for testing purposes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it better to print a warn log in case any user uses this in production by accident?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would we differentiate test and prod envs? I would not like to put warns in the test env either.
One would need to explicitly define a ByteBufferAllocator
instance to use it in prod. I would expect reading the specs before using a class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Thank you, @wgtmac! |
Make sure you have checked all steps below.
Jira
them in the PR title. For example, "PARQUET-1234: My Parquet PR"
the ASF 3rd Party License Policy.
Tests
Commits
from "How to write a good git commit message":
Style
mvn spotless:apply -Pvector-plugins
Documentation