-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bugfix: do not move src ByteBuffer position for LZ4 length prefixed decompress #12539
bugfix: do not move src ByteBuffer position for LZ4 length prefixed decompress #12539
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #12539 +/- ##
============================================
+ Coverage 61.75% 61.81% +0.05%
+ Complexity 207 198 -9
============================================
Files 2436 2453 +17
Lines 133233 133864 +631
Branches 20636 20764 +128
============================================
+ Hits 82274 82743 +469
- Misses 44911 45006 +95
- Partials 6048 6115 +67
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
This is a good catch. I have a couple of remarks:
|
055aa92
to
2865f63
Compare
Trying to understand the problem better here. Why would it cause problem when the source position is moved? It is allowed for |
I don't think the last bit is always true. While it is allowed to move the position, There's a note in the docs:
I'm also wondering if the the fast decompressor is the best choice. It may be better to move to the safe decompressor which is already used for non v4 indexes - doing that would also address this bug. The
|
@itschrispeck Thanks for digging into it! I feel using safe decompressor is the safe go here, and we can get consistent behavior for v2 and v4. |
+1 on adding a unit test to prevent future regression |
Updated to use safe decompressor. This will move src to I wasn't able to figure out a good way to round trip unit test this. I've added a comment on why the safe decompressor is chosen to hopefully avoid refactoring into the fast decompressor. With more understanding of the LZ4 algorithm I could probably edit some compressed data's bytes to cause this, but given that fast decompressor is deprecated I'd prefer if we can merge this without the test. I did a basic validation of the underlying decompress implementations. In general, perf differences are a wash and vary based on host/input:
|
Addresses #12534
Based on my understanding, calling
_decompressor.decompress(compressedInput, decompressedOutput)
is problematic because it moves the position of the src and dest ByteBuffers. It is possible to move the position of the src buffer beyond the limit of the buffer (see issue for details).The change calls the same underlying method, but only moves the position of the dest buffer. The reason why this was a V4 raw index writer/reader version issue is that this only happens with the length prefixed LZ4 library, and this code path is not used for other raw index writer versions.
Testing: Set up a unit test locally, now able to read a previously unreadable segment when loading/iterating through a problematic column using PinotSegmentRecordReader