VarByteChunkForwardIndexReaderV4 fails to decompress some chunks/read values within a chunk #12534

Closed
itschrispeck opened this issue Mar 1, 2024 · 1 comment

itschrispeck commented Mar 1, 2024

Within a segment, certain chunks cannot be decompressed. If I time-bound a query so that it reads only part of a single segment, the query can succeed.

e.g. for a segment with data [chunk1][chunk2][chunk3]:

  1. I can write a query that reads chunk1, and it will return valid data
  2. I can write a query that reads chunk3, and it will return valid data
  3. If the query touches data in the middle (chunk2), it fails with a seemingly random IllegalArgumentException or NegativeArraySizeException.

This leads me to believe that this should be unit testable, so I'm working on reproducing this with a sharable dataset.
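
Both exception types are what a length-prefixed reader tends to produce when it reads a length from the wrong position. The sketch below illustrates only that general mechanism. The `[4-byte length][bytes]` layout, the class `MisalignedReadSketch`, and its methods are hypothetical and are not the V4 chunk format or Pinot code; whether this is the actual cause here is unconfirmed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class MisalignedReadSketch {
  // Hypothetical layout for illustration: each value is [4-byte length][UTF-8 bytes].
  static ByteBuffer encode(String... values) {
    int size = 0;
    for (String v : values) {
      size += Integer.BYTES + v.getBytes(StandardCharsets.UTF_8).length;
    }
    ByteBuffer buf = ByteBuffer.allocate(size);
    for (String v : values) {
      byte[] bytes = v.getBytes(StandardCharsets.UTF_8);
      buf.putInt(bytes.length);
      buf.put(bytes);
    }
    buf.flip();
    return buf;
  }

  // Reads the value starting at offset, trusting that the 4 bytes there are a length.
  static String readAt(ByteBuffer buf, int offset) {
    int length = buf.getInt(offset);
    byte[] bytes = new byte[length]; // NegativeArraySizeException if length < 0
    ByteBuffer view = buf.duplicate();
    view.position(offset + Integer.BYTES);
    view.get(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }

  // Moves past the value starting at offset and returns the new position.
  static int seekPast(ByteBuffer buf, int offset) {
    int length = buf.getInt(offset);
    ByteBuffer view = buf.duplicate();
    // IllegalArgumentException if the computed position is negative or beyond the limit
    view.position(offset + Integer.BYTES + length);
    return view.position();
  }
}
```

With `encode("ab", "hello world", "\u00e9\u00e9\u00e9\u00e9")`, the values start at offsets 0, 6 and 21, and reads at those offsets succeed. Calling `seekPast(buf, 10)` treats the text bytes `hell` as a large positive length and throws IllegalArgumentException, while `readAt(buf, 25)` treats non-ASCII bytes as a negative length and throws NegativeArraySizeException. Which exception appears depends only on the bytes that happen to sit at the wrong offset, which would make the failures look random.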

Sample exceptions:

java.lang.IllegalArgumentException: newPosition > limit: (538976288 > 8324206)
    at java.base/java.nio.Buffer.createPositionException(Buffer.java:352)
    at java.base/java.nio.Buffer.position(Buffer.java:327)
    at java.base/java.nio.ByteBuffer.position(ByteBuffer.java:1551)
    at java.base/java.nio.MappedByteBuffer.position(MappedByteBuffer.java:328)
    at java.base/java.nio.MappedByteBuffer.position(MappedByteBuffer.java:73)
    at org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkForwardIndexReaderV4$UncompressedReaderContext.readSmallUncompressedValue(VarByteChunkForwardIndexReaderV4.java:429)
    at org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkForwardIndexReaderV4$UncompressedReaderContext.processChunkAndReadFirstValue(VarByteChunkForwardIndexReaderV4.java:413)
    at org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkForwardIndexReaderV4$ReaderContext.decompressAndRead(VarByteChunkForwardIndexReaderV4.java:372)
    at org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkForwardIndexReaderV4$ReaderContext.getValue(VarByteChunkForwardIndexReaderV4.java:326)
    at org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkForwardIndexReaderV4.getString(VarByteChunkForwardIndexReaderV4.java:116)
    at org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkForwardIndexReaderV4.getString(VarByteChunkForwardIndexReaderV4.java:48)
    at org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readStringValues(DataFetcher.java:601)
    at org.apache.pinot.core.common.DataFetcher.fetchStringValues(DataFetcher.java:239)
    at org.apache.pinot.core.common.DataBlockCache.getStringValuesForSVColumn(DataBlockCache.java:277)
    at org.apache.pinot.core.operator.docvalsets.ProjectionBlockValSet.getStringValuesSV(ProjectionBlockValSet.java:153)
    at org.apache.pinot.core.common.RowBasedBlockValueFetcher.createFetcher(RowBasedBlockValueFetcher.java:67)
    at org.apache.pinot.core.common.RowBasedBlockValueFetcher.<init>(RowBasedBlockValueFetcher.java:33)
    at org.apache.pinot.core.operator.query.SelectionOnlyOperator.getNextBlock(SelectionOnlyOperator.java:103)
    at org.apache.pinot.core.operator.query.SelectionOnlyOperator.getNextBlock(SelectionOnlyOperator.java:41)
    at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:43)
    at org.apache.pinot.core.operator.combine.BaseSingleBlockCombineOperator.processSegments(BaseSingleBlockCombineOperator.java:94)
    at org.apache.pinot.core.operator.combine.BaseCombineOperator$1.runJob(BaseCombineOperator.java:117)
    at org.apache.pinot.core.util.trace.TraceRunnable.run(TraceRunnable.java:40)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)

net.jpountz.lz4.LZ4Exception: Error decoding offset 1324540 of input buffer
    at net.jpountz.lz4.LZ4JNIFastDecompressor.decompress(LZ4JNIFastDecompressor.java:72)
    at net.jpountz.lz4.LZ4DecompressorWithLength.decompress(LZ4DecompressorWithLength.java:222)
    at org.apache.pinot.segment.local.io.compression.LZ4WithLengthDecompressor.decompress(LZ4WithLengthDecompressor.java:44)
    at org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkForwardIndexReaderV4$CompressedReaderContext.processChunkAndReadFirstValue(VarByteChunkForwardIndexReaderV4.java:460)
    at org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkForwardIndexReaderV4$ReaderContext.decompressAndRead(VarByteChunkForwardIndexReaderV4.java:372)
    at org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkForwardIndexReaderV4$ReaderContext.getValue(VarByteChunkForwardIndexReaderV4.java:326)
    at org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkForwardIndexReaderV4.getString(VarByteChunkForwardIndexReaderV4.java:116)
    at org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkForwardIndexReaderV4.getString(VarByteChunkForwardIndexReaderV4.java:48)
    at org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readStringValues(DataFetcher.java:601)
    at org.apache.pinot.core.common.DataFetcher.fetchStringValues(DataFetcher.java:239)
    at org.apache.pinot.core.common.DataBlockCache.getStringValuesForSVColumn(DataBlockCache.java:277)
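
A possible diagnostic hint from the first trace: the rejected position 538976288 is 0x20202020, i.e. four ASCII space bytes when viewed as a 4-byte integer. That would be consistent with string data being interpreted as an offset or length in `readSmallUncompressedValue`, although nothing in this issue confirms it. The helper below is a hypothetical utility for checking such values, not part of Pinot.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class PositionAsText {
  // Renders the four bytes of an int (big-endian, ByteBuffer's default order) as ASCII text.
  static String asAscii(int value) {
    byte[] bytes = ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    return new String(bytes, StandardCharsets.US_ASCII);
  }

  // Hex form of the same value, useful for spotting repeated byte patterns.
  static String asHex(int value) {
    return Integer.toHexString(value);
  }
}
```

Here `PositionAsText.asHex(538976288)` returns `20202020` and `PositionAsText.asAscii(538976288)` returns a string of four spaces. Because all four bytes are identical, the result does not depend on byte order.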

itschrispeck commented Mar 1, 2024

@richardstartin @saurabhd336 for your thoughts

itschrispeck changed the title from "VarByteChunkForwardIndexReaderV4 fails to decompress some chunks" to "VarByteChunkForwardIndexReaderV4 fails to decompress some chunks/read values within a chunk" on Mar 2, 2024