Test native parquet reader query #2843

Closed

Contributor

@jinchengchenghh jinchengchenghh commented Oct 14, 2022

Adds a unit test for #2821. The test currently aborts with SIGFPE (integer divide by zero):

root@sr249:/mnt/DP_disk1/code/velox/_build/debug/velox/dwio/parquet/tests/reader# ./velox_dwio_parquet_query_test
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from ParquetQueryTest
[ RUN      ] ParquetQueryTest.simpleSelectFilter
*** Aborted at 1665746122 (Unix time, try 'date -d @1665746122') ***
*** Signal 8 (SIGFPE) (0x556f0082cfcd) received by PID 2387964 (pthread TID 0x7fd9e299e700) (linux TID 2388060) (code: integer divide by zero), stack trace: ***
./velox_dwio_parquet_query_test(_ZN5folly10symbolizer17getStackTraceSafeEPmm+0x80)[0x556f04918667]
./velox_dwio_parquet_query_test(_ZN5folly10symbolizer21SafeStackTracePrinter15printStackTraceEb+0x6a)[0x556f049fe9be]
./velox_dwio_parquet_query_test(+0xb1b9ea9)[0x556f049fcea9]
./velox_dwio_parquet_query_test(+0xb1b9f8a)[0x556f049fcf8a]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x1441f)[0x7fda1472f41f]
./velox_dwio_parquet_query_test(_ZN8facebook5velox4dwio6common10IntDecoderILb0EE12decodeBitsLEIiEEvPKmiN5folly5RangeIPKiEEihPKcPT_+0x229)[0x556f0082cfcd]
./velox_dwio_parquet_query_test(_ZN8facebook5velox7parquet10RleDecoderILb0EE10processRunILb1ELb0ELb1ENS0_4dwio6common29StringDictionaryColumnVisitorINS0_6common11BytesValuesENS6_15ExtractToReaderINS1_18StringColumnReaderEEELb1EEEEEvPKiiiiSF_PiPNT2_8DataTypeERiRSH_+0x12f)[0x556f000e541b]
./velox_dwio_parquet_query_test(_ZN8facebook5velox7parquet10RleDecoderILb0EE8bulkScanILb1ELb0ELb1ENS0_4dwio6common29StringDictionaryColumnVisitorINS0_6common11BytesValuesENS6_15ExtractToReaderINS1_18StringColumnReaderEEELb1EEEEEvN5folly5RangeIPKiEESH_RT2_+0x1f5)[0x556f000d5d1d]
./velox_dwio_parquet_query_test(_ZN8facebook5velox7parquet10RleDecoderILb0EE8fastPathILb1ENS0_4dwio6common29StringDictionaryColumnVisitorINS0_6common11BytesValuesENS6_15ExtractToReaderINS1_18StringColumnReaderEEELb1EEEEEvPKmRT0_+0x106)[0x556f000ca146]
./velox_dwio_parquet_query_test(_ZN8facebook5velox7parquet10RleDecoderILb0EE15readWithVisitorILb1ENS0_4dwio6common29StringDictionaryColumnVisitorINS0_6common11BytesValuesENS6_15ExtractToReaderINS1_18StringColumnReaderEEELb1EEEEEvPKmT0_+0x56)[0x556f000a78b8]
./velox_dwio_parquet_query_test(_ZN8facebook5velox7parquet10PageReader11callDecoderINS0_4dwio6common13ColumnVisitorIN5folly5RangeIPKcEENS0_6common11BytesValuesENS5_15ExtractToReaderINS1_18StringColumnReaderEEELb1EEELi0EEEvPKmRbT_+0xc9)[0x556f0009ce7b]
./velox_dwio_parquet_query_test(_ZN8facebook5velox7parquet10PageReader15readWithVisitorINS0_4dwio6common13ColumnVisitorIN5folly5RangeIPKcEENS0_6common11BytesValuesENS5_15ExtractToReaderINS1_18StringColumnReaderEEELb1EEEEEvRT_+0x164)[0x556f000948b8]
./velox_dwio_parquet_query_test(_ZN8facebook5velox7parquet11ParquetData15readWithVisitorINS0_4dwio6common13ColumnVisitorIN5folly5RangeIPKcEENS0_6common11BytesValuesENS5_15ExtractToReaderINS1_18StringColumnReaderEEELb1EEEEEvT_+0x2b)[0x556f0009152b]
./velox_dwio_parquet_query_test(_ZN8facebook5velox7parquet18StringColumnReader10readHelperINS0_6common11BytesValuesELb1ENS0_4dwio6common15ExtractToReaderIS2_EEEEvPNS4_6FilterEN5folly5RangeIPKiEET1_+0x8a)[0x556f0008e8a8]

#0  0x0000556f0082cfcd in facebook::velox::dwio::common::IntDecoder<false>::decodeBitsLE<int> (bits=0x7fd9dc017378, bitOffset=0,
    rows=..., rowBias=0, bitWidth=0 '\000',
    bufferEnd=0x7fd9dc017378 '\241' <repeats 24 times>, "\255\336\335ں\335ں\020s\001\334\331\177", result=0x7fd9dc016fd0)
    at /mnt/DP_disk1/code/velox/velox/dwio/common/IntDecoder.cpp:2602
#1  0x0000556f000e541c in facebook::velox::parquet::RleDecoder<false>::processRun<true, false, true, facebook::velox::dwio::common::StringDictionaryColumnVisitor<facebook::velox::common::BytesValues, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true> > (this=0x7fd9dc0174e0, rows=0x7fd9dc01bf80, rowIndex=0, currentRow=0, numRows=2, scatterRows=0x7fd9dc0177a0,
    filterHits=0x7fd9dc016f00, values=0x7fd9dc016fd0, numValues=@0x7fd9e299a74c: 0, visitor=...)
    at /mnt/DP_disk1/code/velox/./velox/dwio/parquet/reader/RleDecoder.h:259
#2  0x0000556f000d5d1e in facebook::velox::parquet::RleDecoder<false>::bulkScan<true, false, true, facebook::velox::dwio::common::StringDictionaryColumnVisitor<facebook::velox::common::BytesValues, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true> > (this=0x7fd9dc0174e0, nonNullRows=..., scatterRows=0x7fd9dc0177a0, visitor=...)
    at /mnt/DP_disk1/code/velox/./velox/dwio/parquet/reader/RleDecoder.h:338
#3  0x0000556f000ca147 in facebook::velox::parquet::RleDecoder<false>::fastPath<true, facebook::velox::dwio::common::StringDictionaryColumnVisitor<facebook::velox::common::BytesValues, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true> > (this=0x7fd9dc0174e0, nulls=0x7fd9dc017660, visitor=...) at /mnt/DP_disk1/code/velox/./velox/dwio/parquet/reader/RleDecoder.h:209
#4  0x0000556f000a78b9 in facebook::velox::parquet::RleDecoder<false>::readWithVisitor<true, facebook::velox::dwio::common::StringDictionaryColumnVisitor<facebook::velox::common::BytesValues, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true> > (this=0x7fd9dc0174e0, nulls=0x7fd9dc017660, visitor=...)
    at /mnt/DP_disk1/code/velox/./velox/dwio/parquet/reader/RleDecoder.h:84
#5  0x0000556f0009ce7c in facebook::velox::parquet::PageReader::callDecoder<facebook::velox::dwio::common::ColumnVisitor<folly::Range<char const*>, facebook::velox::common::BytesValues, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true>, 0> (this=0x7fd9dc01bab0, nulls=0x7fd9dc017660, nullsFromFastPath=@0x7fd9e299aae8: true, visitor=...)
    at /mnt/DP_disk1/code/velox/./velox/dwio/parquet/reader/PageReader.h:200
#6  0x0000556f000948b9 in facebook::velox::parquet::PageReader::readWithVisitor<facebook::velox::dwio::common::ColumnVisitor<folly::Range<char const*>, facebook::velox::common::BytesValues, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true> > (this=0x7fd9dc01bab0, visitor=...) at /mnt/DP_disk1/code/velox/./velox/dwio/parquet/reader/PageReader.h:342
#7  0x0000556f0009152c in facebook::velox::parquet::ParquetData::readWithVisitor<facebook::velox::dwio::common::ColumnVisitor<folly::Range<char const*>, facebook::velox::common::BytesValues, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true> > (this=0x7fd9dc012870, visitor=...) at /mnt/DP_disk1/code/velox/./velox/dwio/parquet/reader/ParquetData.h:118
#8  0x0000556f0008e8a9 in facebook::velox::parquet::StringColumnReader::readHelper<facebook::velox::common::BytesValues, true, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader> > (this=0x7fd9dc0126d0, filter=0x7fd9dc003610, rows=...,
    extractValues=...) at /mnt/DP_disk1/code/velox/velox/dwio/parquet/reader/StringColumnReader.cpp:38
#9  0x0000556f0008ced1 in facebook::velox::parquet::StringColumnReader::processFilter<true, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader> > (this=0x7fd9dc0126d0, filter=0x7fd9dc003610, rows=..., extractValues=...)
    at /mnt/DP_disk1/code/velox/velox/dwio/parquet/reader/StringColumnReader.cpp:77
#10 0x0000556f0008b428 in facebook::velox::parquet::StringColumnReader::read (this=0x7fd9dc0126d0, offset=0, rows=..., incomingNulls=0x0)
    at /mnt/DP_disk1/code/velox/velox/dwio/parquet/reader/StringColumnReader.cpp:111
#11 0x0000556f00867d29 in facebook::velox::dwio::common::SelectiveStructColumnReader::read (this=0x7fd9dc0122c0, offset=0, rows=...,
    incomingNulls=0x0) at /mnt/DP_disk1/code/velox/velox/dwio/common/SelectiveStructColumnReader.cpp:155

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 14, 2022
@kagamiori kagamiori requested review from kgpai and Yuhta October 14, 2022 03:49
@mbasmanova
Contributor

CC: @oerling Orri, looks like this is a repro for one of the bugs in the Parquet reader.

@mbasmanova
Contributor

@oerling Looks like this is a repro for one of the bugs in the Parquet reader.

@jinchengchenghh Would you like to attempt a fix?

@jinchengchenghh
Contributor Author

jinchengchenghh commented Oct 17, 2022

Yeah, my pleasure. Since I'm new to Velox and the Parquet reader, it may take a while. Could you help with the fix? We can both work on it, and I will post my progress in this PR. @oerling

@jinchengchenghh
Contributor Author

In makeDecoder, rleDecoder_ takes its bitWidth from pageData_[0]:

    case Encoding::PLAIN_DICTIONARY:
      rleDecoder_ = std::make_unique<RleDecoder<false>>(
          pageData_ + 1, pageData_ + encodedDataSize_, pageData_[0]);
      break;
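
To illustrate why a bitWidth of 0 read from the first page byte can end in a divide by zero, here is a minimal standalone sketch. It is not the Velox code and not the fix from any PR: `valuesInBuffer` and `decodeBitPacked` are hypothetical names, and the assumption is that the decoder divides a bit count by bitWidth somewhere on its path. As far as I understand the Parquet format, a zero bit width can legitimately occur when the dictionary has a single entry, so every index is 0 and no input bits are consumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Velox): how many bit-packed values can be
// read from numBytes of input, capped at numRequested. Without the
// bitWidth == 0 branch, the division below would raise SIGFPE.
inline size_t valuesInBuffer(size_t numBytes, uint8_t bitWidth, size_t numRequested) {
  if (bitWidth == 0) {
    return numRequested; // zero-width values consume no input bits
  }
  size_t fit = numBytes * 8 / bitWidth;
  return fit < numRequested ? fit : numRequested;
}

// Hypothetical decoder: unpacks little-endian bit-packed values one bit at a
// time. Slow on purpose; it only shows the zero-width special case.
inline std::vector<uint32_t> decodeBitPacked(
    const uint8_t* data, size_t numBytes, uint8_t bitWidth, size_t numValues) {
  assert(bitWidth <= 32);
  size_t n = valuesInBuffer(numBytes, bitWidth, numValues);
  std::vector<uint32_t> out(n, 0);
  if (bitWidth == 0) {
    return out; // every value is 0; data is never dereferenced
  }
  for (size_t i = 0; i < n; ++i) {
    size_t bit = i * bitWidth; // absolute position of the value's lowest bit
    uint32_t value = 0;
    for (uint8_t b = 0; b < bitWidth; ++b, ++bit) {
      uint32_t bitValue = (data[bit / 8] >> (bit % 8)) & 1u;
      value |= bitValue << b;
    }
    out[i] = value;
  }
  return out;
}
```

With this guard, asking for two values at bitWidth 0 yields two zero indices instead of crashing, which matches the numRows=2 case in the trace only by analogy; where the real reader should handle it is a separate question.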

@jinchengchenghh
Contributor Author

Stack trace from a simplified query:

#0  0x000056300687103b in facebook::velox::dwio::common::IntDecoder<false>::decodeBitsLE<int> (bits=0x7f3fdc01bc28, bitOffset=0,
    rows=..., rowBias=0, bitWidth=0 '\000', bufferEnd=0x7f3fdc01bc28 '\241' <repeats 24 times>, "\255\336\335ں\335ں!",
    result=0x7f3fdc01bdd0) at /mnt/DP_disk1/code/velox/velox/dwio/common/IntDecoder.cpp:2602
#1  0x00005630060c07f8 in facebook::velox::parquet::RleDecoder<false>::processRun<false, false, true, facebook::velox::dwio::common::StringDictionaryColumnVisitor<facebook::velox::common::AlwaysTrue, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true> > (this=0x7f3fdc01be60, rows=0x7f3fdc017480, rowIndex=0, currentRow=0, numRows=2, scatterRows=0x7f3fdc017ac0,
    filterHits=0x0, values=0x7f3fdc01bdd0, numValues=@0x7f3fe7ffb78c: 0, visitor=...)
    at /mnt/DP_disk1/code/velox/./velox/dwio/parquet/reader/RleDecoder.h:259
#2  0x00005630060b1ba2 in facebook::velox::parquet::RleDecoder<false>::bulkScan<false, false, true, facebook::velox::dwio::common::StringDictionaryColumnVisitor<facebook::velox::common::AlwaysTrue, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true> > (this=0x7f3fdc01be60, nonNullRows=..., scatterRows=0x7f3fdc017ac0, visitor=...)
    at /mnt/DP_disk1/code/velox/./velox/dwio/parquet/reader/RleDecoder.h:338
#3  0x00005630060a6b77 in facebook::velox::parquet::RleDecoder<false>::fastPath<true, facebook::velox::dwio::common::StringDictionaryColumnVisitor<facebook::velox::common::AlwaysTrue, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true> > (this=0x7f3fdc01be60, nulls=0x7f3fdc017a20, visitor=...) at /mnt/DP_disk1/code/velox/./velox/dwio/parquet/reader/RleDecoder.h:209
#4  0x000056300607fc95 in facebook::velox::parquet::RleDecoder<false>::readWithVisitor<true, facebook::velox::dwio::common::StringDictionaryColumnVisitor<facebook::velox::common::AlwaysTrue, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true> > (this=0x7f3fdc01be60, nulls=0x7f3fdc017a20, visitor=...)
    at /mnt/DP_disk1/code/velox/./velox/dwio/parquet/reader/RleDecoder.h:84
#5  0x0000563006079fc0 in facebook::velox::parquet::PageReader::callDecoder<facebook::velox::dwio::common::ColumnVisitor<folly::Range<char const*>, facebook::velox::common::AlwaysTrue, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true>, 0> (this=0x7f3fdc0172d0, nulls=0x7f3fdc017a20, nullsFromFastPath=@0x7f3fe7ffbae8: true, visitor=...)
    at /mnt/DP_disk1/code/velox/./velox/dwio/parquet/reader/PageReader.h:211
#6  0x000056300607144f in facebook::velox::parquet::PageReader::readWithVisitor<facebook::velox::dwio::common::ColumnVisitor<folly::Range<char const*>, facebook::velox::common::AlwaysTrue, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true> > (this=0x7f3fdc0172d0, visitor=...) at /mnt/DP_disk1/code/velox/./velox/dwio/parquet/reader/PageReader.h:353
#7  0x000056300606ed16 in facebook::velox::parquet::ParquetData::readWithVisitor<facebook::velox::dwio::common::ColumnVisitor<folly::Range<char const*>, facebook::velox::common::AlwaysTrue, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true> > (this=0x7f3fdc0126a0, visitor=...) at /mnt/DP_disk1/code/velox/./velox/dwio/parquet/reader/ParquetData.h:118
#8  0x000056300606c0c5 in facebook::velox::parquet::StringColumnReader::readHelper<facebook::velox::common::AlwaysTrue, true, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader> > (this=0x7f3fdc012500, filter=0x0, rows=...,
    extractValues=...) at /mnt/DP_disk1/code/velox/velox/dwio/parquet/reader/StringColumnReader.cpp:38
#9  0x000056300606ad38 in facebook::velox::parquet::StringColumnReader::processFilter<true, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader> > (this=0x7f3fdc012500, filter=0x0, rows=..., extractValues=...)
    at /mnt/DP_disk1/code/velox/velox/dwio/parquet/reader/StringColumnReader.cpp:52
#10 0x000056300606916c in facebook::velox::parquet::StringColumnReader::read (this=0x7f3fdc012500, offset=0, rows=..., incomingNulls=0x0)
    at /mnt/DP_disk1/code/velox/velox/dwio/parquet/reader/StringColumnReader.cpp:111
#11 0x00005630068b29fd in facebook::velox::dwio::common::SelectiveStructColumnReader::read (this=0x7f3fdc0120f0, offset=0, rows=...,
    incomingNulls=0x0) at /mnt/DP_disk1/code/velox/velox/dwio/common/SelectiveStructColumnReader.cpp:167
#12 0x00005630068b2497 in facebook::velox::dwio::common::SelectiveStructColumnReader::next (this=0x7f3fdc0120f0, numValues=2, result=

@mbasmanova
Contributor

CC: @yingsu00 @majetideepak

@mbasmanova
Contributor

@jinchengchenghh This may be fixed in #2879

@jinchengchenghh
Contributor Author

Fixed in #2879

@jinchengchenghh jinchengchenghh deleted the parquetReader branch February 16, 2023 02:00
marin-ma pushed a commit to marin-ma/velox-oap that referenced this pull request Dec 15, 2023