Refactor Thrift Transport for Parquet Metadata Access #4160
@liushengxuan Will you be able to update the footer reading to use the new transport as well? If so, we don't need to keep the old transport.
@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
This PR refactors the Thrift transport used for Parquet metadata access. `ThriftTransport` becomes the interface, and two implementations are introduced: `ThriftStreamingTransport` and `ThriftBufferedTransport`.

`ThriftStreamingTransport` takes a `SeekableInputStream` as the input for Thrift parsing. It can be used to parse Parquet page headers. Reading from the stream directly reduces the deep copying in `readPageHeader()`, and it is also a prerequisite for fixing the incorrect page header length issue.

`ThriftBufferedTransport` takes a contiguous memory buffer as the input for Thrift parsing. It can be used to parse the Parquet footer: the footer sits at the end of the file, so the reader has to work backwards from the end, handling the trailing `PAR1` magic and the footer length before it reaches the footer itself.
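To make the split between the two transports concrete, here is a minimal, self-contained sketch of the idea. It is not the code from this PR: `ByteSource`, `BufferedSource`, `StreamingSource`, `ChunkStream`, `readBytes`, and `footerLength` are hypothetical names, `ChunkStream` is only a stand-in for a chunk-at-a-time input stream such as `SeekableInputStream`, and no Thrift library is involved. The one format fact it relies on is that a Parquet file ends with a 4-byte little-endian footer length followed by the magic `PAR1`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: the single operation a Thrift protocol needs.
class ByteSource {
 public:
  virtual ~ByteSource() = default;
  // Copies exactly `len` bytes into `out`; throws if the input runs out.
  virtual void read(uint8_t* out, uint32_t len) = 0;
};

// Buffered variant: the whole input is already one contiguous region.
class BufferedSource : public ByteSource {
 public:
  BufferedSource(const uint8_t* data, size_t size)
      : pos_(data), end_(data + size) {}
  void read(uint8_t* out, uint32_t len) override {
    if (static_cast<size_t>(end_ - pos_) < len) {
      throw std::out_of_range("buffer exhausted");
    }
    std::memcpy(out, pos_, len);
    pos_ += len;
  }
 private:
  const uint8_t* pos_;
  const uint8_t* end_;
};

// Stand-in for a stream that hands out its data one chunk at a time.
class ChunkStream {
 public:
  explicit ChunkStream(std::vector<std::string> chunks)
      : chunks_(std::move(chunks)) {}
  // Exposes the next chunk; returns false at end of stream.
  bool next(const uint8_t** data, size_t* size) {
    if (index_ == chunks_.size()) return false;
    *data = reinterpret_cast<const uint8_t*>(chunks_[index_].data());
    *size = chunks_[index_].size();
    ++index_;
    return true;
  }
 private:
  std::vector<std::string> chunks_;
  size_t index_ = 0;
};

// Streaming variant: pulls chunks on demand, so a read may span chunks and
// the caller never has to assemble the input into one buffer first.
class StreamingSource : public ByteSource {
 public:
  explicit StreamingSource(ChunkStream* stream) : stream_(stream) {}
  void read(uint8_t* out, uint32_t len) override {
    while (len > 0) {
      if (avail_ == 0 && !stream_->next(&cur_, &avail_)) {
        throw std::out_of_range("stream exhausted");
      }
      const uint32_t n = static_cast<uint32_t>(std::min<size_t>(avail_, len));
      std::memcpy(out, cur_, n);
      cur_ += n;
      avail_ -= n;
      out += n;
      len -= n;
    }
  }
 private:
  ChunkStream* stream_;
  const uint8_t* cur_ = nullptr;
  size_t avail_ = 0;
};

// Convenience helper for the examples: reads `len` bytes into a string.
std::string readBytes(ByteSource& source, uint32_t len) {
  std::string out(len, '\0');
  source.read(reinterpret_cast<uint8_t*>(&out[0]), len);
  return out;
}

// Reads the footer length from the last 8 bytes of an in-memory file tail:
// 4-byte little-endian length, then the magic "PAR1".
uint32_t footerLength(const uint8_t* file, size_t size) {
  if (size < 8 || std::memcmp(file + size - 4, "PAR1", 4) != 0) {
    throw std::invalid_argument("missing PAR1 magic");
  }
  const uint8_t* p = file + size - 8;
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
      (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}
```

In this sketch a caller that parses a page header would be handed a `ByteSource&` and would not care which variant sits behind it. The streaming variant suits page headers because their length is not known until they are decoded, while the buffered variant suits the footer because its length can be read from the file tail before any parsing starts.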
…g data (#145) * Port a patch: Refactor Thrift Transport for Parquet Metadata Access facebookincubator#4160 * Port a patch: Read Parquet Page Header with ThriftStreamingTransport to Fix the Incorrect Header Length facebookincubator#4108
…okincubator#4160)" This reverts commit 39977b1.