Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: remove LevelDecoder #3872

Merged
merged 1 commit into from
Mar 16, 2023
Merged

Conversation

Weijun-H
Copy link
Member

Which issue does this PR close?

Closes #2379

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the parquet Changes to the parquet crate label Mar 15, 2023
@viirya viirya changed the title chore: remove LevelDecode chore: remove LevelDecoder Mar 15, 2023
@tustvold tustvold merged commit b466cc7 into apache:master Mar 16, 2023
@zeevm
Copy link
Contributor

zeevm commented May 18, 2023

@tustvold @viirya

Hi,
In my use of parquet-rs, I have a flow that reads and computes over the raw dictionary IDs in data pages, to get to those I'm using LevelDecoder to read repetition and definition levels, this change broke my code.

I'd like to put LevelDecoder back in WDYT?

@tustvold
Copy link
Contributor

tustvold commented May 18, 2023

LevelDecoder to read repetition and definition levels

What do you think of either:

  • Use RleDecoder's directly
  • Maintain LevelDecoder within your own project

I would rather avoid maintaining an experimental API that isn't used by any of the first-class APIs

@zeevm
Copy link
Contributor

zeevm commented May 18, 2023

I am using RleDecoder of course to decode the RLE into dictionary IDs.

I am using LevelDecoder for 2 things:

  1. Get the definition levels (so I know which positions are null or defined)
  2. Find the position within the ByteBufferPtr where the RLE encoded values start to feed into the RLE decoder in V1 pages since those pages don't have the size of the definition/repetition levels in the page header

@tustvold
Copy link
Contributor

The page header contains the encoding, either Encoding::RLE or Encoding::BIT_PACKED, based on this you can select either to use RleDecoder or BitReader, which should do what you require. This is why LevelDecoder is no longer used, it is redundant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove LevelDecoder
4 participants