[HUDI-7410] Use SeekableDataInputStream as the input of native HFile reader #10673

Merged: 1 commit, Feb 15, 2024

@yihua yihua commented Feb 15, 2024

Change Logs

This PR makes SeekableDataInputStream the input of the native HFile reader, so that the reader's constructor is Hadoop-agnostic.

This is part of the effort to provide a Hudi storage abstraction and decouple hudi-common from Hadoop dependencies. For reference, the single big-change PR can be found here: #10360.
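
To illustrate what a Hadoop-agnostic constructor allows, here is a minimal sketch, not the code in this PR. All names in it (SeekableInput, ByteArraySeekableInput, TrailerReader, SeekableSketch) are hypothetical stand-ins, and the assumption that the reader only needs a seekable stream plus the file size is inferred from the diff in this PR. The point is that once the reader depends only on a seekable abstraction, it can be backed by plain in-memory bytes with no Hadoop types on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for a seekable input abstraction (not Hudi's actual class).
abstract class SeekableInput extends DataInputStream {
  SeekableInput(InputStream in) {
    super(in);
  }

  abstract long getPos() throws IOException;

  abstract void seek(long pos) throws IOException;
}

// In-memory implementation: no Hadoop types are involved.
final class ByteArraySeekableInput extends SeekableInput {
  private final int length;

  ByteArraySeekableInput(byte[] data) {
    super(new ByteArrayInputStream(data));
    this.length = data.length;
  }

  @Override
  long getPos() throws IOException {
    // ByteArrayInputStream.available() returns the number of unread bytes.
    return length - in.available();
  }

  @Override
  void seek(long pos) throws IOException {
    if (pos < 0 || pos > length) {
      throw new IOException("Seek out of range: " + pos);
    }
    in.reset(); // no mark was set, so this rewinds to offset 0
    long remaining = pos;
    while (remaining > 0) {
      remaining -= in.skip(remaining);
    }
  }
}

// Hypothetical reader whose constructor sees only the abstraction and the file size.
final class TrailerReader {
  private final SeekableInput stream;
  private final long fileSize;

  TrailerReader(SeekableInput stream, long fileSize) {
    this.stream = stream;
    this.fileSize = fileSize;
  }

  // Reads the last four bytes of the file as a big-endian int.
  int readTrailingInt() throws IOException {
    stream.seek(fileSize - Integer.BYTES);
    return stream.readInt();
  }
}

final class SeekableSketch {
  static int demo() throws IOException {
    byte[] data = {1, 2, 3, 4, 0, 0, 0, 42};
    return new TrailerReader(new ByteArraySeekableInput(data), data.length).readTrailingInt();
  }

  public static void main(String[] args) throws IOException {
    System.out.println(demo()); // prints 42
  }
}
```

In this sketch, a Hadoop-backed stream would be just another SeekableInput subclass that wraps the file system's stream, which is the role HadoopSeekableDataInputStream plays in the diff in this PR.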

Impact

The constructor of the native HFile reader is Hadoop-agnostic.

Risk level

none

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@jonvex jonvex self-requested a review February 15, 2024 15:09
@jonvex jonvex self-assigned this Feb 15, 2024
@@ -238,7 +239,7 @@ private static HFileReader createReader(String hFilePath, FileSystem fileSystem)
     LOG.info("Opening HFile for reading :" + hFilePath);
     Path path = new Path(hFilePath);
     long fileSize = fileSystem.getFileStatus(path).getLen();
-    FSDataInputStream stream = fileSystem.open(path);
+    SeekableDataInputStream stream = new HadoopSeekableDataInputStream(fileSystem.open(path));

Is this going to be HadoopSeekableDataInputStream going forward? Or is Hadoop going to be fully removed from here at some point?


This will be replaced by the new storage API call which returns SeekableDataInputStream directly. Hadoop is going to be fully removed here in the future.

@yihua yihua merged commit 80f9f1e into apache:master Feb 15, 2024
31 checks passed