ORC-1023: Support writing bloom filters in ConvertTool #933

Merged 2 commits on Oct 11, 2021

stiga-huang
Contributor

What changes were proposed in this pull request?

This PR adds an option to the Java tool ConvertTool for specifying the columns for which it should generate bloom filters.

Why are the changes needed?

While debugging an issue, I needed to generate an ORC file with bloom filters using the Java APIs. ConvertTool is easy to use, but it doesn't generate bloom filters. It'd be helpful to add an option for it.

How was this patch tested?

I didn't find any existing tests for ConvertTool, so I tested it manually and verified that the bloom filters are generated.

github-actions bot added the JAVA label on Oct 10, 2021
@guiyanakuang
Member

It would be better to update the doc in the same PR: https://github.com/apache/orc/blob/main/site/_docs/java-tools.md 😄

Member

dongjoon-hyun left a comment

Thank you for making a PR, @stiga-huang.

+1 for @guiyanakuang's review comment.

github-actions bot added the DOCS label on Oct 11, 2021
@stiga-huang
Contributor Author

Nice catch! Updated the doc.

Member

dongjoon-hyun left a comment

+1, LGTM. Thank you for updating.

dongjoon-hyun merged commit 7c45137 into apache:main on Oct 11, 2021
@dongjoon-hyun
Member

Hi, @stiga-huang and @guiyanakuang.
I'm preparing new Apache ORC 1.6.x and 1.7.x releases.
As I wrote on the dev mailing list, this looks like a good example of a change we can deliver safely, without risk.
I'll backport this to branch-1.7 for Apache ORC 1.7.1.

dongjoon-hyun added this to the 1.8.0 milestone on Nov 2, 2021
dongjoon-hyun pushed a commit that referenced this pull request Dec 16, 2021
(cherry picked from commit 7c45137)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun
Member

This is backported to branch-1.7.

dongjoon-hyun modified the milestones: 1.8.0, 1.7.2 on Dec 16, 2021