Add renaming transformer #170

Closed
yruslan opened this issue Oct 20, 2020 · 0 comments · Fixed by #173
@yruslan
Contributor

yruslan commented Oct 20, 2020

Background

The idea for this transformer was suggested by @kevinwallimann after a discussion of how Hive handles Parquet partition directories. Hive (like most SQL dialects) is case-insensitive, while HDFS is case-sensitive, so Hive expects, by convention, all partitioning columns to be lowercase. This is configurable at the global Hive level, but Hive end-users don't always have permission to change these settings. So making partitioning columns lowercase seems like the most workable solution for this particular use case.

But let's say we have a data source in Kafka with a column named 'infoDate'. If we want to partition by this column, we can specify this in Hyperdrive:

writer.parquet.partition.columns=infoDate

But this will create a folder structure like this:

/bigdata/somedata/infoDate=2020-10-01/
/bigdata/somedata/infoDate=2020-10-02/
...

Even if we specify writer.parquet.partition.columns=infodate, the partitioning column name will be taken as defined in the data source (infoDate), since the default transformer doesn't change column names in any way.
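To make the behaviour above concrete, here is a simplified pure-Python model (not Hyperdrive or Spark code). It assumes the writer resolves the requested partition column case-insensitively, as Spark does with its default `spark.sql.caseSensitive=false`, but builds the directory name from the column's spelling in the schema. The helpers `resolve_column` and `partition_dir` are hypothetical names used only for illustration.

```python
def resolve_column(schema_columns, requested):
    """Find `requested` in the schema ignoring case, returning the schema's own spelling.

    Assumption: this models Spark's default case-insensitive column resolution.
    """
    matches = [c for c in schema_columns if c.lower() == requested.lower()]
    if not matches:
        raise KeyError(f"column not found: {requested}")
    return matches[0]


def partition_dir(base, schema_columns, partition_columns, row):
    """Build a Hive-style partition directory (base/col=value/...) for one row."""
    parts = []
    for requested in partition_columns:
        name = resolve_column(schema_columns, requested)  # spelling comes from the schema
        parts.append(f"{name}={row[name]}")
    return "/".join([base.rstrip("/"), *parts]) + "/"


schema = ["infoDate", "value"]
row = {"infoDate": "2020-10-01", "value": 1}

# Asking for "infodate" still yields a camel-case directory, because the name
# written to the path is the one defined in the data source.
print(partition_dir("/bigdata/somedata", schema, ["infodate"], row))
```

In this model the only way to get `infodate=...` directories is to change the column name in the data itself before it reaches the writer.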

Feature

Implement a renaming transformer in which a user can specify one or more columns to rename.
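A minimal sketch of what such a transformer does, again as a pure-Python model rather than Hyperdrive's actual implementation. Given parallel lists of source and target names, it renames the matching columns and leaves all others untouched. The function name `rename_columns` and the parameters `columns_from`/`columns_to` are hypothetical and are not Hyperdrive's configuration keys. In Spark the same step would typically be a chain of `Dataset.withColumnRenamed` calls.

```python
def rename_columns(schema, rows, columns_from, columns_to):
    """Rename columns in a schema and its rows (rows are dicts keyed by column name).

    Hypothetical model of a renaming transformer, not Hyperdrive's code.
    """
    if len(columns_from) != len(columns_to):
        raise ValueError("columns_from and columns_to must have the same length")
    missing = [c for c in columns_from if c not in schema]
    if missing:
        raise ValueError(f"columns not found: {missing}")

    mapping = dict(zip(columns_from, columns_to))
    new_schema = [mapping.get(c, c) for c in schema]
    # Fail fast if a rename would make two columns share a name.
    if len(set(new_schema)) != len(new_schema):
        raise ValueError(f"renaming produces duplicate column names: {new_schema}")

    new_rows = [{mapping.get(k, k): v for k, v in row.items()} for row in rows]
    return new_schema, new_rows


schema, rows = rename_columns(
    ["infoDate", "value"],
    [{"infoDate": "2020-10-01", "value": 1}],
    columns_from=["infoDate"],
    columns_to=["infodate"],
)
print(schema, rows)
```

Renaming `infoDate` to `infodate` before the writer runs would yield the lowercase directories Hive expects, e.g. `/bigdata/somedata/infodate=2020-10-01/`. The length and duplicate-name checks are a design choice to reject a misconfigured rename early instead of silently producing an ambiguous schema.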

yruslan self-assigned this Oct 21, 2020
yruslan added a commit that referenced this issue Oct 21, 2020
yruslan added a commit that referenced this issue Oct 21, 2020
yruslan added a commit that referenced this issue Oct 22, 2020
yruslan added a commit that referenced this issue Oct 30, 2020
kevinwallimann linked a pull request Oct 30, 2020 that will close this issue
yruslan added a commit that referenced this issue Oct 30, 2020
* Add column renaming transformer.
* Extend the e2e test suite with the renaming transformer.
* Add renaming transformer usage description to README.md.
kevinwallimann added this to the v4.1.0 milestone Oct 30, 2020

2 participants