Add renaming transformer #170
yruslan added 12 commits that referenced this issue: 2 on Oct 21, 2020, 3 on Oct 22, 2020, and 7 on Oct 30, 2020.
* Add column renaming transformer.
* Extend the e2e test suite with the renaming transformer.
* Add renaming transformer usage description to README.md.
Background
The idea of the transformer was suggested by @kevinwallimann after a discussion on how Hive handles Parquet partition directories. Since Hive (like most SQL engines) is case-insensitive and HDFS is not, Hive expects all partitioning columns to be lowercase by convention. This is configurable at the global Hive level, but end users of Hive don't always have permission to change these settings. So making partitioning columns lowercase seems like the most workable solution for this particular use case.
But suppose we have a data source in Kafka that has a column named 'infoDate'. If we want to partition by this column, we can specify this in Hyperdrive:
writer.parquet.partition.columns=infoDate
But this will create partition directories named after the column exactly as it is spelled in the source (infoDate=<value>), which is not lowercase.
Even if we specify
writer.parquet.partition.columns=infodate
the partitioning column will be named as defined in the data source, since the default transformer doesn't change column names in any way.
Feature
Implement a renaming transformer in which a user can specify one or more columns to rename.
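To make the requested behavior concrete, here is a minimal pure-Python sketch of the renaming step. It is not Hyperdrive's implementation: the function names `rename_columns` and `partition_dir` are hypothetical, and a real transformer would presumably operate on a Spark DataFrame (for example via `withColumnRenamed`) rather than on a list of column names. The sketch only illustrates mapping one or more source columns to new names and how that changes the resulting partition directory name.

```python
from typing import Dict, List


def rename_columns(columns: List[str], renames: Dict[str, str]) -> List[str]:
    """Return the column names with every key of `renames` replaced by its value.

    Hypothetical helper for illustration only; not part of Hyperdrive.
    """
    # Fail early if the user asks to rename a column that does not exist.
    missing = [src for src in renames if src not in columns]
    if missing:
        raise ValueError(f"Columns to rename not found: {missing}")

    renamed = [renames.get(name, name) for name in columns]

    # Guard against two columns ending up with the same name.
    if len(set(renamed)) != len(renamed):
        raise ValueError(f"Renaming produces duplicate column names: {renamed}")
    return renamed


def partition_dir(column: str, value: str) -> str:
    """Build a Hive-style partition directory name of the form column=value."""
    return f"{column}={value}"


source_columns = ["id", "infoDate", "payload"]
renamed = rename_columns(source_columns, {"infoDate": "infodate"})
print(renamed)                                   # ['id', 'infodate', 'payload']
print(partition_dir(renamed[1], "2020-10-30"))   # infodate=2020-10-30
```

With the rename applied before writing, the partition column already carries the lowercase name, so the directory name matches the convention Hive expects.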