Add renaming transformer #170

yruslan · 2020-10-20T08:51:17Z

Background

The idea for this transformer was suggested by @kevinwallimann after a discussion on how Hive handles Parquet partition directories. Since Hive (like most SQL engines) is case-insensitive and HDFS is not, Hive expects, by convention, all partitioning columns to be lowercase. This is configurable at the global Hive level, but end users of Hive don't always have permission to change these settings. So making partitioning columns lowercase seems like the most workable solution for this particular use case.

But let's say we have a data source in Kafka that has a column named 'infoDate'. If we want to partition by this column, we can specify this in Hyperdrive:

writer.parquet.partition.columns=infoDate

But this will create a folder structure like this:

/bigdata/somedata/infoDate=2020-10-01/
/bigdata/somedata/infoDate=2020-10-02/
...

Even if we specify writer.parquet.partition.columns=infodate, the partitioning column will be named as defined in the data source, since the default transformer doesn't change column names in any way.
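To illustrate the behaviour described above, here is a minimal Python sketch. It is not Hyperdrive or Spark code: it only assumes that the requested partition column is matched case-insensitively against the source schema and that the directory name uses the schema's spelling. The function names `resolve_partition_column` and `partition_dir` are hypothetical.

```python
def resolve_partition_column(schema_columns, requested):
    """Return the schema's spelling of `requested`, matched case-insensitively.

    Assumption: the lookup ignores case, so 'infodate' finds 'infoDate'.
    """
    matches = [c for c in schema_columns if c.lower() == requested.lower()]
    if not matches:
        raise ValueError(f"Partition column not found: {requested}")
    if len(matches) > 1:
        raise ValueError(f"Ambiguous partition column: {requested}")
    return matches[0]


def partition_dir(base_path, schema_columns, requested, value):
    """Build a partition directory path using the schema's column spelling."""
    column = resolve_partition_column(schema_columns, requested)
    return f"{base_path}/{column}={value}/"


# The lowercase name in the config still yields a mixed-case directory.
path = partition_dir("/bigdata/somedata", ["id", "infoDate"], "infodate", "2020-10-01")
print(path)
```

Under these assumptions the printed path still contains `infoDate`, which is why changing only the writer configuration is not enough.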

Feature

Implement a renaming transformer in which a user can specify one or more columns to rename.
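As a rough sketch of the requested behaviour, the renaming step can be modelled as a mapping from old to new column names applied before the data is written. The code below is plain Python operating on a list of column names, not the transformer's actual implementation; the function name `rename_columns` and the validation rules are assumptions made for illustration.

```python
def rename_columns(columns, renames):
    """Return `columns` with each name found in `renames` replaced.

    `renames` maps an existing column name to its new name.
    Assumed rules: every column to rename must exist, and the
    result must not contain duplicate names.
    """
    missing = [old for old in renames if old not in columns]
    if missing:
        raise ValueError(f"Columns to rename not found: {missing}")
    renamed = [renames.get(c, c) for c in columns]
    if len(set(renamed)) != len(renamed):
        raise ValueError(f"Renaming produces duplicate column names: {renamed}")
    return renamed


source_columns = ["id", "infoDate", "value"]
output_columns = rename_columns(source_columns, {"infoDate": "infodate"})
print(output_columns)

# With the column renamed, the partition directory becomes lowercase.
print(f"/bigdata/somedata/{output_columns[1]}=2020-10-01/")
```

In Spark, a rename like this is typically done with `DataFrame.withColumnRenamed`; whether the transformer uses it is not stated in this issue.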

yruslan self-assigned this Oct 21, 2020

yruslan added a commit that referenced this issue Oct 21, 2020

#170 Add column renaming transformer.

076689c

yruslan added a commit that referenced this issue Oct 21, 2020

#170 Add column renaming transformer.

69dca89

yruslan added a commit that referenced this issue Oct 22, 2020

#170 Add more unit tests for the renaming transformer.

f2bf19f

yruslan added a commit that referenced this issue Oct 22, 2020

#170 Add column renaming transformer.

6b081f7

yruslan added a commit that referenced this issue Oct 22, 2020

#170 Add more unit tests for the renaming transformer.

5ea9bbe

yruslan added a commit that referenced this issue Oct 30, 2020

#170 Extend e2e the test suite with the renaming transformer.

5860169

yruslan added a commit that referenced this issue Oct 30, 2020

#170 Add renaming transformer usage description to README.md.

239ee57

yruslan added a commit that referenced this issue Oct 30, 2020

#170 Add column renaming transformer.

25f28e0

yruslan added a commit that referenced this issue Oct 30, 2020

#170 Add more unit tests for the renaming transformer.

c7dccb1

yruslan added a commit that referenced this issue Oct 30, 2020

#170 Extend e2e the test suite with the renaming transformer.

b4f4316

yruslan added a commit that referenced this issue Oct 30, 2020

#170 Add renaming transformer usage description to README.md.

a2024cd

kevinwallimann linked a pull request Oct 30, 2020 that will close this issue

#170 Add column renaming transformer #173

Merged

yruslan closed this as completed in #173 Oct 30, 2020

yruslan added a commit that referenced this issue Oct 30, 2020

#170 Add column renaming transformer (#173)

02b6717

* Add column renaming transformer.
* Extend e2e the test suite with the renaming transformer.
* Add renaming transformer usage description to README.md.

kevinwallimann added this to the v4.1.0 milestone Oct 30, 2020

kevinwallimann added the enhancement label Nov 20, 2020
