Flattening by month

@firas16 firas16 released this 23 Jan 16:45
· 36 commits to master since this release
a4a2295

The main feature is the ability to join the tables month by month, which avoids memory problems.
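To illustrate why joining month by month keeps memory use down, here is a minimal sketch in plain Python. It is not the project's code: the function name `join_month_by_month`, the row format (lists of dicts) and the assumption that rows are matched only within the same year and month are all hypothetical choices made for the example.

```python
from collections import defaultdict


def join_month_by_month(left, right, key, date_field):
    """Inner-join two lists of dict rows, one (year, month) bucket at a time.

    Hypothetical helper: rows match when they share `key` AND fall in the
    same year and month of `date_field` (an assumption of this sketch).
    """

    def bucket(rows):
        # Group rows by the (year, month) of their date field.
        buckets = defaultdict(list)
        for row in rows:
            d = row[date_field]
            buckets[(d.year, d.month)].append(row)
        return buckets

    left_buckets, right_buckets = bucket(left), bucket(right)

    # Only months present on both sides can produce joined rows.
    for period in sorted(left_buckets.keys() & right_buckets.keys()):
        # Build the lookup table from this month's right-hand rows only,
        # so it never covers more than one month of data.
        index = defaultdict(list)
        for r in right_buckets[period]:
            index[r[key]].append(r)
        for l in left_buckets[period]:
            for r in index.get(l[key], []):
                # On a column name clash, the left-hand value wins.
                yield {**r, **l}
```

In this toy version both inputs are already in memory; in a real pipeline each monthly bucket would presumably be read from its own partition on disk, so that only one month is loaded at any time.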
The main changes:
• When converting tables from CSV to Parquet, add the option to partition by a column
• Add a joinByYearAndMonth method to perform the join by month
• Add config parameters:
-- partition_column (optional): the column used to partition a single table
-- monthly_partition: yes or no, whether to join by month
• Change sameAs definition so that two dataframes that have different column ordering are considered