Flattening by month
The main feature is the ability to join the tables month by month, which avoids memory problems.
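As a rough illustration of why a month-by-month join helps, here is a minimal plain-Python sketch. It is not the project's implementation: the function names, the row-as-dict representation, and the assumption that matching rows always fall in the same month are all hypothetical. The point is only that each month's join index is built and discarded separately instead of indexing a whole table at once.

```python
from collections import defaultdict
from datetime import date


def group_by_month(rows, date_key):
    """Bucket rows (dicts) by the (year, month) of their date field."""
    buckets = defaultdict(list)
    for row in rows:
        d = row[date_key]
        buckets[(d.year, d.month)].append(row)
    return buckets


def join_by_year_and_month(left, right, key, date_key):
    """Inner-join two row lists on `key`, one month at a time.

    Hypothetical sketch: assumes rows that should match share the same
    (year, month). Only one month's lookup index exists at any moment.
    """
    left_buckets = group_by_month(left, date_key)
    right_buckets = group_by_month(right, date_key)
    # Only months present on both sides can produce inner-join results.
    for month in sorted(left_buckets.keys() & right_buckets.keys()):
        index = defaultdict(list)
        for r in right_buckets[month]:
            index[r[key]].append(r)
        for l in left_buckets[month]:
            for r in index.get(l[key], []):
                # Left-hand values win when both rows share a column name.
                yield {**r, **l}


left = [
    {"id": 1, "date": date(2020, 1, 5), "a": "x"},
    {"id": 2, "date": date(2020, 2, 7), "a": "y"},
]
right = [
    {"id": 1, "date": date(2020, 1, 5), "b": 10},
    {"id": 2, "date": date(2020, 3, 1), "b": 20},
]
joined = list(join_by_year_and_month(left, right, "id", "date"))
```

In this toy version both inputs are already in memory, so it only shows the shape of the loop; in a real pipeline each monthly slice would presumably be read from partitioned storage and written out before the next one is loaded.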
The main changes:
• When converting tables from CSV to Parquet, add the option to partition by a column
• Add joinByYearAndMonth method in order to partition by month
• Add some config parameters:
-- partition_column: column used to partition the single table (optional)
-- monthly_partition: yes or no, whether to join by month
• Change the sameAs definition so that two dataframes that differ only in column ordering are considered the same
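The relaxed sameAs comparison can be pictured with a small plain-Python sketch. This is an illustration only, not the project's code: the `same_as` name, the `(columns, rows)` tuple representation, and the choice to also ignore row order are assumptions made for the example. Column names are assumed to be unique and cell values hashable.

```python
from collections import Counter


def same_as(table_a, table_b):
    """Compare two tables given as (column_names, rows) tuples.

    Hypothetical sketch: column order is ignored, and row order is
    ignored too (an assumption), while duplicate rows are still counted.
    """
    cols_a, rows_a = table_a
    cols_b, rows_b = table_b
    # The two tables must expose exactly the same set of column names.
    if sorted(cols_a) != sorted(cols_b):
        return False
    order = sorted(cols_a)

    def normalise(cols, rows):
        # Reorder every row into the canonical (sorted) column order.
        idx = [cols.index(c) for c in order]
        return Counter(tuple(row[i] for i in idx) for row in rows)

    return normalise(cols_a, rows_a) == normalise(cols_b, rows_b)


a = (["id", "name"], [(1, "a"), (2, "b")])
b = (["name", "id"], [("b", 2), ("a", 1)])
c = (["id", "name"], [(1, "a"), (2, "c")])
```

Here `same_as(a, b)` holds because `b` contains the same data with its columns swapped, while `same_as(a, c)` does not because one value differs.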