Releases: X-DataInitiative/SCALPEL-Flattening
First public release
Some cleaning has been done. Statistics is deprecated but not yet removed. Documentation effort is ongoing.
Performance improvement
integration-pureconfig-release 1.1
This release is based on Spark 2.3.0. It contains the following:
- Integration of PureConfig.
- Flattening of the fall cohort 2014 2016.
Flattening by month
The main feature is the ability to join the tables by month, which avoids memory problems.
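The idea of joining by month can be illustrated with a minimal, library-free sketch. This is not the project's Spark implementation: the function names, the row-as-dict representation, and the `date_key` / `join_key` parameters are all hypothetical, chosen only to show the central table being split by (year, month) and joined one slice at a time.

```python
from collections import defaultdict
from datetime import date


def group_by_year_month(rows, date_key):
    """Bucket rows by the (year, month) of their date column."""
    buckets = defaultdict(list)
    for row in rows:
        d = row[date_key]
        buckets[(d.year, d.month)].append(row)
    return buckets


def join_by_year_and_month(central, other, date_key, join_key):
    """Left-join `other` onto `central`, yielding one joined slice per month.

    Hypothetical helper: each yielded slice could be written out before the
    next one is built, so only one month of joined rows exists at a time.
    """
    # Index the secondary table by join key once.
    other_by_key = defaultdict(list)
    for row in other:
        other_by_key[row[join_key]].append(row)

    buckets = group_by_year_month(central, date_key)
    for year_month in sorted(buckets):
        joined = []
        for left in buckets[year_month]:
            # Left join: keep the central row even when nothing matches.
            matches = other_by_key.get(left[join_key]) or [{}]
            for right in matches:
                # Central-table values win on shared column names.
                joined.append({**right, **left})
        yield year_month, joined


central = [
    {"id": 1, "date": date(2014, 1, 5)},
    {"id": 2, "date": date(2014, 2, 7)},
    {"id": 3, "date": date(2014, 1, 20)},
]
other = [
    {"id": 1, "drug": "A"},
    {"id": 1, "drug": "B"},
    {"id": 2, "drug": "C"},
]
result = list(join_by_year_and_month(central, other, "date", "id"))
```

Because the function is a generator, a caller can persist each monthly slice and discard it before requesting the next, which is the property that keeps the working set small.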
The main changes:
• When converting tables from CSV to Parquet, add the possibility to partition by a column
• Add a joinByYearAndMonth method in order to partition by month
• Add some config parameters:
-- partition_column: column used to partition the single table (optional)
-- monthly_partition: yes or no, whether to join by month
• Change the sameAs definition so that two dataframes that have different column ordering are considered the same
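A comparison that ignores column ordering, as described for sameAs, can be sketched without Spark as follows. The `same_as` function and its table representation (a list of column names plus a list of row tuples) are hypothetical and are not the project's code; the sketch also assumes that row order is irrelevant and that column names are unique, which the release notes do not state.

```python
from collections import Counter


def same_as(cols_a, rows_a, cols_b, rows_b):
    """Return True if two tables hold the same data regardless of column order.

    Assumptions (not from the release notes): row order is ignored,
    duplicate rows are counted, and column names are unique.
    """
    # The two tables must expose exactly the same column names.
    if sorted(cols_a) != sorted(cols_b):
        return False

    order = sorted(cols_a)  # canonical column order shared by both sides

    def canonical(cols, rows):
        # Reorder every row into the canonical column order, then count rows.
        idx = [cols.index(c) for c in order]
        return Counter(tuple(row[i] for i in idx) for row in rows)

    return canonical(cols_a, rows_a) == canonical(cols_b, rows_b)


left_cols = ["id", "name"]
left_rows = [(1, "a"), (2, "b")]
right_cols = ["name", "id"]
right_rows = [("b", 2), ("a", 1)]

equal_despite_ordering = same_as(left_cols, left_rows, right_cols, right_rows)
different_values = same_as(left_cols, left_rows, right_cols, [("b", 2), ("a", 3)])
```

Counting rows with `Counter` rather than collecting them in a set keeps duplicate rows significant, so two tables that differ only in how many times a row appears are not reported as equal.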
Fall data flattening validation
- Ran at the CNAM on 29/05/2017