Write input data to Parquet file #23

aufdenkampe · 2021-02-10T17:30:32Z

We've decided to write all input data to a Parquet file. Parquet is a high-performance, columnar binary storage format designed for big-data and cloud-computing workloads.

Parquet is tightly integrated with pandas and is designed to manage complex hierarchical, nested data structures, similar to HDF5.

Our intent is to support both HDF5 and Parquet for storage of input and output data.
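As a rough illustration of what writing inputs to Parquet could look like, here is a minimal sketch. It assumes the input data is held as a wide pandas DataFrame with one column per time series, and that a Parquet engine (`pyarrow` or `fastparquet`) is installed. The function names are hypothetical and are not part of the existing code.

```python
from typing import Sequence

import pandas as pd


def write_inputs_to_parquet(df: pd.DataFrame, path: str) -> None:
    """Write a wide time-series table (one column per series) to a Parquet file.

    Hypothetical helper. pandas delegates the actual writing to pyarrow or
    fastparquet, so one of those engines must be installed.
    """
    df.to_parquet(path)


def read_series_from_parquet(path: str, names: Sequence[str]) -> pd.DataFrame:
    """Read back only the requested columns.

    Parquet stores data column by column, so a subset of series can be
    loaded without reading the whole file into memory.
    """
    return pd.read_parquet(path, columns=list(names))
```

A call such as `read_series_from_parquet("inputs.parquet", ["PREC"])` would then return a one-column DataFrame. The file name and series name here are illustrative only.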

ptomasula · 2021-02-10T21:19:44Z

@aufdenkampe @steveskrip @htaolimno Switching the code to support Parquet files may prove to be a larger undertaking than initially expected. It seems the main method takes an HDF file as an argument, then opens it as an HDFStore that is passed from the main method to the various sub-processes. Making the switch will require updating all of those methods to work with a different file format as well.

I'm also not super keen on having a single file format be the only one supported by the business logic. I think there's a strong argument for making a pandas DataFrame the level at which the main code interfaces with the input data, with separate utilities that read other file formats and store them as uniformly formatted pandas DataFrames. I'd like to think that through some more. One immediate issue that comes to mind is that holding all of the input data in memory as a single DataFrame could be a problem. Maybe the solution is to write a method that pulls out just the time series (TS) needed for a specific operation (similar to this line) but is not tied to a file format. I'm open to other suggestions on how best to handle this.
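To make the idea concrete, here is one possible sketch of a format-independent accessor, not a worked-out design. Every name in it (`TimeSeriesSource`, `get_timeseries`, the `/TIMESERIES/{name}` key layout, `total_inflow`) is hypothetical. The HDF backend also assumes each series is stored as a pandas Series under its own key, which may not match the current file layout.

```python
from typing import Protocol, Sequence

import pandas as pd


class TimeSeriesSource(Protocol):
    """Anything that can hand back the requested series as a DataFrame."""

    def get_timeseries(self, names: Sequence[str]) -> pd.DataFrame: ...


class InMemorySource:
    """Backed by a DataFrame already in memory (handy for tests)."""

    def __init__(self, df: pd.DataFrame) -> None:
        self._df = df

    def get_timeseries(self, names: Sequence[str]) -> pd.DataFrame:
        return self._df.loc[:, list(names)]


class ParquetSource:
    """Reads only the requested columns from a Parquet file."""

    def __init__(self, path: str) -> None:
        self._path = path

    def get_timeseries(self, names: Sequence[str]) -> pd.DataFrame:
        return pd.read_parquet(self._path, columns=list(names))


class HDFSource:
    """Reads one key per series from an HDF5 file.

    ASSUMPTION: each series is a pandas Series stored under a key built
    from `key_template`; the default template is made up for illustration.
    """

    def __init__(self, path: str, key_template: str = "/TIMESERIES/{name}") -> None:
        self._path = path
        self._key_template = key_template

    def get_timeseries(self, names: Sequence[str]) -> pd.DataFrame:
        with pd.HDFStore(self._path, mode="r") as store:
            series = {n: store.get(self._key_template.format(name=n)) for n in names}
        return pd.concat(series, axis=1)


def total_inflow(source: TimeSeriesSource, names: Sequence[str]) -> pd.Series:
    """Stand-in for business logic: it sees DataFrames, never file formats."""
    return source.get_timeseries(names).sum(axis=1)
```

With this shape, the main code would receive a source object instead of an open HDFStore, and only the series needed for a given operation would be loaded at any one time.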

aufdenkampe added this to the Sprint 1: WDM read & write to Parquet milestone Feb 10, 2021

aufdenkampe assigned htaolimno and ptomasula Feb 10, 2021

aufdenkampe mentioned this issue Mar 5, 2021

Refactor I/O to rely on DataFrames & provide storage options #27

Open
