parquet format for language-agnostic analysis #1886
roaldarbol started this conversation in Ideas
Personally in favor of this, and it should be easy to do by wrangling the exported data into a pandas DataFrame first. Using |
There are currently a few roadblocks to analysing SLEAP data in other languages when relying on the `h5` format. I know you're not quite happy with `csv`, which I agree with: it's not very efficient (and the current `csv` export is brittle in edge cases, #1578). But relying only on `h5` limits the use of SLEAP, because it is so particular to Python (there are bridges like rhdf5, but it's very unintuitive to use with R and doesn't play nice with the majority of toolchains).

Generally, I would recommend adding export to `parquet` format. `parquet`, and the rest of Arrow, works across every major language used for analysis, e.g. Python, R, Julia, MATLAB, Rust, JavaScript etc. Such a data frame can then be "grouped" by instance and bodypart to keep the file size low.
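To make the idea concrete, here is a rough sketch of what such an export could look like. It is not SLEAP's API: the nested `tracks[frame][instance][node] = (x, y)` layout, the column names, and the helpers `iter_pose_rows` and `write_parquet` are all made up for illustration. The only real library call is pandas' `DataFrame.to_parquet`, which needs `pyarrow` or `fastparquet` installed.

```python
from typing import Iterable, Iterator, Sequence


def iter_pose_rows(tracks: Sequence, node_names: Sequence[str]) -> Iterator[dict]:
    """Yield one long-format row per (frame, instance, bodypart).

    Assumes the hypothetical layout tracks[frame][instance][node] == (x, y);
    real SLEAP exports would need to be reshaped into this form first.
    """
    for frame_idx, instances in enumerate(tracks):
        for inst_idx, nodes in enumerate(instances):
            for name, (x, y) in zip(node_names, nodes):
                yield {
                    "frame": frame_idx,
                    "instance": inst_idx,
                    "bodypart": name,
                    "x": x,
                    "y": y,
                }


def write_parquet(rows: Iterable[dict], path: str):
    """Write the rows to a Parquet file via pandas (hypothetical helper)."""
    import pandas as pd  # imported lazily; also requires pyarrow or fastparquet

    df = pd.DataFrame(list(rows))
    # Repeated labels are stored compactly as a categorical column.
    df["bodypart"] = df["bodypart"].astype("category")
    # Keeping rows for the same instance and bodypart together tends to help
    # columnar compression (an assumption worth measuring on real data).
    df = df.sort_values(["instance", "bodypart", "frame"])
    df.to_parquet(path, index=False)
    return df


# Toy data: 2 frames, 2 instances, 2 bodyparts; NaN marks a missing point.
toy_tracks = [
    [[(1.0, 2.0), (3.0, 4.0)], [(5.0, 6.0), (float("nan"), float("nan"))]],
    [[(1.5, 2.5), (3.5, 4.5)], [(5.5, 6.5), (7.5, 8.5)]],
]
rows = list(iter_pose_rows(toy_tracks, ["head", "tail"]))
```

Calling `write_parquet(rows, "poses.parquet")` would then produce a file that, for example, R can load with `arrow::read_parquet("poses.parquet")`, with no HDF5 bridge involved.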