support only a path to data root directory #57
src/pseudopeople/interface.py
Outdated
if data_path.suffix == ".hdf":
    data = pd.read_hdf(data_path)
elif data_path.suffix == ".parquet":
    data = pd.read_parquet(data_path)
Yeah, I actually had it as a utility function but then realized it was only used here. Then I updated the tests, and it's used there as well. I could go either way.
…/sbachmei/MIC-3960-use-data-dir
noised_data_different_seed = noising_function(seed=1)
sample_data_path = list(
    (paths.SAMPLE_DATA_ROOT / data_dir_name).glob(f"{data_dir_name}*")
)[0]
Why are you only grabbing the first file?
We control the sample data and know that there is only one dataset per form.
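If the one-file-per-form assumption should be made explicit rather than implied by `[0]`, one possible approach is a small helper that fails when the glob does not match exactly one file. This is only a sketch; `find_single_dataset_file` is a hypothetical name and is not part of the PR.

```python
from pathlib import Path


def find_single_dataset_file(root, data_dir_name):
    """Return the only file under root/data_dir_name whose name starts with data_dir_name."""
    matches = sorted((Path(root) / data_dir_name).glob(f"{data_dir_name}*"))
    if len(matches) != 1:
        # Surface a broken assumption instead of silently taking the first match.
        raise ValueError(
            f"Expected exactly one '{data_dir_name}*' file, found {len(matches)}"
        )
    return matches[0]
```

In the test above, this would replace the `list(...)[0]` expression with `find_single_dataset_file(paths.SAMPLE_DATA_ROOT, data_dir_name)`.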
Title: Support only paths to data root directory
Description
This removes support for a user passing in a pd.DataFrame and requires that the source arg be a path to the data root directory.
Testing
Extended the integration pytest so that it splits the sample data into two smaller datasets, saves them to tmpdir, and then uses tmpdir as the input.