support only a path to data root directory #57

Merged
merged 8 commits into from
Apr 13, 2023

stevebachmeier
Contributor

Title: Support only paths to data root directory

Description

  • Category: feature
  • JIRA issue: MIC-3960

This removes support for passing in a pd.DataFrame and requires
that the source arg be a path to the data root directory.
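
For illustration only, here is a minimal sketch of how a path-only source argument might be validated. The helper name validate_source and the specific exceptions are assumptions for this example, not code from this PR.

```python
from pathlib import Path
from typing import Union


def validate_source(source: Union[str, Path]) -> Path:
    """Return `source` as a Path, or raise if it is not a usable directory path."""
    # Anything that is not a str or Path (for example a DataFrame) is rejected.
    if not isinstance(source, (str, Path)):
        raise TypeError(
            "source must be a path to the data root directory, "
            f"got {type(source).__name__}"
        )
    root = Path(source)
    # The path must point at an existing directory, not a single file.
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not an existing directory")
    return root
```

Failing early with a TypeError gives callers who still pass a DataFrame a clear message instead of a confusing error later in the read step.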

Testing

Added a step to the integration pytest that splits the sample data
into two smaller datasets, saves them to tmpdir, and then uses
tmpdir as the input.

if data_path.suffix == ".hdf":
    data = pd.read_hdf(data_path)
elif data_path.suffix == ".parquet":
    data = pd.read_parquet(data_path)
Contributor Author

Yeah, I actually had it as a utility function, but then realized I only used it here. Then I updated the tests, and it's used there as well. I could go either way.
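
If the suffix check were pulled out into a shared utility, it might look something like the sketch below. The name load_data_file and the ValueError for unknown suffixes are assumptions for this example; the snippet under review only shows the two branches.

```python
from pathlib import Path

import pandas as pd


def load_data_file(data_path: Path) -> pd.DataFrame:
    """Read a single data file, choosing the reader from the file suffix."""
    if data_path.suffix == ".hdf":
        return pd.read_hdf(data_path)
    if data_path.suffix == ".parquet":
        return pd.read_parquet(data_path)
    # Assumption: unsupported suffixes should fail loudly rather than
    # leave the result undefined.
    raise ValueError(f"Unsupported file type: {data_path.suffix!r}")
```

Both the library code and the tests could then call the same function, which is the trade-off being weighed in this comment.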

@stevebachmeier stevebachmeier requested a review from a team as a code owner April 13, 2023 16:13
noised_data_different_seed = noising_function(seed=1)
sample_data_path = list(
    (paths.SAMPLE_DATA_ROOT / data_dir_name).glob(f"{data_dir_name}*")
)[0]
Contributor

Why are you only grabbing the first file?

Contributor Author

We have control over the sample data and know that there is only one dataset per form.
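
As a side note, the "only one dataset per form" assumption could be made explicit in the test rather than relying on [0]. The sketch below is one way to do that; the helper name get_single_match is hypothetical and not part of this PR.

```python
from pathlib import Path


def get_single_match(directory: Path, pattern: str) -> Path:
    """Return the only path in `directory` matching `pattern`, or raise."""
    matches = sorted(directory.glob(pattern))
    # Fail loudly if the one-file-per-form assumption ever stops holding.
    if len(matches) != 1:
        raise ValueError(
            f"Expected exactly one match for {pattern!r} in {directory}, "
            f"found {len(matches)}"
        )
    return matches[0]
```

With this, a second sample file added later would produce a clear error instead of silently being ignored.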

@stevebachmeier stevebachmeier merged commit 8a02afe into develop Apr 13, 2023
@stevebachmeier stevebachmeier deleted the feature/sbachmei/MIC-3960-use-data-dir branch April 13, 2023 18:06

4 participants