Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding post-indexing validation module #240

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

jacob50231
Copy link
Contributor

Link to JIRA ticket if there is one: https://ctds-planx.atlassian.net/browse/DCF-1819

New Features

Adds a module for post-indexing validation

Breaking Changes

Bug Fixes

Improvements

Dependency updates

Deployment changes

Copy link

github-actions bot commented Nov 1, 2024

The style in this PR agrees with black. ✔️

This formatting comment was generated automatically by a script in uc-cdis/wool.

return

# Define the CSV file header
fieldnames = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make these lower case and instead of spaces, use _ for things like download_status. Thats usually the format we have across gen3 and the indexing manifests. Also makes its simpler to digest if we ever need to write a script to go through these.

pass


class Record:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This almost feels like a redo of this class here https://github.com/uc-cdis/gen3sdk-python/blob/master/gen3/file.py#L26
I dont want us to have multiple classes in the same repo, doing the same thing. It makes it so that if we made a change to our download endpoint, we'd have to end up maintaining changes in two different places. We already have something that interacts with the gen3 download endpoints, i think we should use Gen3File instead of using this new one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a few concerns about using the Gen3File module

  • Gen3File has no async downloads. Will increase runtime for larger manifests
  • Gen3File requires file output which is an inconvenience for these purposes.
  • Gen3File does not return information about request status (Important)

Other than that I'd be open to redoing my code or I still think it may be beneficial to refactor some of what I've written and put it into the Gen3File class for async downloads just so that's all in one place. Just some open thoughts curious to see what you're thinking.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lost of similarities than not with the Gen3File and Record feels very single purposed. I would really prefer it if we go with using Gen3File

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants