-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding post-indexing validation module #240
base: master
Are you sure you want to change the base?
Conversation
…n3sdk-python into feat/post-indexing-validation
…n3sdk-python into feat/post-indexing-validation
…n3sdk-python into feat/post-indexing-validation
The style in this PR agrees with This formatting comment was generated automatically by a script in uc-cdis/wool. |
return | ||
|
||
# Define the CSV file header | ||
fieldnames = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make these lower case and instead of spaces, use _
for things like download_status
. Thats usually the format we have across gen3 and the indexing manifests. Also makes its simpler to digest if we ever need to write a script to go through these.
pass | ||
|
||
|
||
class Record: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This almost feels like a redo of this class here https://github.com/uc-cdis/gen3sdk-python/blob/master/gen3/file.py#L26
I dont want us to have multiple classes in the same repo, doing the same thing. It makes it so that if we made a change to our download endpoint, we'd have to end up maintaining changes in two different places. We already have something that interacts with the gen3 download endpoints, i think we should use Gen3File
instead of using this new one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a few concerns about using the Gen3File module
- Gen3File has no async downloads. Will increase runtime for larger manifests
- Gen3File requires file output which is an inconvenience for these purposes.
- Gen3File does not return information about request status (Important)
Other than that I'd be open to redoing my code or I still think it may be beneficial to refactor some of what I've written and put it into the Gen3File class for async downloads just so that's all in one place. Just some open thoughts curious to see what you're thinking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Gen3File has no async downloads. Will increase runtime for larger manifests
Im not too worried about having async or not but you can do this to theGen3File
so that we can have both sync and async https://github.com/uc-cdis/drsclient/blob/5771d658b962d97035f6a06691548c7fe16d2b87/drsclient/client.py#L10 -
Gen3File does not return information about request status (Important)
I think it does it here https://github.com/uc-cdis/gen3sdk-python/blob/master/gen3/file.py#L68 -
Gen3File requires file output which is an inconvenience for these purposes.
Could you elaborate?
There's a lost of similarities than not with the Gen3File
and Record
feels very single purposed. I would really prefer it if we go with using Gen3File
Link to JIRA ticket if there is one: https://ctds-planx.atlassian.net/browse/DCF-1819
New Features
Adds a module for post-indexing validation
Breaking Changes
Bug Fixes
Improvements
Dependency updates
Deployment changes