Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace the use of alldocs with respective collections for record retrieval in NCBI exporter #907

Conversation

sujaypatil96
Copy link
Collaborator

@sujaypatil96 sujaypatil96 commented Feb 14, 2025

Fixes microbiomedata/issues#1063

TODO:

  • Make sure the export code picks up on platform and instrument_model Attributes correctly
  • For data object records that are linked by a mainfest record, the SRA Action block needs to have references to all the connected files

On this branch, I...

Details

...

Related issue(s)

...

Related subsystem(s)

  • Runtime API (except the Minter)
  • Minter
  • Dagster
  • Project documentation (in the docs directory)
  • Translators (metadata ingest pipelines)
  • MongoDB migrations
  • Other

Testing

  • I tested these changes (explain below)
  • I did not test these changes

I tested these changes by...

Documentation

  • I have not checked for relevant documentation yet (e.g. in the docs directory)
  • I have updated all relevant documentation so it will remain accurate
  • Other (explain below)

Maintainability

  • Every Python function I defined includes a docstring (test functions are exempt from this)
  • Every Python function parameter I introduced includes a type hint (e.g. study_id: str)
  • All "to do" or "fix me" Python comments I added begin with either # TODO or # FIXME
  • I used black to format all the Python files I created/modified
  • The PR title is in the imperative mood (e.g. "Do X") and not the declarative mood (e.g. "Does X" or "Did X")

@sujaypatil96 sujaypatil96 requested a review from aclum February 14, 2025 23:19
@sujaypatil96 sujaypatil96 changed the title Replace the use of alldocs with respective collections for record retrieval in NCBI exporter Replace the use of alldocs with respective collections for record retrieval in NCBI exporter Feb 14, 2025
@sujaypatil96 sujaypatil96 marked this pull request as draft February 18, 2025 19:06
@eecavanna
Copy link
Collaborator

I'll review this PR this morning.

eecavanna
eecavanna previously approved these changes Feb 19, 2025
Copy link
Collaborator

@eecavanna eecavanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making these changes to accommodate the new sparseness of the "all docs" (maybe one day renamed to "all refs" or something) collection, @sujaypatil96.

I made some comments related to type hints, docstrings, and 1-2 variable names.

@sujaypatil96
Copy link
Collaborator Author

Thank you for the review comments @eecavanna, they were very helpful! I was being lazy by not adding type hints and docstrings. I've done that now. I think I've addressed all your review comments, feel free to drop another review.

eecavanna
eecavanna previously approved these changes Feb 20, 2025
@eecavanna
Copy link
Collaborator

Thanks for making those changes, @sujaypatil96! I found the docstrings and type hints helpful in keeping track of what is happening in the code (while reading it).

I left one follow-up comment about a type hint that I thought was more broad than necessary. I'll defer to you about that. You can merge already or edit-and-merge.

@sujaypatil96 sujaypatil96 merged commit c7f1331 into main Feb 20, 2025
2 checks passed
@sujaypatil96 sujaypatil96 deleted the 1063-bug-update-ncbi-xml-export-pipeline-to-use-respective-collections-for-data-fetching-rather-than-alldocs branch February 20, 2025 23:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Update NCBI XML export pipeline to use respective collections for data fetching rather than alldocs
2 participants