Update migration notebook to only dump/restore relevant collections #918

Merged


eecavanna
Collaborator

On this branch, I made a performance optimization to the migration notebook used to migrate the Mongo database from nmdc-schema version v11.3.0 to v11.4.0.

Details

I changed the notebook so that it dumps and restores only the workflow_execution_set collection, since that is the only collection modified by the migrator the notebook imports from nmdc-schema.
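For illustration, here is a minimal Python sketch of that idea: it builds mongodump and mongorestore command lines that cover only a given list of collections. The function names, the "nmdc" database name, the URI, and the dump directory are hypothetical and are not taken from the notebook. The flags used (--uri, --db, --collection, --out, --gzip, --nsInclude, --drop) are standard MongoDB Database Tools options.

```python
from typing import List


def build_mongodump_commands(
    uri: str, db_name: str, collection_names: List[str], out_dir: str
) -> List[List[str]]:
    """Return one mongodump argument list per collection (hypothetical helper).

    mongodump's --collection option accepts a single collection name,
    so dumping several collections means running it once per collection.
    """
    return [
        [
            "mongodump",
            f"--uri={uri}",
            f"--db={db_name}",
            f"--collection={name}",
            f"--out={out_dir}",
            "--gzip",
        ]
        for name in collection_names
    ]


def build_mongorestore_command(
    uri: str, db_name: str, collection_names: List[str], dump_dir: str
) -> List[str]:
    """Return a mongorestore argument list restricted to the given collections (hypothetical helper)."""
    # --drop drops each restored collection in the target database before restoring it.
    command = ["mongorestore", f"--uri={uri}", "--gzip", "--drop"]
    # --nsInclude may be repeated; each value is a "<database>.<collection>" namespace.
    command += [f"--nsInclude={db_name}.{name}" for name in collection_names]
    # The dump directory is passed as the final positional argument.
    command.append(dump_dir)
    return command


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    uri = "mongodb://localhost:27017"
    collections = ["workflow_execution_set"]
    for cmd in build_mongodump_commands(uri, "nmdc", collections, "/tmp/dump"):
        print(" ".join(cmd))
    print(" ".join(build_mongorestore_command(uri, "nmdc", collections, "/tmp/dump")))
```

To actually run the commands, each argument list could be passed to subprocess.run(cmd, check=True). Building the lists from a single collection_names value keeps the dump and the restore scoped to the same set of collections.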

Related issue(s)

Fixes #917

Related subsystem(s)

  • Runtime API (except the Minter)
  • Minter
  • Dagster
  • Project documentation (in the docs directory)
  • Translators (metadata ingest pipelines)
  • MongoDB migrations
  • Other

Testing

  • I tested these changes (explain below)
  • I did not test these changes

I used this optimization when migrating the development Mongo server last week.

Documentation

  • I have not checked for relevant documentation yet (e.g. in the docs directory)
  • I have updated all relevant documentation so it will remain accurate
  • Other (explain below)

Maintainability

  • Every Python function I defined includes a docstring (test functions are exempt from this)
  • Every Python function parameter I introduced includes a type hint (e.g. study_id: str)
  • All "to do" or "fix me" Python comments I added begin with either # TODO or # FIXME
  • I used black to format all the Python files I created/modified
  • The PR title is in the imperative mood (e.g. "Do X") and not the declarative mood (e.g. "Does X" or "Did X")

@eecavanna eecavanna self-assigned this Feb 24, 2025
@eecavanna eecavanna linked an issue Feb 24, 2025 that may be closed by this pull request
@eecavanna
Collaborator Author

Merging this in before tomorrow's release, during which we'll run this notebook.

@eecavanna eecavanna merged commit 2004d59 into main Feb 24, 2025
@eecavanna eecavanna deleted the 917-optimize-the-schema-1130-to-1140-migration-notebook branch February 24, 2025 06:36
Development

Successfully merging this pull request may close these issues.

Optimize the schema 11.3.0-to-11.4.0 migration notebook
1 participant