Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor migrator classes into own modules and fix YAML file loading #1340

Conversation

eecavanna
Copy link
Collaborator

@eecavanna eecavanna commented Nov 12, 2023

Summary

  • Created a new package named migrators that contains the migration-specific classes
  • Moved MigratorBase class from migration_recursion.py into its own module within the migrators package
  • Moved each migration-specific class from migration_recursion.py into its own module within the migrators package
  • Moved a YAML file used by one migration-specific class into a new assets/ directory within the migrators package
  • Moved CURIE-related helper functions directly called by migration_recursion.py into a new CurieMigrator class
  • Removed the unused force_prefix property from the MigratorBase class (it still exists on the CurieMigrator class)
  • Capitalized names of regex pattern "constants" per PEP 8 style guide
  • Configured Poetry so consumers of the PyPI package can import things residing in nmdc_schema/migrators/ (instead of only things residing at the top level of nmdc_schema/).
    • I tested this new configuration by running pip install /path/to/local/clone/of/nmdc-schema and confirming the following import statement worked:
      from nmdc_schema.migrators.migrator_from_8_1_to_9_0 import Migrator_from_8_1_to_9_0 as Migrator
    • I did not test it on PyPI (I did not publish anything to PyPI)
  • Fixed bug where, in general, a YAML file could not be loaded except for in the nmdc-schema development environment

@eecavanna eecavanna linked an issue Nov 12, 2023 that may be closed by this pull request
@eecavanna eecavanna self-assigned this Nov 12, 2023
@eecavanna eecavanna marked this pull request as ready for review November 12, 2023 21:29
@eecavanna
Copy link
Collaborator Author

In case this PR is still open on Tuesday, I can present the changes during the migration squad meeting that day.

Copy link
Contributor

@brynnz22 brynnz22 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eecavanna this looks great to me! A lot easier to digest and understand than one big file. I'll let @turbomam approve this since he knows the structure of the repo much better than me. Is there a way we can include dumping the output of a migration(s) into a yaml file as we did with the migration_recursion.py file so that we can test the migrations while writing them?

@eecavanna
Copy link
Collaborator Author

eecavanna commented Nov 14, 2023

Thanks, @brynnz22! The migration_recursion.py script still behaves the same way (from the outside) that it does in the current main branch; i.e. it can still be used to migrate data residing in a YAML file. It still runs the transformation functions (it imports them from this new migrators package instead of from the same module as itself). Does that answer your question?


On a side note about testing, after this branch has been merged in, I'm planning to add some doctests to the transformation functions (#1291). I'll add a note to tomorrow's squad meeting agenda to demonstrate doctests.

@eecavanna
Copy link
Collaborator Author

eecavanna commented Nov 14, 2023

Action items from pair review

  • Emphasize in click help that the class name is NOT the module name / filename.
  • Emphasize X.Y as default instead of X.Y.Z for migrations.
    • I will do this in a separate PR that is focused on documentation and the tutorial class

@eecavanna
Copy link
Collaborator Author

eecavanna commented Nov 14, 2023

I updated the click command help. Here's a snippet from the latest help:

  --migrator-name TEXT   The name of a migrator class that you want to use to
                         migrate data. If you specify this option multiple
                         times, the specified migrator classes will be used in
                         the order specified.
                         
                         A migrator class is a class that inherits from
                         `MigratorBase`. Migrator classes are defined in
                         `nmdc_schema/migrators/` and their names begin with
                         capital `Migrator_` (as opposed to lowercase
                         `migrator_`, which refers to a module).

@eecavanna
Copy link
Collaborator Author

Hi @turbomam, I finished making one of the changes we discussed in today's pair review. I want to hold off on the other one until a future PR, where I make other changes to the documentation and example migrator class.

@turbomam turbomam merged commit 82ad2a0 into main Nov 14, 2023
@eecavanna eecavanna deleted the 1250-move-each-migration-class-into-its-own-module-in-a-migrations-folder branch November 14, 2023 21:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
3 participants