Merge pull request #718 from microbiomedata/berkeley
Update Runtime to work with Berkeley schema (i.e. `nmdc-schema` version `v11.0.0`)
eecavanna authored Oct 8, 2024
2 parents 385982a + e0edb06 commit d4b06b8
Showing 57 changed files with 2,026 additions and 1,096 deletions.
5 changes: 5 additions & 0 deletions .env.example
@@ -40,4 +40,9 @@ NERSC_USERNAME=replaceme
ORCID_NMDC_CLIENT_ID=replaceme
ORCID_NMDC_CLIENT_SECRET=replaceme

# Base URL (without a trailing slash) at which the Runtime can access an instance of ORCID.
# Note: For the production instance of ORCID, use: https://orcid.org (default)
# For the sandbox instance of ORCID, use: https://sandbox.orcid.org
ORCID_BASE_URL=https://orcid.org

INFO_BANNER_INNERHTML='Announcement: Something important is about to happen. If you have questions, please contact <a href="mailto:[email protected]">[email protected]</a>.'
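
Reviewer note: the comment added above says `ORCID_BASE_URL` must not have a trailing slash and defaults to the production instance. A minimal sketch of how a consumer might read it defensively follows. The helper names `get_orcid_base_url` and `build_orcid_url` are hypothetical and are not taken from this commit; the `oauth/token` path in the usage note is only an illustration.

```python
import os
from typing import Mapping, Optional

# Default taken from the .env.example comment (production ORCID instance).
DEFAULT_ORCID_BASE_URL = "https://orcid.org"


def get_orcid_base_url(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the ORCID base URL without a trailing slash.

    Hypothetical helper: falls back to the production URL when the
    variable is unset or blank, and strips any trailing slash so that
    callers can always append "/<path>".
    """
    env = os.environ if environ is None else environ
    raw_value = env.get("ORCID_BASE_URL", "").strip()
    base_url = raw_value or DEFAULT_ORCID_BASE_URL
    return base_url.rstrip("/")


def build_orcid_url(path: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Join the base URL and a path with exactly one slash between them."""
    return f"{get_orcid_base_url(environ)}/{path.lstrip('/')}"
```

For example, with `ORCID_BASE_URL=https://sandbox.orcid.org/` in the environment, `build_orcid_url("oauth/token")` would resolve against the sandbox instance rather than production, with no doubled slash.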
101 changes: 0 additions & 101 deletions .github/workflows/build-and-release-to-spin-berkeley.yml

This file was deleted.

@@ -3,13 +3,13 @@
{
"metadata": {},
"cell_type": "markdown",
"source": "# Migrate MongoDB database from `nmdc-schema` `v10.5.6` to `v10.8.0`",
"source": "# Migrate MongoDB database from `nmdc-schema` `v10.4.0` to `v10.9.1`",
"id": "d05efc6327778f9c"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "There are no migrators associated with any schema changes between schema versions `v10.5.6` and `v10.8.0`. So, this notebook is a \"no op\" (i.e. \"no operation\").",
"source": "There are no migrators associated with any schema changes between schema versions `v10.4.0` and `v10.9.1`. So, this notebook is a \"no op\" (i.e. \"no operation\").",
"id": "b99d5924e825b9a2"
},
{
@@ -4,10 +4,13 @@
"cell_type": "markdown",
"id": "initial_id",
"metadata": {
"collapsed": true
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
"source": [
"# Migrate MongoDB database from `nmdc-schema` `v10.8.0` to `v11.0.0`"
"# Migrate MongoDB database from `nmdc-schema` `v10.9.1` to `v11.0.0`"
]
},
{
@@ -17,7 +20,7 @@
"source": [
"## Introduction\n",
"\n",
"This notebook will be used to migrate the database from `nmdc-schema` `v10.8.0` ([released](https://github.com/microbiomedata/nmdc-schema/releases/tag/v10.8.0) August 21, 2024) to `v11.0.0` (i.e. the initial version of the so-called \"Berkeley schema\").\n",
"This notebook will be used to migrate the database from `nmdc-schema` `v10.9.1` ([released](https://github.com/microbiomedata/nmdc-schema/releases/tag/v10.9.1) October 7, 2024) to `v11.0.0` (i.e. the initial version of the so-called \"Berkeley schema\").\n",
"\n",
"Unlike previous migrators, this one does not pick and choose which collections it will dump. There are two reasons for this: (1) migrators no longer have a dedicated `self.agenda` dictionary that indicates all the collections involved in the migration; and (2) this migration is the first one that involves creating, renaming, and dropping any collections; none of which are things that the old `self.agenda`-based system was designed to handle. So, instead of picking and choosing collections, this migrator **dumps them all.**"
]
@@ -106,12 +109,16 @@
"cell_type": "code",
"id": "e25a0af308c3185b",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"scrolled": true
},
"source": [
"%pip install --upgrade pip\n",
"%pip install -r requirements.txt\n",
"%pip install nmdc-schema==11.0.0rc22"
"%pip install nmdc-schema==11.0.0"
],
"outputs": [],
"execution_count": null
@@ -273,7 +280,10 @@
"cell_type": "markdown",
"id": "bc387abc62686091",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### Create JSON Schema validator\n",
@@ -285,7 +295,10 @@
"cell_type": "code",
"id": "5c982eb0c04e606d",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"nmdc_jsonschema: dict = get_nmdc_jsonschema_dict(variant=SchemaVariantIdentifier.nmdc_materialized_patterns)\n",
@@ -367,23 +380,23 @@
"execution_count": null
},
{
"metadata": {},
"cell_type": "markdown",
"id": "7f9c87de6fb8530c",
"metadata": {},
"source": [
"### Delete obsolete dumps from previous migrations\n",
"\n",
"Delete any existing dumps before we create new ones in this notebook. This is so the dumps you generate with this notebook do not get merged with any unrelated ones."
],
"id": "7f9c87de6fb8530c"
]
},
{
"metadata": {},
"cell_type": "code",
"id": "6a949d0fcb4b6fa0",
"metadata": {},
"source": [
"!rm -rf {cfg.origin_dump_folder_path}\n",
"!rm -rf {cfg.transformer_dump_folder_path}"
],
"id": "6a949d0fcb4b6fa0",
"outputs": [],
"execution_count": null
},
@@ -402,7 +415,9 @@
{
"cell_type": "code",
"id": "da530d6754c4f6fe",
"metadata": {},
"metadata": {
"scrolled": true
},
"source": [
"# Dump all collections from the \"origin\" database.\n",
"shell_command = f\"\"\"\n",
@@ -435,7 +450,9 @@
{
"cell_type": "code",
"id": "79bd888e82d52a93",
"metadata": {},
"metadata": {
"scrolled": true
},
"source": [
"# Restore the dumped collections to the \"transformer\" MongoDB server.\n",
"shell_command = f\"\"\"\n",
@@ -474,7 +491,9 @@
{
"cell_type": "code",
"id": "9c89c9dd3afe64e2",
"metadata": {},
"metadata": {
"scrolled": true
},
"source": [
"# Instantiate a MongoAdapter bound to the \"transformer\" database.\n",
"adapter = MongoAdapter(\n",
@@ -524,7 +543,7 @@
"for collection_name in ordered_collection_names:\n",
" collection = transformer_mongo_client[\"nmdc\"][collection_name]\n",
" num_documents_in_collection = collection.count_documents({})\n",
" print(f\"Validating collection {collection_name} ({num_documents_in_collection} documents)\")\n",
" print(f\"Validating collection {collection_name} ({num_documents_in_collection} documents)\", end=\"\\t\") # no newline\n",
"\n",
" for document in collection.find():\n",
" # Validate the transformed document.\n",
@@ -541,7 +560,9 @@
" #\n",
" document_without_underscore_id_key = {key: value for key, value in document.items() if key != \"_id\"}\n",
" root_to_validate = dict([(collection_name, [document_without_underscore_id_key])])\n",
" nmdc_jsonschema_validator.validate(root_to_validate) # raises exception if invalid"
" nmdc_jsonschema_validator.validate(root_to_validate) # raises exception if invalid\n",
"\n",
" print(f\"Done\")"
],
"outputs": [],
"execution_count": null
@@ -559,7 +580,9 @@
{
"cell_type": "code",
"id": "db6e432d",
"metadata": {},
"metadata": {
"scrolled": true
},
"source": [
"# Dump the database from the \"transformer\" MongoDB server.\n",
"shell_command = f\"\"\"\n",
@@ -583,7 +606,10 @@
"cell_type": "markdown",
"id": "997fcb281d9d3222",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### Create a bookkeeper\n",
@@ -664,7 +690,9 @@
{
"cell_type": "code",
"id": "1dfbcf0a",
"metadata": {},
"metadata": {
"scrolled": true
},
"source": [
"# Load the transformed collections into the origin server, replacing any same-named ones that are there.\n",
"shell_command = f\"\"\"\n",
@@ -691,7 +719,10 @@
"cell_type": "markdown",
"id": "ca5ee89a79148499",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### Indicate that the migration is complete\n",
@@ -703,7 +734,10 @@
"cell_type": "code",
"id": "d1eaa6c92789c4f3",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"bookkeeper.record_migration_event(migrator=migrator, event=MigrationEvent.MIGRATION_COMPLETED)"
@@ -740,11 +774,19 @@
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "code",
"id": "037db214-ea76-46bf-bb6a-bf1ff9b28a72",
"metadata": {},
"source": [],
"outputs": [],
"execution_count": null
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
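
Reviewer note: the validation cell in this notebook strips MongoDB's `_id` key from each document and wraps the result under its collection name before passing it to the JSON Schema validator. The sketch below isolates that wrapping step as a standalone function so it can be read and tested on its own. The function name `make_validation_root` and the sample document are hypothetical; only the dictionary-building logic mirrors the notebook.

```python
from typing import Any, Dict, List


def make_validation_root(
    collection_name: str, document: Dict[str, Any]
) -> Dict[str, List[Dict[str, Any]]]:
    """Build the object that would be handed to the schema validator.

    Hypothetical helper mirroring the notebook's inline logic:
    1. Copy the document without MongoDB's `_id` key, which the
       schema does not describe.
    2. Wrap the copy in a one-element list keyed by the collection
       name, so the result resembles a database with one collection.
    The input document is not modified.
    """
    document_without_underscore_id_key = {
        key: value for key, value in document.items() if key != "_id"
    }
    return {collection_name: [document_without_underscore_id_key]}


# Illustrative document; the field values are made up for this sketch.
sample_document = {"_id": "abc123", "id": "nmdc:example-1", "name": "Example"}
root_to_validate = make_validation_root("biosample_set", sample_document)
```

In the notebook, the resulting object is what gets passed to `nmdc_jsonschema_validator.validate(...)`, which raises an exception if the document is invalid.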