Update FunctionalAnnotationAggMember class for compatibility with MetaP Aggregation tables and implement migrator #2203

Merged on Nov 8, 2024 (28 commits)
28 commits
16fa4f2
Change metagenome_annotation_id slot to was_generated_by on Functional…
kheal Aug 28, 2024
7124ec7
Make was_generated_by required on FunctionalAnnotationAggMember thro…
kheal Aug 28, 2024
c1fc821
Add migrator for FunctionalAnnotationAggMember slot move
kheal Aug 28, 2024
d112ce6
Move count requirement from slot to rule
kheal Aug 28, 2024
6f4e151
Fix rule on FunctionalAnnotationAggMember
kheal Aug 28, 2024
38c708d
Add invalid example to test rule on FunctionalAnnotationAggMember
kheal Aug 28, 2024
e56667e
Remove count rule from FunctionalAnnotationAggMember, leaving require…
kheal Sep 3, 2024
ccffe4a
Update tests
kheal Sep 3, 2024
6d00c76
Merge branch 'main' into 1253_metap_aggtable
kheal Sep 3, 2024
737e626
Update metatranscriptome functional aggregation example
kheal Sep 3, 2024
2671984
Fix count slot requirement
kheal Sep 3, 2024
02f993a
Fix typo in nmdc.yaml
kheal Sep 5, 2024
cae9ef8
Merge main into 1253 branch
kheal Sep 11, 2024
eb083d9
Add unique_keys to FunctionalAnnotationAggMember
kheal Sep 11, 2024
6256e3d
Add unique_keys to FunctionalAnnotationAggMember
kheal Sep 11, 2024
dd98eb0
Merge main into 1253_metap_aggtable
kheal Oct 8, 2024
e3df332
Add description for count slot on FunctionalAnnotationAggMember
kheal Oct 8, 2024
feba348
Remove count description on class and rely on slot definition
kheal Oct 9, 2024
0e31b9d
Clean up functional aggregation migrator
kheal Oct 9, 2024
7d3cd2a
Update name of migrator for functional_annotation_agg slot name change
kheal Oct 30, 2024
b2a508b
Rename migrator for merge prep
kheal Nov 1, 2024
c51b55e
Merge main into 1253_metap_aggtable
kheal Nov 1, 2024
ec4e76a
Convert functional annotation agg to partial migrator
kheal Nov 1, 2024
f295a0e
Modify partial migrator init
kheal Nov 1, 2024
0f8de96
Move count description to slot_usage
kheal Nov 5, 2024
421d9fa
Add test for unique_keys for distribution
kheal Nov 5, 2024
b517c3b
Fix invalid functional_annotation_agg example
kheal Nov 5, 2024
3efe72d
Remove unique_keys from FunctionalAnnotationAggMember class
kheal Nov 8, 2024
@@ -2,7 +2,8 @@
 
 from nmdc_schema.migrators.migrator_base import MigratorBase
 from nmdc_schema.migrators.partials.migrator_from_11_0_3_to_11_1_0 import (
-    migrator_from_11_0_3_to_11_1_0_part_1
+    migrator_from_11_0_3_to_11_1_0_part_1,
+    migrator_from_11_0_3_to_11_1_0_part_2
 )
 
 def get_migrator_classes() -> List[Type[MigratorBase]]:
@@ -22,4 +23,5 @@ def get_migrator_classes() -> List[Type[MigratorBase]]:
 
     return [
         migrator_from_11_0_3_to_11_1_0_part_1.Migrator,
+        migrator_from_11_0_3_to_11_1_0_part_2.Migrator
     ]
@@ -0,0 +1,32 @@
from nmdc_schema.migrators.migrator_base import MigratorBase


class Migrator(MigratorBase):
    r"""Migrates a database between two schemas."""

    _from_version = "11_0_3"
    _to_version = "11_1_0"
    # See PR2203

    def upgrade(self):
        r"""Migrates the database from conforming to the original schema, to conforming to the new schema."""

        self.adapter.process_each_document(
            "functional_annotation_agg", [self.move_metagenome_id_to_was_generated_by]
        )

    def move_metagenome_id_to_was_generated_by(self, fun_agg: dict) -> dict:
        r"""
        Updates each `FunctionalAnnotationAggMember` record so that the value originally in its `metagenome_annotation_id`
        field is stored in a new field named `was_generated_by`, and removes the `metagenome_annotation_id` field.

        `metagenome_annotation_id` is required on these records and has the same value as `was_generated_by` in the new
        schema, so no data is lost in this migration and there is no need to check whether the field exists.

        >>> m = Migrator()
        >>> m.move_metagenome_id_to_was_generated_by({'metagenome_annotation_id': 'mgm123', 'count': 1})
        {'count': 1, 'was_generated_by': 'mgm123'}

        """
        fun_agg["was_generated_by"] = fun_agg.pop("metagenome_annotation_id")
        return fun_agg
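`upgrade()` delegates the per-document work to `self.adapter.process_each_document(...)`, passing a collection name and a list of transformation functions. The adapter's implementation is not part of this diff, so the following is only a minimal in-memory sketch of that calling convention, assuming each function takes a document dict and returns the updated dict. `InMemoryAdapter` is a hypothetical class, not the nmdc_schema adapter.

```python
from typing import Callable, Dict, List

Transform = Callable[[dict], dict]


class InMemoryAdapter:
    """Hypothetical stand-in for a database adapter (not part of nmdc_schema).

    Holds collections as lists of dicts and applies transformation
    functions to every document in a named collection.
    """

    def __init__(self, collections: Dict[str, List[dict]]) -> None:
        self.collections = collections

    def process_each_document(self, collection_name: str, functions: List[Transform]) -> None:
        documents = self.collections.get(collection_name, [])
        for index, document in enumerate(documents):
            # Apply the functions in order; each one receives the previous one's output.
            for function in functions:
                document = function(document)
            documents[index] = document


def move_metagenome_id_to_was_generated_by(fun_agg: dict) -> dict:
    # Same transformation as the migrator method, written as a free function.
    fun_agg["was_generated_by"] = fun_agg.pop("metagenome_annotation_id")
    return fun_agg


adapter = InMemoryAdapter({
    "functional_annotation_agg": [
        {
            "metagenome_annotation_id": "nmdc:wfmgan-99-123456.1",
            "gene_function_id": "KEGG.ORTHOLOGY:K00627",
            "count": 120,
            "type": "nmdc:FunctionalAnnotationAggMember",
        }
    ]
})
adapter.process_each_document(
    "functional_annotation_agg", [move_metagenome_id_to_was_generated_by]
)
```

Passing a list of functions lets a single pass over the collection apply several independent edits in sequence.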
@@ -0,0 +1,4 @@
# This example is invalid because the required count field is missing.
type: nmdc:FunctionalAnnotationAggMember
was_generated_by: nmdc:wfmgan-99-123456.1
gene_function_id: KEGG.ORTHOLOGY:K00627
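The invalid example above omits `count`. A hand-rolled check like the one below shows why it fails. It is a hypothetical helper, not LinkML's validator, and the set of required slots is an assumption inferred from the `required: true` entries visible in this PR.

```python
from typing import Dict, List

# Assumption: slots treated as required on FunctionalAnnotationAggMember,
# inferred from this diff; the schema itself is the authority.
REQUIRED_SLOTS = ("was_generated_by", "gene_function_id", "count")


def missing_required_slots(record: Dict[str, object]) -> List[str]:
    """Return the required slot names that are absent from a record."""
    return [slot for slot in REQUIRED_SLOTS if slot not in record]


invalid_example = {
    "type": "nmdc:FunctionalAnnotationAggMember",
    "was_generated_by": "nmdc:wfmgan-99-123456.1",
    "gene_function_id": "KEGG.ORTHOLOGY:K00627",
}
valid_example = dict(invalid_example, count=120)
```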
2 changes: 1 addition & 1 deletion src/data/valid/Database-functional_annotation_agg.yaml
@@ -1,5 +1,5 @@
 functional_annotation_agg:
-  - metagenome_annotation_id: nmdc:wfmgan-99-123456.1
+  - was_generated_by: nmdc:wfmgan-99-123456.1
     gene_function_id: KEGG.ORTHOLOGY:K00627
     count: 120
     type: nmdc:FunctionalAnnotationAggMember
2 changes: 1 addition & 1 deletion src/data/valid/Database-interleaved.yaml
@@ -426,7 +426,7 @@ functional_annotation_agg:
   - gene_function_id: KEGG.ORTHOLOGY:K00627
     count: 120
     type: nmdc:FunctionalAnnotationAggMember
-    metagenome_annotation_id: nmdc:wfmgan-99-123456.1
+    was_generated_by: nmdc:wfmgan-99-123456.1
 biosample_set:
   - id: nmdc:bsm-99-isqhuW
     type: nmdc:Biosample
@@ -1,4 +1,4 @@
-metagenome_annotation_id: nmdc:wfmtan-99-123456.1
+was_generated_by: nmdc:wfmtan-99-123456.1
 gene_function_id: KEGG.ORTHOLOGY:K00627
 count: 120
 type: nmdc:FunctionalAnnotationAggMember
2 changes: 1 addition & 1 deletion src/data/valid/FunctionalAnnotationAggMember-minimal.yaml
@@ -1,4 +1,4 @@
-metagenome_annotation_id: nmdc:wfmgan-99-123456.1
+was_generated_by: nmdc:wfmgan-99-123456.1
 gene_function_id: KEGG.ORTHOLOGY:K00627
 count: 120
 type: nmdc:FunctionalAnnotationAggMember
18 changes: 11 additions & 7 deletions src/schema/nmdc.yaml
@@ -300,15 +300,18 @@ classes:
   FunctionalAnnotationAggMember:
     class_uri: nmdc:FunctionalAnnotationAggMember
     slots:
-      - metagenome_annotation_id
+      - was_generated_by
       - gene_function_id
       - count
       - type
     slot_usage:
-      metagenome_annotation_id:
-        structured_pattern: # doesn't include act
-          syntax: "{id_nmdc_prefix}:(wfmgan|wfmtan)-{id_shoulder}-{id_blade}{id_version}$"
+      was_generated_by:
+        structured_pattern:
+          syntax: "{id_nmdc_prefix}:(wfmgan|wfmp|wfmtan)-{id_shoulder}-{id_blade}{id_version}$"
           interpolated: true
+        required: true
+      count:
+        description: The number of sequences (for a metagenome or metatranscriptome) or spectra (for metaproteomics) associated with the specified function.
 
   Database:
     class_uri: nmdc:Database
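With `interpolated: true`, LinkML substitutes schema-level settings for the `{...}` placeholders in `syntax` and uses the result as a regular expression. The settings values below are assumptions for illustration (the real ones are defined elsewhere in nmdc.yaml and may differ), and `nmdc:wfmp-11-abc123.1` is a made-up identifier. The sketch compares the old and new patterns to show that only the new one accepts a `wfmp` identifier, presumably from the metaproteomics workflow.

```python
import re
from typing import Dict

# Assumed values for the settings referenced by the pattern; not copied from nmdc.yaml.
SETTINGS: Dict[str, str] = {
    "id_nmdc_prefix": "^nmdc",
    "id_shoulder": "[0-9][a-z]{0,6}[0-9]",
    "id_blade": "[A-Za-z0-9]+",
    "id_version": r"(\.[A-Za-z0-9]+)*",
}

OLD_SYNTAX = "{id_nmdc_prefix}:(wfmgan|wfmtan)-{id_shoulder}-{id_blade}{id_version}$"
NEW_SYNTAX = "{id_nmdc_prefix}:(wfmgan|wfmp|wfmtan)-{id_shoulder}-{id_blade}{id_version}$"


def interpolate(syntax: str, settings: Dict[str, str]) -> str:
    """Replace each {name} placeholder in `syntax` with the matching setting value."""
    return re.sub(r"\{(\w+)\}", lambda m: settings[m.group(1)], syntax)


OLD_PATTERN = re.compile(interpolate(OLD_SYNTAX, SETTINGS))
NEW_PATTERN = re.compile(interpolate(NEW_SYNTAX, SETTINGS))
```

Existing `wfmgan` and `wfmtan` identifiers match both patterns, so the change widens the accepted set without invalidating migrated records.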
@@ -1043,19 +1046,20 @@ slots:
    range: WorkflowExecution
    description: The identifier for the analysis activity that generated the functional annotation results, where the analysis activity is an instance of the/an appropriate subclass of WorkflowExecution
    required: true
    any_of:
      - range: MetagenomeAnnotation
      - range: MetatranscriptomeAnnotation
    deprecated: "not used. 2024-10 https://github.com/microbiomedata/nmdc-schema/issues/1253"


  gene_function_id:
    range: uriorcurie
    description: The identifier for the gene function.
    examples:
      - value: KEGG.ORTHOLOGY:K00627
    required: true

  count:
    range: integer
    required: true

  functional_annotation_agg:
    range: FunctionalAnnotationAggMember
    multivalued: true
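The `count` slot is an integer giving the number of sequences (or, for metaproteomics, spectra) associated with a function. How the aggregation table is produced upstream is outside this PR; a minimal sketch, assuming one function ID per annotated sequence or spectrum and using a hypothetical `aggregate_functions` helper, could build the records with `collections.Counter`:

```python
from collections import Counter
from typing import Iterable, List


def aggregate_functions(workflow_id: str, gene_function_ids: Iterable[str]) -> List[dict]:
    """Build FunctionalAnnotationAggMember-style records (hypothetical helper).

    `gene_function_ids` holds one function ID per annotated sequence or
    spectrum; `count` is how often each function occurs for `workflow_id`.
    """
    counts = Counter(gene_function_ids)
    return [
        {
            "was_generated_by": workflow_id,
            "gene_function_id": function_id,
            "count": count,  # an int, matching the slot's `range: integer`
            "type": "nmdc:FunctionalAnnotationAggMember",
        }
        # Sorting only makes the output deterministic; the schema does not require it.
        for function_id, count in sorted(counts.items())
    ]
```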