Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update FunctionalAnnotationAggMember class for compatibility with MetaP Aggregation tables and implement migrator #2203

Merged
merged 28 commits into from
Nov 8, 2024

Conversation

kheal
Copy link
Contributor

@kheal kheal commented Oct 8, 2024

This was moved from the berkeley fork to this repo. Original PR found here: microbiomedata#242

Description

This PR will

  1. Modify the existing FunctionalAnnotationAggMember class to be able to accomodate both annotations from metagenomics and metaproteomics analyses. This is accomplished by exchanging the slot metagenome_anlaysis_id with the existing slot was_generated_by.
  2. Implement a migrator to adjust existing records.
  3. Deprecate the metagenome_anlaysis_id slot as there are now no classes using this slot.
  4. Update tests accordingly
  5. Add description to count slot
  6. Add a unique_keys to FunctionalAnnotationAggMember class that is a combination of was_generated_byandgene_function_id See Add unique_keys to FunctionalAnnotationAggMember class #2155 for why this was removed from this PR.

This PR will close 1253

PR Information

What type of PR is this? (check all applicable)

  • Schema change: Structure and content
    • deleted a slot

Related Issues

Did you add/update any tests?

  • Yes
    • All example yamls are updated (including invalid examples
    • Doctest for migrator added
    • Added invalid example for FunctionalAnnotationAggMember to check for rule on count slot

Could this schema change make it so any valid data becomes invalid?

  • Yes (A migrator is required)

If you answered "Yes", does this PR branch include that migrator?

  • Yes

Does this PR have any downstream implications?

Copy link

github-actions bot commented Oct 8, 2024

PR Preview Action v1.4.8
🚀 Deployed preview to https://microbiomedata.github.io/nmdc-schema/pr-preview/pr-2203/
on branch gh-pages at 2024-11-08 17:28 UTC

@kheal kheal marked this pull request as draft October 8, 2024 23:00
@kheal kheal changed the title Test 1253 metap aggtable [POST-berkeley-MERGER] Update FunctionalAnnotationAggMember class for compatibility with MetaP Aggregation tables Oct 8, 2024
@kheal kheal self-assigned this Oct 8, 2024
@kheal kheal linked an issue Oct 9, 2024 that may be closed by this pull request
@shreddd
Copy link
Contributor

shreddd commented Oct 9, 2024

Prior to merging / deploying this, I wanted to make sure that the needed changes also get rolled into
https://github.com/microbiomedata/nmdc-aggregator/
Otherwise this will break when a new collection is generated from the aggregator.

Possibly related? microbiomedata/nmdc-aggregator#24

@kheal
Copy link
Contributor Author

kheal commented Oct 9, 2024

Prior to merging / deploying this, I wanted to make sure that the needed changes also get rolled into https://github.com/microbiomedata/nmdc-aggregator/ Otherwise this will break when a new collection is generated from the aggregator.

Possibly related? microbiomedata/nmdc-aggregator#24

@shreddd - Yes that is the companion issue to keep the aggregator in step with this change.

@kheal kheal changed the title [POST-berkeley-MERGER] Update FunctionalAnnotationAggMember class for compatibility with MetaP Aggregation tables Update FunctionalAnnotationAggMember class for compatibility with MetaP Aggregation tables and implement migrator Oct 16, 2024
@kheal
Copy link
Contributor Author

kheal commented Oct 29, 2024

@eecavanna or @brynnz22 - Do either of you have guidance for how the migrator should be named? It is currently nmdc_schema/migrators/migrator_from_XX_to_PR2203.py. I can either leave it as that or update, but otherwise I think this MR is about ready to be merged in.

@brynnz22
Copy link
Contributor

Maybe migrator_from_11_0_3_to_11_1_0. I'm not sure what the next version of the schema will be. @turbomam do you have an ideas what the next version of the schema will be after merging this in?

@eecavanna
Copy link
Collaborator

Sorry about the delay! In a "workshop" all day today and Wednesday.

I agree with @brynnz22's recommendation. The "from" version is the currently released schema version. The "to" version is still TBD. The value @brynnz22 suggested makes sense to me as a placeholder (sufficient for merging, in my opinion, given that we don't know what the next schema version will be).


The requirement of putting the "to" version in the migrator puts us in this chicken-and-egg situation. I do plan to change this part of the procedure "soon" ™.

@brynnz22
Copy link
Contributor

@kheal @eecavanna Since I am working on a migrator right now for the has_calibration slot, it sounds like this should be a partial migrator and same with the one I am working on. Because they will both probably feed into the same schema update. Does this seem right? My PR will be ready sometime today.

@kheal
Copy link
Contributor Author

kheal commented Oct 30, 2024

@brynnz22 that seems reasonable. #2204 also has a migrator that could be bundled into this if we decide to go down that route.

@kheal
Copy link
Contributor Author

kheal commented Nov 1, 2024

Updated the migrator to be a partial for the 11.1 release.

@kheal
Copy link
Contributor Author

kheal commented Nov 4, 2024

@turbomam - do you want to review this before we merge it in?

@turbomam
Copy link
Member

turbomam commented Nov 5, 2024

@turbomam - do you want to review this before we merge it in?

yes, thanks. looking now.

count:
range: integer
description: The number of sequences (for a metagenome or metatranscriptome) or spectra (for metaproteomics) associated with the specified function.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<3

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to move that definition for count to slot_usage. The count slot could easily be used with different meanings in other class contexts.

@turbomam
Copy link
Member

turbomam commented Nov 5, 2024

@sierra-moxon and @turbomam Could you please check that I've implemented the use of unique_keys correctly?

@kheal If you have tests that show valid and invalid re-use of either of the identifying slots, then I approve for merging.

Update: I get the sense that illegally duplicated id combinations aren't tested yet.

Copy link
Member

@turbomam turbomam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be a great PR with a few tweaks. Let me know if you think any of my requests will be difficult (or harmful!)

count:
range: integer
description: The number of sequences (for a metagenome or metatranscriptome) or spectra (for metaproteomics) associated with the specified function.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to move that definition for count to slot_usage. The count slot could easily be used with different meanings in other class contexts.

@kheal
Copy link
Contributor Author

kheal commented Nov 5, 2024

@turbomam. Thanks for the input.

@turbomam or @sierra-moxon I could use some help with the test and/or the unique_keys slot. I added a test in the problem/invalid folder - it is passing validation but it shouldn't, so something is wrong, but I can't tell if it's the testing or the actual definition in the schema. I used the Isotope example here as guidance for how to use the unique_keys slot.

I've moved the description for the count slot to the class's slot_usage.

@kheal
Copy link
Contributor Author

kheal commented Nov 8, 2024

After consulting with @sierra-moxon, we've determined that uniqueness across objects is not supported in JSONSchema, and because linkml-validate uses the jsonschema serialization of the model to do validation, it is not checking the unique keys constraint. See #2155 for more discussion

Because of the lack of verifiability on the unique_keys constraint currently supported, I've removed that from this PR. Thank you @turbomam for encouraging adding a testing file for this where we uncovered this.

@kheal kheal merged commit 9c3917d into main Nov 8, 2024
3 checks passed
@kheal kheal deleted the 1253_metap_aggtable branch November 11, 2024 20:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Edit FunctionalAnnotationAggMember to work with MetaP data
7 participants