Source MetaProteomics functional results from functional_annotation_agg collection for ingest and Data Portal #1468

Closed
kheal opened this issue Nov 25, 2024 · 11 comments · Fixed by #1499

@kheal

kheal commented Nov 25, 2024

After the restructuring of the functional_annotation_agg collection in Mongo, we will be adding MetaProteomics functional results to this collection (dependent on microbiomedata/nmdc-aggregator#27). We will then need to restructure/check how the MetaP functional annotations are accessed during ingest to Postgres and ultimately on the Data Portal.

We intend to remove all records from the metap_gene_function_aggregation collection in Mongo - instead all functional annotations for Metaproteomics will be found in the functional_annotation_agg collection.

For testing - in dev mongo, the functional_annotation_agg collection now contains records for a MetaProteomics analysis. You can find them by searching dev mongo's functional_annotation_agg collection for the was_generated_by field's value "nmdc:wfmp-11-x0zhd078.1".
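
A minimal sketch of that lookup, assuming Python and a pymongo-style collection object. The helper names `build_filter` and `fetch_rows` are hypothetical and not part of nmdc-server; only the field name and workflow ID come from the comment above.

```python
from typing import Any, Dict, List

# Workflow ID and field name are taken from the comment above.
WORKFLOW_ID = "nmdc:wfmp-11-x0zhd078.1"


def build_filter(workflow_id: str) -> Dict[str, str]:
    """Return a MongoDB filter matching aggregation rows for one workflow run."""
    return {"was_generated_by": workflow_id}


def fetch_rows(collection: Any, workflow_id: str) -> List[Dict[str, Any]]:
    """Fetch matching rows from any object exposing a pymongo-style find()."""
    return list(collection.find(build_filter(workflow_id)))


# Usage against a real database (the URI and database name are assumptions):
#   from pymongo import MongoClient
#   agg = MongoClient("mongodb://localhost:27017")["nmdc"]["functional_annotation_agg"]
#   rows = fetch_rows(agg, WORKFLOW_ID)
```

The collection is passed in as an argument so the same helper can be pointed at dev or at a test double.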

cc @aclum

@aclum
Contributor

aclum commented Nov 26, 2024

@naglepuff we'd like this as part of the December release.

@naglepuff
Collaborator

I could get a draft PR up within the next day or 2 (say by 12/4).

After that I'll want to coordinate with @eecavanna to figure out when dev data gets migrated, then the PR for this issue can be merged into dev for testing.

@eecavanna
Collaborator

eecavanna commented Dec 2, 2024

Hi @kheal, I talked with @aclum during today's release planning meeting and she elaborated on this ticket. I have a question regarding this statement:

> We intend to remove all records from the metap_gene_function_aggregation collection in Mongo - instead all functional annotations for Metaproteomics will be found in the functional_annotation_agg collection.

What (if anything) do you expect to happen to the data that's currently in the metap_gene_function_aggregation collection? For example, will there be a Mongo migrator that moves it to the functional_annotation_agg collection? Or will we ignore it (maybe delete it later) and allow the aggregator to re-generate that data "in the right place" (i.e. in the functional_annotation_agg collection)? I think the latter will be simpler for me, as the migration framework has not been used with non-schema-described collections before (I'm under the impression the metap_gene_function_aggregation is not described by the schema).

@kheal
Author

kheal commented Dec 2, 2024

@eecavanna, your second suggestion is currently our plan: no migration, just let the aggregator populate the functional_annotation_agg collection. Once we know the aggregator is working properly with the data portal, we will delete the metap_gene_function_aggregation collection entirely.

@naglepuff
Collaborator

naglepuff commented Dec 3, 2024

Looks like I had some sort of misunderstanding regarding the change last month in the functional_annotation_agg collection and stopped pulling from the metap_gene_function_aggregation collection. So the functionality requested by this issue is already done and live in production. My bad.

Since this means the November release of nmdc-server is bugged, I can issue a hotfix that undoes the change from this commit: 9aa5e09 (#1433). This hotfix would NOT be merged into main due to the removal of metap_gene_function_aggregation for the December release.

@aclum I think the hotfix route here is low risk. Would you like me to implement it?

@aclum
Contributor

aclum commented Dec 3, 2024

yes please

@kheal
Author

kheal commented Dec 4, 2024

@naglepuff - you'll need to fix more than just that commit for a hotfix. In that same PR, the was_generated_by change is not applicable to the metap_gene_function_aggregation collection for the November release (but will be for the December release).

[Screenshot attached, 2024-12-04 10:27:56 AM]

@naglepuff
Collaborator

You're right, here's the full changelog for the hotfix: v1.1.0...v1.1.1

@kheal
Author

kheal commented Dec 20, 2024

@naglepuff - This is now in production, correct? If so, please close this ticket.

@naglepuff
Collaborator

There's currently a bug, so I don't want to close this issue until #1499 is merged (the change in that PR will also be released as a hotfix).

That being said, neither dev nor prod use the metap_gene_function_aggregation collection in mongo. That can be deleted.

@aclum
Contributor

aclum commented Dec 21, 2024

confirmed fixed with this release. verified by https://data.microbiomedata.org/?q=Ch8IABAAGAUiFyJLRUdHLk9SVEhPTE9HWTpLMDIxMTEi
expected number of proteomics records based on mongo: 26
observed: 26
