not all schema collections suffix with `_set`s #620
* fix: allow "update" of non-`id`-having document collections

  interpret as simple insertion. leave note in code about decision to insist on schema-supplied uniqueness signal. fix #611
* refactor to add test
* fix: rm abandoned candidate test
* Update nmdc_runtime/site/ops.py

  Co-authored-by: eecavanna <[email protected]>

---------

Co-authored-by: eecavanna <[email protected]>
We can use https://w3id.org/linkml/unique_keys to declare a compound key (though I don't fully understand the broader issue).

I think we should review how this collection is typically generated based on: This will need some additional discussion. We can follow up on Slack or in a small meeting.

PS - this collection's role is to provide the aggregation tables for the data portal search (it will not typically be queried directly by outside end users), so we should keep that in mind. One way to approach this might be to model, in the API, the set of Mongo queries used in https://github.com/microbiomedata/nmdc-aggregator/blob/main/generate_functional_agg.py.

@eecavanna @pkalita-lbl this PR is already closed; can we get a new ticket and PR to address the new issue?

I'll create a new issue for it now. Thanks, @aclum. Thanks, @pkalita-lbl, for reporting the issue and linking to the relevant commit.
Description
Just like not all heroes wear capes, not all schema collections suffix with `_set`s.

Interpret `perform_mongo_updates` for non-`id`-having schema collections as simple insertions. Leave note in code about decision to insist on schema-supplied uniqueness signal.

This is a hack because, e.g., https://w3id.org/nmdc/FunctionalAnnotationAggMember documents appear by eye to be unique via a compound key (`metagenome_annotation_id`, `gene_function_id`), and yet this is not explicit in the schema.

One potential solution is to auto-generate an `id` for such documents as a deterministic hash of the compound key, but any such `id` generation strategy should be unambiguous to a schema consumer such as nmdc-runtime.

For now, the decision is to potentially re-insert "duplicate" documents, i.e. to interpret lack of `id` as lack of unique document identity for de-duplication.

Fixes #611
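For illustration only, the "deterministic hash of the compound key" idea mentioned above could look roughly like the sketch below. This PR does not implement it. The function name `compound_key_id`, the `agg` prefix, the 32-character digest truncation, and the example field values are all hypothetical; only the field names `metagenome_annotation_id` and `gene_function_id` come from the description.

```python
import hashlib
import json


def compound_key_id(doc: dict, key_fields: tuple[str, ...], prefix: str = "agg") -> str:
    """Derive a deterministic id from the values of `key_fields` in `doc`.

    Hypothetical helper: not part of nmdc-runtime or nmdc-schema.
    """
    missing = [field for field in key_fields if field not in doc]
    if missing:
        raise KeyError(f"document lacks compound-key field(s): {missing}")
    # Serialize the key values as a JSON list in the given field order, so the
    # same values always produce the same bytes (and therefore the same hash).
    payload = json.dumps(
        [doc[field] for field in key_fields],
        separators=(",", ":"),
        ensure_ascii=False,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # Truncating the digest is an arbitrary choice made here for readability.
    return f"{prefix}-{digest[:32]}"


# Assumed compound key, per the "by eye" observation in the description.
KEY_FIELDS = ("metagenome_annotation_id", "gene_function_id")

# Placeholder values, not real identifiers.
doc_a = {
    "metagenome_annotation_id": "example:annotation-1",
    "gene_function_id": "example:function-1",
    "count": 3,
}
doc_b = dict(doc_a, count=7)  # same compound key, different payload
doc_c = dict(doc_a, gene_function_id="example:function-2")  # different key

id_a = compound_key_id(doc_a, KEY_FIELDS)
id_b = compound_key_id(doc_b, KEY_FIELDS)
id_c = compound_key_id(doc_c, KEY_FIELDS)
```

Here `id_a` and `id_b` are equal because only the key fields feed the hash, while `id_c` differs. As the description notes, a scheme like this would only be safe if the schema itself declared which fields form the key, so that a consumer does not have to guess.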
How Has This Been Tested?
`test_perform_mongo_updates_functional_annotation_agg`
Definition of Done (DoD) Checklist:
* `black nmdc_runtime/`
* `make up-test && make test-run`