Endpoint /data_objects/study/{study_id} does not return all expected data objects #723
Comments
@PeopleMakeCulture and I pair-investigated this today. Here are my notes from that investigation (we did not fix the issue):
@eecavanna @PeopleMakeCulture there is confusion between a … I AM able to find this record in alldocs in Mongo prod with:
But this record is NOT being returned by the endpoint. I assigned this to @sujaypatil96 because I suspect the issue is with the code the endpoint uses to get records from alldocs, not with alldocs itself, and he wrote the code for that.
Thanks for elaborating on the situation. I wasn't familiar with the term "Functional Annotation GFF," but thought it might have something to do with the …
Assigning this ticket to @sujaypatil96 because you suspect the issue is with that endpoint makes sense to me. For reference (for everyone), here's a link to the endpoint's code in Runtime v1.10.0, which is running in production: nmdc-runtime/nmdc_runtime/api/endpoints/find.py, lines 129 to 191 at commit d3742a5.
The code currently doesn't account for DataObjects created by WorkflowExecution processes (using
Not all samples have a LibraryPreparation. Some older records go directly from biosample to a DataGeneration subclass; in particular, records for mass spec analyses don't use this class at all. This needs to be generated in an agnostic fashion: it should use the collection of relationship slots (as derived from the schema) to figure out which slots to query.
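As a rough illustration of the schema-driven approach described above, the sketch below picks out the slots whose range is a class (slots that point at other records) and follows only those, so a DataGeneration record that takes a biosample directly as input is handled the same way as one that goes through a LibraryPreparation. The slot-to-range table, the set of literal ranges, and the ex: ids are hypothetical stand-ins; a real implementation would read slot ranges from the NMDC LinkML schema rather than from a hand-written dict.

```python
from typing import Any, Dict, Iterable, List, Set

# Hypothetical, heavily simplified slice of a schema: slot name -> range.
# A real implementation would derive this from the NMDC LinkML schema.
TOY_SLOT_RANGES: Dict[str, str] = {
    "id": "uriorcurie",
    "name": "string",
    "has_input": "NamedThing",
    "has_output": "NamedThing",
    "was_informed_by": "DataGeneration",
    "data_object_type": "FileTypeEnum",
}

# Ranges assumed to hold literal values rather than references to other records.
LITERAL_RANGES: Set[str] = {"string", "uriorcurie", "integer", "float", "boolean", "FileTypeEnum"}


def relationship_slots(slot_ranges: Dict[str, str]) -> List[str]:
    """Return the slots whose range is a class, i.e. slots that point at other records."""
    return sorted(slot for slot, rng in slot_ranges.items() if rng not in LITERAL_RANGES)


def referenced_ids(doc: Dict[str, Any], slots: Iterable[str]) -> Set[str]:
    """Collect every id a document references through the given slots (single- or multivalued)."""
    refs: Set[str] = set()
    for slot in slots:
        value = doc.get(slot)
        if value is None:
            continue
        refs.update(value if isinstance(value, list) else [value])
    return refs


slots = relationship_slots(TOY_SLOT_RANGES)
print(slots)  # ['has_input', 'has_output', 'was_informed_by']

# Hypothetical mass-spec-style DataGeneration record that takes a biosample
# directly as input, with no LibraryPreparation in between.
mass_spec_run = {"id": "ex:dg-1", "has_input": ["ex:bsm-1"], "has_output": ["ex:dobj-1"]}
print(sorted(referenced_ids(mass_spec_run, slots)))  # ['ex:bsm-1', 'ex:dobj-1']
```

Because no class names appear in the traversal logic, adding or skipping an intermediate class only changes which records exist, not which code path runs.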
Oh yes, you're right, sorry; that was a false statement I made above (I've scratched it out), and I don't know what I was thinking. The code is actually agnostic and doesn't "hardcode" any classes per se, so it should work for the "Some older records go directly from biosample to a DataGeneration subclass" case. The logic works in the following manner:
Then I don't understand why this isn't working for this study. Please dig in further.
Unless it is stopping at the first DataObject it finds instead of continuing to check relationships.
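To make the "stopping at the first DataObject" hypothesis concrete, here is a minimal sketch of a breadth-first walk over a hypothetical, in-memory set of records. When the walk refuses to look past a DataObject, it never reaches the annotation workflow that consumes the raw reads, so only the raw data comes back, which matches the symptom in the bug description. The records, ids, and the downstream() helper are illustrative assumptions, not the runtime's actual code.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical toy records; ids and structure are illustrative, not real NMDC data.
RECORDS: Dict[str, dict] = {
    "ex:bsm-1": {"type": "nmdc:Biosample"},
    "ex:dgns-1": {"type": "nmdc:NucleotideSequencing",
                  "has_input": ["ex:bsm-1"], "has_output": ["ex:dobj-raw"]},
    "ex:dobj-raw": {"type": "nmdc:DataObject"},
    "ex:wfmgan-1": {"type": "nmdc:MetagenomeAnnotation",
                    "has_input": ["ex:dobj-raw"], "has_output": ["ex:dobj-gff"]},
    "ex:dobj-gff": {"type": "nmdc:DataObject"},
}


def downstream(record_id: str, records: Dict[str, dict]) -> List[str]:
    """Ids one step downstream: the record's outputs plus any record that takes it as input."""
    outputs = list(records[record_id].get("has_output", []))
    consumers = [rid for rid, doc in records.items() if record_id in doc.get("has_input", [])]
    return outputs + consumers


def collect_data_objects(start: str, records: Dict[str, dict], stop_at_data_objects: bool) -> Set[str]:
    """Breadth-first walk from a biosample, collecting every DataObject encountered."""
    found: Set[str] = set()
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if records[current]["type"] == "nmdc:DataObject":
            found.add(current)
            if stop_at_data_objects:
                continue  # suspected bug: never look past the first DataObject on a path
        for nxt in downstream(current, records):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found


print(sorted(collect_data_objects("ex:bsm-1", RECORDS, stop_at_data_objects=True)))
# ['ex:dobj-raw']
print(sorted(collect_data_objects("ex:bsm-1", RECORDS, stop_at_data_objects=False)))
# ['ex:dobj-gff', 'ex:dobj-raw']
```

One possible fix along these lines is to keep expanding from DataObjects (the seen set still prevents revisiting records) rather than treating them as leaves.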
Moving to the next sprint for more "digging".
Describe the bug
This endpoint is not returning all expected data objects; it appears to return only the raw data. FWIW, this is also incorrect in the berkeley environment.
Please check what is being used to connect workflow records to omics processing records. If the code uses part_of instead of was_informed_by on the WorkflowExecution subclass records, that could explain what is happening: part_of shouldn't be used for this and doesn't exist in berkeley.
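A minimal sketch of why the choice of linking slot matters, assuming hypothetical workflow records shaped like berkeley-era data (linked to their DataGeneration by was_informed_by, with no part_of slot at all): looking them up by was_informed_by finds the annotation workflow and its outputs, while looking them up by part_of finds nothing, so none of those outputs would be returned. The record shapes, ids, and helper names are illustrative; was_informed_by is treated as possibly single- or multivalued to be safe.

```python
from typing import Dict, List, Set

# Hypothetical WorkflowExecution records in the berkeley style:
# linked to their DataGeneration via was_informed_by; no part_of slot.
WORKFLOW_EXECUTIONS: List[Dict[str, object]] = [
    {"id": "ex:wfmgan-1", "type": "nmdc:MetagenomeAnnotation",
     "was_informed_by": "ex:dgns-1", "has_output": ["ex:dobj-gff-1"]},
    {"id": "ex:wfmgan-2", "type": "nmdc:MetagenomeAnnotation",
     "was_informed_by": ["ex:dgns-2"], "has_output": ["ex:dobj-gff-2"]},
]


def as_list(value: object) -> List[str]:
    """Normalize a slot value that may be absent, single-valued, or multivalued."""
    if value is None:
        return []
    if isinstance(value, list):
        return [str(v) for v in value]
    return [str(value)]


def outputs_of_linked_workflows(data_generation_ids: Set[str],
                                workflows: List[Dict[str, object]],
                                link_slot: str) -> Set[str]:
    """Collect has_output ids of workflows whose link_slot points at one of the given DataGeneration ids."""
    outputs: Set[str] = set()
    for wf in workflows:
        if data_generation_ids & set(as_list(wf.get(link_slot))):
            outputs.update(as_list(wf.get("has_output")))
    return outputs


study_dg_ids = {"ex:dgns-1"}
print(outputs_of_linked_workflows(study_dg_ids, WORKFLOW_EXECUTIONS, "was_informed_by"))  # {'ex:dobj-gff-1'}
print(outputs_of_linked_workflows(study_dg_ids, WORKFLOW_EXECUTIONS, "part_of"))  # set()
```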
To Reproduce
Steps to reproduce the behavior:
curl 'https://api.microbiomedata.org/data_objects/study/nmdc%3Asty-11-5tgfr349' \
  -H 'accept: application/json'
nmdc:dobj-11-10kp6g46, an expected data object from MetagenomeAnnotation that can be found in alldocs when I search for it manually.
Expected behavior
Several thousand additional files should be returned. For example, 219 Functional Annotation GFF files are expected. You can search for this study in the data portal to see what the expected results are.
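To check a response against these expectations, something like the following could be used. It assumes (not confirmed here) that the endpoint returns a JSON list of entries, each with a biosample_id and a data_objects list whose items carry id and data_object_type; fetch_study_data_objects, iter_data_objects, missing_ids, and count_by_type are hypothetical helper names. The fetch function is defined but not called, so the checks can be exercised on a saved or hand-made payload.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter
from typing import Any, Dict, Iterable, Iterator, List, Set

API_BASE = "https://api.microbiomedata.org"


def fetch_study_data_objects(study_id: str) -> List[Dict[str, Any]]:
    """GET /data_objects/study/{study_id}; the CURIE's colon is percent-encoded as in the reproduction URL."""
    url = f"{API_BASE}/data_objects/study/{urllib.parse.quote(study_id, safe='')}"
    request = urllib.request.Request(url, headers={"accept": "application/json"})
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)


def iter_data_objects(payload: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Flatten the assumed [{biosample_id, data_objects: [...]}, ...] shape into data objects."""
    for entry in payload:
        yield from entry.get("data_objects", [])


def missing_ids(payload: Iterable[Dict[str, Any]], expected: Set[str]) -> Set[str]:
    """Return the expected data object ids that do not appear anywhere in the payload."""
    returned = {obj.get("id") for obj in iter_data_objects(payload)}
    return expected - returned


def count_by_type(payload: Iterable[Dict[str, Any]]) -> Counter:
    """Count returned data objects per data_object_type (missing types are grouped together)."""
    return Counter(obj.get("data_object_type", "<no type>") for obj in iter_data_objects(payload))
```

For example, missing_ids(fetch_study_data_objects("nmdc:sty-11-5tgfr349"), {"nmdc:dobj-11-10kp6g46"}) would be expected to come back empty once the bug is fixed, and count_by_type(...)["Functional Annotation GFF"] would be expected to reach 219.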