-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update stats endpoint to return quantity of non-parent studies #1509
Comments
Here are two versions of the aforementioned query. One of them works with the 1/1/2024 dump and the other works with the 1/1/2025 dump. They are labeled accordingly. Get the number of studies that are not parent studies ("2025 version")-- Get the number of studies that are not parent studies ("2025 version").
--
-- Note: This query is compatible with the nightly Postgres dump from 1/1/2025,
-- where the `study` table has a `part_of` column. It is not compatible
-- with the nightly Postgres dump from 1/1/2024 (a year earlier),
-- where the `study` table _does not yet_ have a `part_of` column.
--
SELECT count(*) AS "Number of studies (non-parent only)"
FROM study
WHERE id NOT IN (
-- Get the distinct IDs (as text) of all the parent studies.
SELECT DISTINCT parent_study_id_as_text
FROM study,
jsonb_array_elements_text(part_of) AS parent_study_id_as_text
WHERE jsonb_typeof(part_of) = 'array'
); Get the number of studies that are not parent studies ("2024 version")-- Get the number of studies that are not parent studies ("2024 version").
--
-- Note: This query is compatible with the nightly Postgres dump from 1/1/2024,
-- where the `study` table did not yet have a `part_of` column. It is not
-- compatible with the nightly Postgres dump from 1/1/2025 (a year later),
-- where the `study` table _does_ have a `part_of` column.
--
SELECT count(*) AS "Number of studies (non-parent only)"
FROM study
WHERE id NOT IN (
-- Get the distinct IDs (as text) of all the parent studies.
SELECT DISTINCT parent_study_id_as_text
FROM study,
jsonb_array_elements_text(annotations->'part_of') AS parent_study_id_as_text
WHERE jsonb_typeof(annotations) = 'object'
AND jsonb_typeof(annotations->'part_of') = 'array'
); |
This is not assigned to anyone and hasn't been updated so I'm removing from sprint and adding to the backlog. @eecavanna if you need this to be prioritized please add to the agenda for next week's infra sync meeting. |
This is not assigned to anyone and hasn't been updated so I'm removing from sprint and adding to the backlog again. |
Oops! I forgot about this ticket. I will assign it to myself and implement it in the sprint after next. |
Here's an example API response from https://data.microbiomedata.org/api/stats:
The
studies
number currently accounts for all studies. It does not distinguish between so-called "parent" and "non-parent" studies, where the former are studies whoseid
values appear in thepart_of
column of a study, and the latter are studies whoseid
values do not appear in thepart_of
column of any studies.I discussed this distinguishing characteristic (i.e. whether a study is a parent study versus a non-parent study) with several team members this week, while trying to gather metrics for a report/infographic. @aclum wants this endpoint to return the number of "non-parent" studies.
My proposal is to add a new field to this object, named
non_parent_studies
, whose value is the quantity of non-parent studies. That is as opposed to repurposing the existingstudies
field.In the comments below, I will post a Postgres query that can be used to determine that quantity (off the top of my head, I don't know what it would look like as a SQLAlchemy statement).
The text was updated successfully, but these errors were encountered: