community curation response rate #651

Closed
ValWood opened this issue Nov 15, 2017 · 41 comments

@ValWood
Member

ValWood commented Nov 15, 2017

In the stats, we report the response rate as a percentage (currently around 42%). It goes up, but very slowly. It would be nice to have a cumulative graph showing the growth over time eventually (the only way is up)

@ValWood
Member Author

ValWood commented Jan 29, 2018

I mentioned today in the group meeting that this had gone up to 43.6% last week... I think it's statistically significant because
a) It's a complete dataset, not a sample, in which case you don't require statistics to explain the increase?
and
b) the number is just the ratio of curated vs. non-curated out of the sessions sent out, and it is continually increasing...

If we plotted the response rate I'm sure it would be a continuously upward trajectory... which is basically what we are interested in... I want to get to 50% this year...

@kimrutherford is it easy to include this as a graph in the stats? It would be much nicer than the number. It's not urgent but it might be a nice quick task if you want something "alternative" to the big browser elephant....

Does that all make sense?

@kimrutherford
Member

> is it easy to include this as a graph in the stats?

All the data is available so it wouldn't be too hard.

There are some edge cases to think about. Like this session which was sent out twice, in different years:
https://curation.pombase.org/pombe/view/object/curs/4315?model=track
should that count towards 2016 or 2017?

@ValWood
Member Author

ValWood commented Jan 30, 2018

I envisaged that we would just use the ratio of the ones which are sent out vs. the ones sent back.

So, the numbers

To date 1361 publications have been assigned to community members for curation. 597 are finished and are either in the main PomBase database or are currently being checked by the PomBase curators. That's a response rate of 43.8%.

So it's always the first date sent out (things which are sent out multiple times are just reminders).
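
For reference, the arithmetic behind that figure can be sketched as follows. This is only a minimal illustration, not the code behind the stats page: the function name `response_rate` is hypothetical, and truncating to one decimal place is an assumption that reproduces the 43.8% quoted above (rounding would give 43.9%).

```python
def response_rate(submitted: int, sent: int) -> float:
    """Percentage of sent sessions that were submitted, truncated to 1 decimal.

    Hypothetical helper for illustration; truncation (not rounding) is assumed.
    """
    if sent <= 0:
        raise ValueError("sent must be positive")
    # Integer arithmetic avoids floating-point surprises when truncating.
    return (1000 * submitted // sent) / 10


# 597 finished sessions out of 1361 assigned, as quoted in this comment.
print(response_rate(597, 1361))  # 43.8
```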

I envisage that the graph will look like this:

[image: 20180130_145240_resized]

i.e. it goes up continually but very slowly.

I keep it going up by sending out enough reminders to sustain an increase. I don't send out too many at once as we would be swamped...

Eventually it will plateau when we are just left with the people who will never do any. We are a long way from that yet.... I'm still getting lots of "sorry I will do it" and a good uptake when I send reminders, even for old sessions...

@ValWood
Member Author

ValWood commented Jan 30, 2018

y axis is %

@ValWood
Member Author

ValWood commented Jan 30, 2018

I might be wrong because I don't know what the graph would look like at the start when the number of sessions was low! Actually I think it may begin at about 30%. Certainly for the past few years it has been going up slowly (this is partially due to the fact that the uptake on new papers is usually more immediate, it's old ones that are stagnating....)

@ValWood
Member Author

ValWood commented Feb 3, 2018

44.1%. .....we will get to 50% by the end of the year I'm sure.....

@ValWood
Member Author

ValWood commented Feb 10, 2018

44.3%.....

@ValWood
Member Author

ValWood commented Feb 10, 2018

It was 32% when I did this presentation:
https://www.slideshare.net/ValerieWood/community-curation-at-pombase
(I can't remember when, I think it was about 18 months ago)

@ValWood ValWood removed the discuss label Apr 20, 2018
@ValWood ValWood changed the title from "response rate" to "community curation response rate" Feb 12, 2019
@ValWood
Member Author

ValWood commented Apr 11, 2019

Will keep this open, it would be nice to see the cumulative increase on the stats page:
https://curation.pombase.org/pombe/stats/annotation

@kimrutherford
Member

> It would be nice to have a cumulative graph showing the growth over time eventually (the only way is up)

Is that true? If you sent out a bunch of sessions won't the response rate (temporarily) drop?

@ValWood
Member Author

ValWood commented Feb 27, 2020

The drop is usually only a fraction of a percentage point, so it won't show in the plot.

[image: Screenshot 2020-02-27 at 10 28 53]

if it ever dropped I would send out more reminders ;)
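
To put a rough number on the size of that dip, here is a minimal sketch. It reuses the 597 of 1361 figures quoted earlier in this thread and assumes a hypothetical batch of 10 new sessions sent out with no immediate responses; the function names are illustrative only.

```python
def response_rate(submitted: int, sent: int) -> float:
    """Response rate as a percentage (not truncated)."""
    return 100.0 * submitted / sent


def drop_after_batch(submitted: int, sent: int, batch: int) -> float:
    """Percentage points lost right after sending `batch` new sessions,
    assuming none of the new sessions has been submitted yet."""
    return response_rate(submitted, sent) - response_rate(submitted, sent + batch)


# Hypothetical batch of 10 on top of the 597/1361 figures from earlier.
drop = drop_after_batch(597, 1361, 10)
print(f"{drop:.2f} percentage points")  # 0.32 percentage points
```

With these assumed numbers the dip is well under one percentage point, and it shrinks further as the total number of sent sessions grows.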

@ValWood
Member Author

ValWood commented Feb 27, 2020

Actually, that isn't the response rate graph, it's the other one (2B), they look similar.

I would upload it but I need to swap laptops and mail it to myself because I can't upload to GitHub on the other laptop.
I really need to sort my environment!

@kimrutherford
Member

I've done some querying in Chado. I think the numbers don't match up with the 50% response rate shown in Canto because not all of the publications in Canto are exported to Chado. There are community sessions triaged as "Erratum" and "Wrong organism" for example which aren't exported.

I've made a new report "uncuratable publications with a community session" to help work this out:
https://curation.pombase.org/pombe/view/list/uncuratable_publications_with_a_community_session?model=track

If a session is approved, the Canto details are exported to Chado regardless of the triage status.

This publication is an Erratum, but has an approved session:
https://curation.pombase.org/pombe/view/object/pub/11918?model=track

Here are the numbers from Chado:

 year | submitted | sent_sessions | response_rate 
------+-----------+---------------+---------------
 2013 |        91 |           927 |           9.8
 2014 |       174 |          1055 |          16.4
 2015 |       260 |          1171 |          22.2
 2016 |       403 |          1280 |          31.4
 2017 |       502 |          1378 |          36.4
 2018 |       641 |          1475 |          43.4
 2019 |       771 |          1579 |          48.8
 2020 |       800 |          1593 |          50.2

Note to self, query with:

WITH counts AS (
  SELECT year,
    (SELECT COUNT(*)
     FROM pombase_publication_curation_summary
     WHERE canto_curator_role = 'community'
       AND canto_annotation_status IN ('NEEDS_APPROVAL', 'APPROVAL_IN_PROGRESS', 'APPROVED')
       AND canto_session_submitted_date IS NOT NULL
       AND canto_session_submitted_date <= (year || '-12-30')::date) AS submitted,
    (SELECT COUNT(*)
     FROM pombase_publication_curation_summary
     WHERE canto_curator_role = 'community'
       -- AND binds tighter than OR, so the original condition groups like this
       AND (canto_approved_date IS NOT NULL
            OR (canto_first_sent_to_curator_year IS NOT NULL
                AND canto_first_sent_to_curator_year <= year))) AS sent_sessions
  FROM generate_series(2013, (SELECT extract(YEAR FROM CURRENT_DATE))::integer) AS year
)
SELECT year, submitted, sent_sessions,
       trunc(100.0 * submitted / sent_sessions, 1) AS response_rate
FROM counts;

@ValWood
Member Author

ValWood commented Feb 27, 2020

Ah OK.

PMID:31579888 is the one which had 2 PMIDs. This ID will be deleted.

Some are methods papers. Occasionally people get annotations from methods papers. We want to class these as "methods" & "curated"

One day we need to sort the classification so the "publication type" and "curation status" are separate.

@ValWood ValWood added the future label Nov 27, 2020
@ValWood
Member Author

ValWood commented Nov 29, 2023

53.9% still increasing
It seems that this is largely done, so a graph could be added to this page:
https://curation.pombase.org/pombe/stats/annotation

@kimrutherford
Member

Latest query result:

 year | submitted | sent_sessions | response_rate 
------+-----------+---------------+---------------
 2013 |        88 |          1233 |           7.1
 2014 |       169 |          1330 |          12.7
 2015 |       253 |          1430 |          17.6
 2016 |       392 |          1513 |          25.9
 2017 |       483 |          1593 |          30.3
 2018 |       615 |          1673 |          36.7
 2019 |       740 |          1748 |          42.3
 2020 |       862 |          1828 |          47.1
 2021 |       982 |          1929 |          50.9
 2022 |      1050 |          1990 |          52.7
 2023 |      1132 |          2072 |          54.6
 2024 |      1136 |          2083 |          54.5

kimrutherford added a commit to pombase/pombase-chado-json that referenced this issue Feb 11, 2024
@kimrutherford
Member

I had the query wrong and it was making a mess of the older sessions.

 year | submitted | sent_sessions | response_rate 
------+-----------+---------------+---------------
 2013 |        88 |           284 |          30.9
 2014 |       169 |           481 |          35.1
 2015 |       253 |           693 |          36.5
 2016 |       392 |           896 |          43.7
 2017 |       483 |          1088 |          44.3
 2018 |       615 |          1272 |          48.3
 2019 |       740 |          1448 |          51.1
 2020 |       862 |          1624 |          53.0
 2021 |       982 |          1817 |          54.0
 2022 |      1050 |          1930 |          54.4
 2023 |      1132 |          2068 |          54.7
 2024 |      1136 |          2081 |          54.5

kimrutherford added a commit to pombase/pombase-chado-json that referenced this issue Feb 11, 2024
kimrutherford added a commit to pombase/pombase-python-web that referenced this issue Feb 11, 2024
kimrutherford added a commit to pombase/website that referenced this issue Feb 11, 2024
kimrutherford added a commit to pombase/pombase-chado-json that referenced this issue Feb 11, 2024
@kimrutherford
Member

I've added a curation response rate graph. Hopefully it will be on the main site in the morning but I've just had to restart the load so we'll see.

In the meantime it's available on my desktop version: https://desktop.kmr.nz/curation_stats

[image]

@kimrutherford
Member

> Hopefully it will be on the main site in the morning but I've just had to restart the load so we'll see.

The load finished after a few false starts. GitHub was returning errors when the load script was trying to check for the latest Mondo.

https://pombase.org/curation_stats

> I had the query wrong and it was making a mess of the older sessions.

I'm still not 100% sure I have it right so I plan to check it again tomorrow after a good sleep. :-)

@ValWood
Member Author

ValWood commented Feb 12, 2024

Great! We are really flatlining. I'll get this going again when Pascal starts.

Can we make the graph start earlier? (2012)

Also, the early years of the graph don't match this one (30% is high for 2013). Is this definitely 1st submission, or 1st approval data?

[image: Screenshot 2024-02-12 at 07 51 31]

@kimrutherford
Member

> Can we make the graph start earlier? (2012)

Unfortunately the date stamps needed from Canto only go back to mid 2013.

> is this definitely 1st submission, or 1st approval data?

It's calculated using the submitted date. It does that so that it matches the Canto stats page which uses the number of submitted sessions.

@kimrutherford
Member

I'm going to look at this again in the morning because I've just spotted another problem. Currently it counts submitted sessions up to a given year and then divides by sessions sent out up to the same year. But it's going to get this wrong for sessions that were submitted in a different year to the year they were sent out. There are quite a few of those. Whoops.
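
To make the two possible readings concrete, here is a small sketch with made-up sessions (the `Session` type, the function names and the data are all hypothetical, not taken from Canto or Chado). `cumulative_rate` is the calculation described above: submitted up to a year divided by sent up to the same year. `cohort_rate` is one alternative, grouping by the year a session was sent out. This is not a claim about which one the final graph uses.

```python
from typing import List, NamedTuple, Optional


class Session(NamedTuple):
    sent_year: int
    submitted_year: Optional[int]  # None if never submitted


def cumulative_rate(sessions: List[Session], year: int) -> Optional[float]:
    """Sessions submitted up to `year` as a percentage of those sent up to `year`."""
    sent = [s for s in sessions if s.sent_year <= year]
    if not sent:
        return None
    done = [s for s in sent
            if s.submitted_year is not None and s.submitted_year <= year]
    return 100 * len(done) / len(sent)


def cohort_rate(sessions: List[Session], year: int) -> Optional[float]:
    """Of the sessions sent in `year`, the percentage ever submitted."""
    cohort = [s for s in sessions if s.sent_year == year]
    if not cohort:
        return None
    done = [s for s in cohort if s.submitted_year is not None]
    return 100 * len(done) / len(cohort)


# Made-up data: one session is sent in 2013 but only submitted in 2015.
sessions = [
    Session(2013, 2013),
    Session(2013, 2015),
    Session(2013, None),
    Session(2014, 2014),
    Session(2014, None),
]

for y in (2013, 2014, 2015):
    print(y, cumulative_rate(sessions, y), cohort_rate(sessions, y))
```

In this toy data the late submission raises the cumulative figure for 2015 even though nothing was sent out that year, which is the kind of cross-year case in question.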

@kimrutherford
Member

Should the years in the graph be the year sent out or the year submitted? Or year approved?

@ValWood
Member Author

ValWood commented Feb 12, 2024

Submitted, I think (the gap between submission and 1st approval should be less than a week 90% of the time, so these numbers should be very similar).

kimrutherford added a commit to pombase/pombase-chado-json that referenced this issue Feb 13, 2024
@kimrutherford
Member

> the numbers are definitely different from the curation paper graph

I think the graph from the paper might be wrong but let's have a chat about this on the next call.

I've double checked the query that generates the current graph and I think it's correct. But it could be that it's not asking the right question.
https://pombase.org/curation_stats

@kimrutherford
Member

For Kim: find backup from December 2012 to add response rate for that year

kimrutherford added a commit to pombase/pombase-chado-json that referenced this issue Feb 13, 2024
@kimrutherford
Member

> find backup from December 2012 to add response rate for that year

After a bit of digging, the response rate for 2012 was 91.6%.

There were 12 community sessions sent out and 11 were submitted. Did you send them to people you knew would respond?

 year | submitted_for_approval_count | sent_or_accepted_count | response_rate 
------+------------------------------+------------------------+---------------
 2012 |                           11 |                     12 |          91.6
 2013 |                           90 |                    280 |          32.1
 2014 |                          171 |                    480 |          35.6
 2015 |                          255 |                    695 |          36.6
 2016 |                          392 |                    899 |          43.6
 2017 |                          483 |                   1092 |          44.2
 2018 |                          616 |                   1276 |          48.2
 2019 |                          745 |                   1452 |          51.3
 2020 |                          869 |                   1628 |          53.3
 2021 |                          990 |                   1821 |          54.3
 2022 |                         1058 |                   1934 |          54.7
 2023 |                         1141 |                   2072 |          55.0
 2024 |                         1144 |                   2085 |          54.8

@ValWood
Member Author

ValWood commented Feb 14, 2024

Yes, I think that was probably the pilot project sessions. I put them all through later as community curated (or we changed them to community curated), I don't quite remember.
Maybe we begin with 2013, when we started properly.

kimrutherford added a commit to pombase/pombase-chado-json that referenced this issue Feb 14, 2024
Some sessions are approved before they are accepted which shouldn't be
possible, so use the oldest date.

Refs pombase/pombase-chado#651
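
As an illustration of the "use the oldest date" rule in the commit message above, here is a minimal sketch. The function name and the example dates are hypothetical; the real change lives in pombase-chado-json and may well look different.

```python
from datetime import date
from typing import Optional


def earliest_date(*dates: Optional[date]) -> Optional[date]:
    """Return the oldest of the known dates, ignoring missing (None) values."""
    known = [d for d in dates if d is not None]
    return min(known) if known else None


# Hypothetical session whose approval is recorded before its acceptance.
accepted = date(2014, 3, 1)
approved = date(2014, 2, 1)
print(earliest_date(accepted, approved))  # 2014-02-01
```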
kimrutherford added a commit to pombase/pombase-chado-json that referenced this issue Feb 14, 2024
@kimrutherford
Member

I'll close this as it's getting long and I think it's done.
