Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Studies should not simultaneously be in published and unpublished states #191

Open
jdhayhurst opened this issue Aug 27, 2020 · 18 comments
Open
Assignees

Comments

@jdhayhurst
Copy link
Collaborator

Curators can "unpublish" studies in order to edit them and then "republish" them. The housekeeping table stores values such as the date these event occurred as well as a boolean for "is published". Somehow it has ended up that some studies can have isPublished as True but also have catalogUnpublishDate != null. If these are both True the pruning process interprets it as "this study is unpublished".

Need to investigate why this happens and prevent future occurrences.

One current example of a study in this state is GCST90000461.

@buniello
Copy link
Contributor

buniello commented Sep 9, 2020

More studies in this status: GCST90000295, GCST90000294.
All studies are from author submissions, therefore it is critical that they are released to the public. They got blocked in between statuses when I unpublished them to add top associations -- and then tried to re-publish/
@jdhayhurst @ljwh2

@tudorgroza
Copy link
Collaborator

@buniello - are these part of the submissions we've worked on together last week?

@buniello
Copy link
Contributor

yes, @tudorgroza they are. But I don't believe the problem is linked with the import step. I think there is a problem at the backend of the study view in curation app - as these studies got stuck in between statuses when i tried to re-publish them through the study view functionality https://www.ebi.ac.uk/gwas/curation/publication/32814899

@tudorgroza
Copy link
Collaborator

@sprintell - adding you to the mix :)
I've tried to get to the bottom of this but couldn't. Maybe a fresh pair of eyes will help ... i.e., yours :)

The code run by Annalisa to do the status update for the studies above is triggered in publication.html via:

<a role="button" type="submit" id="assignStatus"
                        class="btn btn-primary btn-sm">Save Status
                </a>

which is actually executed in goci-publication-page.js via $('#assignStatus').click(function ()

The endpoint called is /status_update in PublicationController (updateSelectedStatuses).
After various steps - each study ends up being updated via updateStatus in StudyOperationService

Here's the interesting part: GCST90000295 and GCST90000294 mentioned above correspond to study IDs: 62304264 and 62304257. In the logs, we have the following sequence recorded:

2020-09-08 16:57:06.646  INFO 24045 --- [http-bio-8080-exec-937] u.a.e.s.g.c.s.StudyOperationsService     : Study 62304271 status updated
2020-09-08 17:15:30.137  INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService     : Study 62304271 status updated
2020-09-08 17:15:30.209  INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService     : Study 62304264 status updated
2020-09-08 17:15:30.262  INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService     : Study 62304257 status updated
2020-09-08 17:15:30.310  INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService     : Study 62304250 status updated
2020-09-08 17:15:30.357  INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService     : Study 62304243 status updated
2020-09-08 17:15:30.403  INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService     : Study 62304236 status updated
2020-09-08 17:15:30.451  INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService     : Study 62304229 status updated
2020-09-08 17:15:30.502  INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService     : Study 62304222 status updated
2020-09-08 17:15:30.556  INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService     : Study 62304215 status updated

If you look closely - those two studies are right in the middle. Everything above and below encountered no issues when having the status updated. Just these two.

For each study, the sequence of steps in updateStatus is:

1. set published to TRUE
2. set published data if it doesn't exist
3. create note about republishing and add in the DB
4. set unpublish date to NULL
5. set unpublish reason to NULL
6. save housekeeping
7. log status change

These 2 failed studies seem to skip steps 4 and 5 above. Everything else is there ... the published flag set to TRUE, the note ... and the log status change (which is practically the line you see in the listing above).

Unfortunately ... I just can't understand why those 2 fields are not set to NULL appropriately.

@sprintell
Copy link
Member

Hi @tudorgroza, I'm happy to take a look tomorrow

@jdhayhurst
Copy link
Collaborator Author

In the meantime I've set the values for unpublished reason and unpublished date to NULL in curation db for these studies: GCST90000295, GCST90000294, GCST90000461. This should have been what steps 4 and 5 above did.

@tudorgroza
Copy link
Collaborator

Thank you @jdhayhurst

@sprintell
Copy link
Member

@tudorgroza , @jdhayhurst from my finding, this publication still has one study with id 62304271 and GCST90000296 also still in that confused unpublished state , it was published on 07/09 and has since been unpublished since 08/09 while paradoxically its isPublished status is set as true

Screenshot 2020-09-11 at 14 13 46

@tudorgroza
Copy link
Collaborator

That's the first one on the list ... very odd that Annalisa has not picked it up.

@sprintell
Copy link
Member

sprintell commented Sep 11, 2020

ok, @jdhayhurst please don't change the state please, lets leave it as it is for now, I need to use it as guinea-pig in my experiment to know what is wrong

@sprintell
Copy link
Member

sprintell commented Sep 11, 2020

@tudorgroza , @JalMacArthur , @buniello

FROM MY FINDINGS: This problem was created by too many flags been monitored unnecessarily e.g isPublished, UnpublishDate, unPublishReason, currentCurationStatus, newStatus creating Code smell as they are all checking thesame thing (whether the study is published or not) in different places, making it problematic to move studies from one state to another, while they are good to be stored in the database but not all of them should be used as condition to determine what happens to a study

In this situation:

1.) When curator select "Publish Study" from drop down and click on "Save Status" button for this publication (https://www.ebi.ac.uk/gwas/curation/publication/32814899) on the user interface, there is nothing happening in the database for any of the studies

In the back end when the "assignStudyStatus" method is called on this line of code

All the studies to be updated fails the condition (newStatus != currentStudyStatus)

Why ? Because their currentStudyStatus is "Publish Study" and the newStatus is also "Publish Study" as selected by the curator from the drop down.

2.) Secondly, when this happens, the else statements here:

sets the message report to be message = "Current status and new status are the same, no change required";
The message is returned to the calling controller which initiated the back end operation on this line :

but funnily, the returned error string is never reported to the curator, but rather the "studyIds" that the curator sent initially was just aggregated into a map with a status "updated" as shown here:

and returned back to the front end as it is, hence the reason why a user interface pop up keeps saying all the studies are updated even though nothing happened .

3.) If Annalisa decides to change the curation status to something else other than "Publish Study" on a 1st pass, it will definitely escape that initial problem, but another conditional check waits in final step that does the database UPDATE ON THIS LINE:

This condition check shows that for any study that has its curationStatus as "Publish Study"
If it's Housekeeping isPublished status is "true", its UnpublishDate and UnpublishReason can never be changed i.e Step 4 and step 5 listed by Tudor will never be implemented.

I think this last conditional check is unnecessary and when removed. the update should go through.

@buniello
Copy link
Contributor

thank you, @sprintell

@tudorgroza tudorgroza removed their assignment Sep 17, 2020
@sprintell
Copy link
Member

sprintell commented Sep 18, 2020

A couple of known and unknown ORACLE TRIGGERS scripts controlling the houseKeeping table, interferring and overriding developer's solution to this problem, @jhayhurst will manually change the necessary attributes for the study with id: 62304271 as was done for the others, since its the only one left in that unwanted state. to avoid spending too much developer's time on it

@jdhayhurst
Copy link
Collaborator Author

jdhayhurst commented Sep 23, 2020

The two statements below yield exactly the same id's, yet the pruning step performs (2), which is more opaque. So it would seem that we can modify the pruning script to prune those that satisfy (1). This will be easier to read and allow us to circumvent the problem where some studies have an unpublished date even though they are published.

// 1. if is NOT published
select S.id from study S, housekeeping HK
where S.HOUSEKEEPING_ID = HK.ID
AND hk.is_published != 1; 

// 2. if unpublished date NOT null or published date IS null 
select s.id from study S, housekeeping HK
where S.HOUSEKEEPING_ID = HK.ID
and (hk.catalog_unpublish_date is not NULL
or hk.catalog_publish_date is NULL); 

@sprintell and @ljwh2 - I'll let you decide whether you want the script to be modified - it should be easy, but we may want to consider why it was written in that way in the first place (unforeseen consequences)?

@jdhayhurst
Copy link
Collaborator Author

@sprintell
To identify studies in the db that are in both published and unpublished states:

select * from study S, housekeeping HK
where S.HOUSEKEEPING_ID = HK.ID
AND hk.is_published = 1 
AND hk.catalog_unpublish_date is not NULL;

To correctly assign these studies as 'published':

update housekeeping
set housekeeping.unpublish_reason_id = NULL, housekeeping.catalog_unpublish_date = NULL
where id in (
select HK.id from study S, housekeeping HK
where S.HOUSEKEEPING_ID = HK.ID
AND hk.is_published = 1 
AND hk.catalog_unpublish_date is not NULL);

@buniello
Copy link
Contributor

buniello commented Jan 12, 2021

found another study falling into this group: GCST90012102 @jdhayhurst @sprintell
this one cannot be published. blocked in the "unpublished" status.

@jdhayhurst
Copy link
Collaborator Author

jdhayhurst commented Jan 12, 2021

For clarity, GCST90012102 is different in that it appears in the database as 'unpublished' - it's not in both states. However, it may be a victim of a related issue because it is not possible to publish this study from the curation app. When attempted, it appeared as though it was published from the 'publication view' but on the study view it was unpublished. In the housekeeping table in the database it always to be unpublished. Had to manually set this study to published:

update housekeeping
set housekeeping.unpublish_reason_id = NULL, 
    housekeeping.catalog_unpublish_date = NULL, 
    housekeeping.is_published = 1
where housekeeping.id = <housekeepingid>;

@jdhayhurst
Copy link
Collaborator Author

I don't know if this is still an issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants