-
Notifications
You must be signed in to change notification settings - Fork 2
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat(deposition)!: Allow groups to specify which BioProject (and BioS…
…ample) we upload their sequences into (#3650) * use project_id as project_table primary key * allow overwrite of bioproject accession with given value * update values.yaml - changes will also need to be made in pathoplexus * allow addition of already existing biosample * Add RUN_REF if insdcRawReadsAccession is in metadata * make code clearer * Require user-supplied project/sample accession actually exists --------- Co-authored-by: Cornelius Roemer <[email protected]>
- Loading branch information
1 parent
317aae4
commit d6402c9
Showing
13 changed files
with
308 additions
and
98 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -31,8 +31,8 @@ This script runs once daily as a kubernetes cronjob. It calls the Loculus backen | |
- data must be state "OPEN" for use | ||
- data must not already exist in ENA or be in the submission process, this means: | ||
- data was not submitted by the `config.ingest_pipeline_submitter` | ||
- data is not in the `ena-submission.submission_table` | ||
- as an extra check we discard all sequences with `ena-specific-metadata` fields | ||
- data is not in the `ena-submission.submission_table` (and data with a later version is also not in the table) | ||
- as an extra check we upload data with `ena-specific-metadata` fields in a separate file with a warning | ||
|
||
## Threads | ||
|
||
|
@@ -47,10 +47,14 @@ Download file in `github_url` every 30s. If data is not in submission table alre | |
In a loop: | ||
|
||
- Get sequences in `submission_table` in state READY_TO_SUBMIT | ||
- if (bioprojectAccession exists in the metadata): | ||
- if (for (group_id, organism) a project entry with this accession already exists use that project_id) | ||
and update submission_table to SUBMITTED_PROJECT (add center_name and project_id) | ||
- else: create an entry in `project_table` and then update `submission_table` with results (center_name and project_id) | ||
- if (there exists an entry in the project_table for the corresponding (group_id, organism)): | ||
- if (entry is in status SUBMITTED): update `submission_table` to SUBMITTED_PROJECT. | ||
- else: update submission_table to SUBMITTING_PROJECT. | ||
- else: create project entry in `project_table` for (group_id, organism). | ||
- else: create project entry in `project_table` for (group_id, organism) -> creates a unique `project_id`. | ||
- Get sequences in `submission_table` in state SUBMITTING_PROJECT | ||
- if (corresponding `project_table` entry is in state SUBMITTED): update entries to state SUBMITTED_PROJECT. | ||
- Get sequences in `project_table` in state READY, prepare submission object, set status to SUBMITTING | ||
|
@@ -69,6 +73,10 @@ Maps loculus metadata to ena metadata using template: https://www.ebi.ac.uk/ena/ | |
In a loop | ||
|
||
- Get sequences in `submission_table` in state SUBMITTED_PROJECT | ||
- if (biosampleAccession exists in the metadata): | ||
- if (for (accession, version) entry with this accession already exists use that sample_id) | ||
and update submission_table to SUBMITTED_SAMPLE | ||
- else: create an entry in `sample_table` and then update `submission_table` with results | ||
- if (there exists an entry in the `sample_table` for the corresponding (accession, version)): | ||
- if (entry is in status SUBMITTED): update `submission_table` to SUBMITTED_SAMPLE. | ||
- else: update submission_table to SUBMITTING_SAMPLE. | ||
|
@@ -205,7 +213,7 @@ pip install -e . | |
flyway -user=postgres -password=unsecure -url=jdbc:postgresql://127.0.0.1:5432/loculus -schemas=ena_deposition_schema -locations=filesystem:./flyway/sql migrate | ||
``` | ||
|
||
2. Submit data to the backend as test user (create group, submit and approve), e.g. using [example data](https://github.com/pathoplexus/example_data). (To test the full submission cycle with insdc accessions submit cchf example data with only 2 segments.) | ||
2. Submit data to the backend as test user (create group, submit and approve - create 2 groups if insdc_ingest_group has not be created), e.g. using [example data](https://github.com/pathoplexus/example_data). (To test the full submission cycle with insdc accessions submit cchf example data with only 2 segments.) | ||
|
||
```sh | ||
KEYCLOAK_TOKEN_URL="http://localhost:8083/realms/loculus/protocol/openid-connect/token" | ||
|
@@ -230,7 +238,7 @@ curl -X 'POST' 'http://localhost:8079/groups' \ | |
}, | ||
"contactEmail": "[email protected]"}' | ||
LOCULUS_ACCESSION=$(curl -X 'POST' \ | ||
'http://localhost:8079/cchf/submit?groupId=1&dataUseTermsType=OPEN' \ | ||
'http://localhost:8079/cchf/submit?groupId=2&dataUseTermsType=OPEN' \ | ||
-H 'accept: application/json' \ | ||
-H "Authorization: Bearer ${JWT}" \ | ||
-H 'Content-Type: multipart/form-data' \ | ||
|
@@ -243,9 +251,10 @@ curl -X 'POST' 'http://localhost:8079/cchf/approve-processed-data' \ | |
-d '{"scope": "ALL"}' | ||
``` | ||
|
||
3. Get list of sequences ready to submit to ENA, locally this will write `results/ena_submission_list.json`. | ||
3. Get list of sequences ready to submit to ENA, locally this will write `results/ena_submission_list.json`, you need to copy the config produced by `../generate_local_test_config.sh`. | ||
|
||
```sh | ||
cp ../website/tests/config/ena-submission-config.yaml config/config.yaml | ||
python scripts/get_ena_submission_list.py --config-file=config/config.yaml --output-file=results/ena_submission_list.json | ||
``` | ||
|
||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
|
||
ALTER TABLE project_table DROP CONSTRAINT project_table_pkey; | ||
ALTER TABLE project_table ADD COLUMN project_id BIGSERIAL PRIMARY KEY; | ||
|
||
ALTER TABLE submission_table ADD project_id text; | ||
|
||
CREATE INDEX idx_project_table_group_id ON project_table(group_id); | ||
CREATE INDEX idx_project_table_organism ON project_table(organism); |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.