Add details of ena-submission to README (taking from all previous PRs).
anna-parker committed Sep 13, 2024
1 parent ac1deb7 commit be659f1
Showing 2 changed files with 97 additions and 1 deletion.
81 changes: 81 additions & 0 deletions ena-submission/README.md
@@ -1,5 +1,84 @@
# ENA Submission

## Snakemake Rules

### get_ena_submission_list

This rule runs daily as a cron job. It calls the Loculus backend (`get-released-data`), obtains a new list of sequences that are ready for submission to ENA, and sends this list as a compressed JSON file to our Slack channel. Sequences are ready for submission if:

- data is in state APPROVED_FOR_RELEASE
- data must be in state "OPEN" for use
- data must not already exist in ENA or be in the submission process, this means:
  - data was not submitted by the `config.ingest_pipeline_submitter`
  - data is not in the `ena-submission.submission_table`
  - as an extra check we discard all sequences with `ena-specific-metadata` fields
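
The checks above can be sketched as a single predicate. This is a minimal illustration, not the script's actual code: the entry shape and the field names `dataUseTerms`, `submitter`, `accession` and `version` are assumptions, and the APPROVED_FOR_RELEASE check is assumed to be covered by `get-released-data` itself.

```python
from typing import Any


def is_ready_for_ena(
    entry: dict[str, Any],
    ingest_submitter: str,
    submission_table_keys: set[tuple[str, int]],
    ena_specific_fields: list[str],
) -> bool:
    """Apply the readiness checks to one released entry (hypothetical shape)."""
    metadata = entry["metadata"]
    # Only data that is open for use may be forwarded to ENA.
    if metadata.get("dataUseTerms") != "OPEN":
        return False
    # Data submitted by the ingest pipeline already exists in ENA.
    if metadata.get("submitter") == ingest_submitter:
        return False
    # Data already tracked in the submission table is in progress or done.
    if (metadata["accession"], metadata["version"]) in submission_table_keys:
        return False
    # Extra check: a filled ENA-specific field means ENA already knows this data.
    return not any(metadata.get(field) for field in ena_specific_fields)
```

Keeping the checks in one function makes the order explicit: cheap metadata checks first, the table lookup after.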

### all

This rule runs in the ena-submission pod. It runs the following rules in parallel:

#### trigger_submission_to_ena

Downloads the file at `github_url` every 30 seconds. If the data is not already in the submission table (and is not a revision), uploads it to `ena-submission.submission_table`.
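
A rough sketch of this polling loop follows. The `fetch` and `upload` callables, the `accession.version` key format, and the rule "a version above 1 is a revision" are all assumptions made for illustration; the real script talks to GitHub and the database.

```python
import time
from typing import Callable, Optional


def select_new_entries(
    approved: dict[str, dict], known_accessions: set[str]
) -> dict[str, dict]:
    """Keep first versions whose accession is not yet in the submission table."""
    selected = {}
    for accession_version, entry in approved.items():
        accession, version = accession_version.rsplit(".", 1)
        if int(version) > 1:
            continue  # assumption: a version above 1 marks a revision
        if accession in known_accessions:
            continue
        selected[accession_version] = entry
    return selected


def poll(
    fetch: Callable[[], dict[str, dict]],
    upload: Callable[[str, dict], None],
    known_accessions: set[str],
    interval_seconds: float = 30.0,
    max_rounds: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Fetch the approved file repeatedly and upload only unseen entries."""
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        new_entries = select_new_entries(fetch(), known_accessions)
        for accession_version, entry in new_entries.items():
            upload(accession_version, entry)
            known_accessions.add(accession_version.rsplit(".", 1)[0])
        rounds += 1
        sleep(interval_seconds)
```

Passing `sleep` and `max_rounds` as parameters is only there so the loop can be exercised in a test without waiting 30 seconds per round.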

#### create_project

In a loop:

- Get sequences in `submission_table` in state READY_TO_SUBMIT
  - if (there exists an entry in the `project_table` for the corresponding (group_id, organism)):
    - if (entry is in status SUBMITTED): update `submission_table` to SUBMITTED_PROJECT.
    - else: update `submission_table` to SUBMITTING_PROJECT.
  - else: create project entry in `project_table` for (group_id, organism).
- Get sequences in `submission_table` in state SUBMITTING_PROJECT
  - if (corresponding `project_table` entry is in state SUBMITTED): update entries to state SUBMITTED_PROJECT.
- Get entries in `project_table` in state READY, prepare submission object, set status to SUBMITTING
  - if (submission succeeds): set status to SUBMITTED and fill in results: the result of a successful submission is a `bioproject_accession` and an ENA-internal `ena_submission_accession`.
  - else: set status to HAS_ERRORS and fill in errors
- Get entries in `project_table` in state HAS_ERRORS for over 15 min and entries in status SUBMITTING for over 15 min: send Slack notification
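
The state transitions above can be modelled with plain dictionaries. This is only a sketch under assumptions: the real tables live in a database, and the row shape plus the `submit` callable (returning a success flag and either results or errors) are hypothetical.

```python
from typing import Callable

Key = tuple[str, str]  # (group_id, organism)


def advance_submission(submission: dict, project_table: dict[Key, dict]) -> None:
    """Apply the first two bullets to one submission_table row."""
    key = (submission["group_id"], submission["organism"])
    project = project_table.get(key)
    if submission["status"] == "READY_TO_SUBMIT":
        if project is None:
            # No project yet: create it; the row is picked up again next loop.
            project_table[key] = {"status": "READY", "result": None, "errors": None}
        elif project["status"] == "SUBMITTED":
            submission["status"] = "SUBMITTED_PROJECT"
        else:
            submission["status"] = "SUBMITTING_PROJECT"
    elif submission["status"] == "SUBMITTING_PROJECT":
        if project is not None and project["status"] == "SUBMITTED":
            submission["status"] = "SUBMITTED_PROJECT"


def submit_ready_projects(
    project_table: dict[Key, dict],
    submit: Callable[[Key], tuple[bool, dict]],
) -> None:
    """Submit every READY project and record results or errors."""
    for key, project in project_table.items():
        if project["status"] != "READY":
            continue
        project["status"] = "SUBMITTING"
        ok, payload = submit(key)
        if ok:
            project["status"], project["result"] = "SUBMITTED", payload
        else:
            project["status"], project["errors"] = "HAS_ERRORS", payload
```

Because many sequences share one (group_id, organism) project, the submission rows only observe the project entry; the project itself is submitted once.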

#### create_sample

In a loop:

- Get sequences in `submission_table` in state SUBMITTED_PROJECT
  - if (there exists an entry in the `sample_table` for the corresponding (accession, version)):
    - if (entry is in status SUBMITTED): update `submission_table` to SUBMITTED_SAMPLE.
    - else: update `submission_table` to SUBMITTING_SAMPLE.
  - else: create sample entry in `sample_table` for (accession, version).
- Get sequences in `submission_table` in state SUBMITTING_SAMPLE
  - if (corresponding `sample_table` entry is in state SUBMITTED): update entries to state SUBMITTED_SAMPLE.
- Get sequences in `sample_table` in state READY, prepare submission object, set status to SUBMITTING
  - if (submission succeeds): set status to SUBMITTED and fill in results: the results of a successful submission are an `sra_run_accession` (starting with ERS), a `biosample_accession` (starting with SAM) and an ENA-internal `ena_submission_accession`.
  - else: set status to HAS_ERRORS and fill in errors
- Get sequences in `sample_table` in state HAS_ERRORS for over 15 min and sequences in status SUBMITTING for over 15 min: send Slack notification
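
The final step, which every rule here shares, amounts to a staleness filter. A minimal sketch, assuming each row carries a hypothetical `started_at` timestamp recording when it entered its current status:

```python
from datetime import datetime, timedelta


def find_stale_entries(
    rows: list[dict],
    now: datetime,
    max_age: timedelta = timedelta(minutes=15),
    statuses: tuple[str, ...] = ("HAS_ERRORS", "SUBMITTING"),
) -> list[dict]:
    """Return rows that have sat in one of `statuses` for longer than `max_age`."""
    return [
        row
        for row in rows
        if row["status"] in statuses and now - row["started_at"] > max_age
    ]
```

The returned rows would then be summarised in a single Slack message; how that message is sent is outside this sketch.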

#### create_assembly

In a loop:

- Get sequences in `submission_table` in state SUBMITTED_SAMPLE
  - if (there exists an entry in the `assembly_table` for the corresponding (accession, version)):
    - if (entry is in status SUBMITTED): update `submission_table` to SUBMITTED_ASSEMBLY.
    - else: update `submission_table` to SUBMITTING_ASSEMBLY.
  - else: create assembly entry in `assembly_table` for (accession, version).
- Get sequences in `submission_table` in state SUBMITTING_ASSEMBLY
  - if (corresponding `assembly_table` entry is in state SUBMITTED): update entries to state SUBMITTED_ASSEMBLY.
- Get sequences in `assembly_table` in state READY, prepare files (a chromosome list, FASTA files and a manifest file), set status to SUBMITTING
  - if (submission succeeds): set status to WAITING and fill in results: the ENA-internal `erz_accession`
  - else: set status to HAS_ERRORS and fill in errors
- Get sequences in `assembly_table` in state WAITING; every 5 minutes (to not overload ENA) check if ENA has processed the assemblies and assigned them a `gca_accession`. If so, update the table to status SUBMITTED and fill in results
- Get sequences in `assembly_table` in state HAS_ERRORS for over 15 min, in status SUBMITTING for over 15 min, or in state WAITING for over 48 hours: send Slack notification
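
The WAITING step differs from the other rules because ENA assigns the `gca_accession` asynchronously. One way to sketch the rate-limited check, assuming a hypothetical `lookup_gca` callable that returns the accession once ENA has processed the assembly and `None` before that:

```python
from datetime import datetime, timedelta
from typing import Callable, Optional


def poll_waiting_assemblies(
    assembly_table: list[dict],
    lookup_gca: Callable[[str], Optional[str]],
    now: datetime,
    last_check: Optional[datetime],
    min_interval: timedelta = timedelta(minutes=5),
) -> Optional[datetime]:
    """Check WAITING rows at most once per `min_interval`.

    Returns the time of the most recent check that was actually performed.
    """
    if last_check is not None and now - last_check < min_interval:
        return last_check  # too soon, skip to avoid overloading ENA
    for row in assembly_table:
        if row["status"] != "WAITING":
            continue
        gca = lookup_gca(row["result"]["erz_accession"])
        if gca is not None:
            row["status"] = "SUBMITTED"
            row["result"]["gca_accession"] = gca
    return now
```

The caller keeps the returned timestamp and passes it back in on the next loop iteration, so the outer loop can run more often than ENA is queried.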

#### upload_to_loculus

- Get sequences in `submission_table` in state SUBMITTED_ALL.
- Get the results of all the submissions (from all other tables).
- Create a POST request to the `submit-external-metadata` endpoint with the results in the expected format.
  - if (successful): set sequences to state SENT_TO_LOCULUS
  - else: set sequences to state HAS_ERRORS_EXT_METADATA_UPLOAD
- Get sequences in `submission_table` in state HAS_ERRORS_EXT_METADATA_UPLOAD for over 15 min and sequences in status SUBMITTED_ALL for over 15 min: send Slack notification
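
As a sketch of these steps: the payload layout below is purely hypothetical (the format `submit-external-metadata` expects is not shown here), and `post` stands in for the HTTP call, returning `True` on success.

```python
from typing import Callable


def collect_results(
    submission: dict,
    project_table: dict[tuple[str, str], dict],
    sample_table: dict[tuple[str, int], dict],
    assembly_table: dict[tuple[str, int], dict],
) -> dict:
    """Gather the results stored in the other tables for one sequence."""
    seq_key = (submission["accession"], submission["version"])
    project_key = (submission["group_id"], submission["organism"])
    return {
        "accession": submission["accession"],
        "version": submission["version"],
        # Kept per table so that keys present in several tables do not collide.
        "results": {
            "project": project_table[project_key]["result"],
            "sample": sample_table[seq_key]["result"],
            "assembly": assembly_table[seq_key]["result"],
        },
    }


def upload_external_metadata(
    submission: dict, payload: dict, post: Callable[[dict], bool]
) -> None:
    """Send the payload and record the outcome on the submission row."""
    if post(payload):
        submission["status"] = "SENT_TO_LOCULUS"
    else:
        submission["status"] = "HAS_ERRORS_EXT_METADATA_UPLOAD"
```

Separating collection from upload means a failed POST can be retried without re-reading the other tables.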

## Developing Locally

### Database
@@ -68,6 +147,8 @@ wget -q "https://github.com/enasequence/webin-cli/releases/download/${WEBIN_CLI_

Then run snakemake using `snakemake` or `snakemake {rule}`.

## Testing

### Run tests

```sh
…
```
17 changes: 16 additions & 1 deletion ena-submission/Snakefile
@@ -20,6 +20,15 @@ with open("results/config.yaml", "w") as f:
LOG_LEVEL = config.get("log_level", "INFO")


rule all:
    input:
        triggered="results/triggered",
        project_created="results/project_created",
        sample_created="results/sample_created",
        assembly_created="results/assembly_created",
        uploaded_external_metadata="results/uploaded_external_metadata",


rule get_ena_submission_list:
    input:
        script="scripts/get_ena_submission_list.py",
@@ -36,6 +45,7 @@ rule get_ena_submission_list:
            --log-level {params.log_level} \
        """


rule trigger_submission_to_ena:
    input:
        script="scripts/trigger_submission_to_ena.py",
@@ -51,6 +61,7 @@ rule trigger_submission_to_ena:
            --log-level {params.log_level} \
        """


rule trigger_submission_to_ena_from_file:  # for testing
    input:
        script="scripts/trigger_submission_to_ena.py",
@@ -68,6 +79,7 @@ rule trigger_submission_to_ena_from_file: # for testing
            --log-level {params.log_level} \
        """


rule create_project:
    input:
        script="scripts/create_project.py",
@@ -83,6 +95,7 @@ rule create_project:
            --log-level {params.log_level} \
        """


rule create_sample:
    input:
        script="scripts/create_sample.py",
@@ -98,6 +111,7 @@ rule create_sample:
            --log-level {params.log_level} \
        """


rule create_assembly:
    input:
        script="scripts/create_assembly.py",
@@ -113,6 +127,7 @@ rule create_assembly:
            --log-level {params.log_level} \
        """


rule upload_to_loculus:
    input:
        script="scripts/upload_external_metadata_to_loculus.py",
@@ -126,4 +141,4 @@ rule upload_to_loculus:
        python {input.script} \
            --config-file {input.config} \
            --log-level {params.log_level} \
        """
