Preprocessing from rnaseq #8

pinin4fjords · 2023-12-22T16:59:37Z

I'm proposing bringing in the preprocessing from the main rna-seq workflow. It had a couple of issues for this workflow, which I have resolved:

Strandedness inference needs to run after trimming so that the associated Salmon call works.
The k-mer size of the Salmon index needs to be reduced to allow for the short trimmed riboseq reads.
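
For illustration, the k-mer change could be expressed as a config override along the following lines. This is a minimal sketch, not the code from this PR: the process name `SALMON_INDEX` and the value 19 are assumptions, and it presumes the module passes `task.ext.args` through to `salmon index`. Salmon's `-k` option sets the k-mer length, which defaults to 31.

```groovy
// Sketch of a Nextflow config override; not taken from this PR.
process {
    // Assumption: the Salmon index is built by a process named SALMON_INDEX.
    withName: 'SALMON_INDEX' {
        // -k sets the k-mer length used for the index (Salmon's default is 31).
        // 19 is an illustrative value only; the value used in the PR is not shown here.
        ext.args = '-k 19'
    }
}
```

Reads shorter than the k-mer length cannot be matched against the index, which is why a smaller value matters for short trimmed reads.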

I have wrapped up that preprocessing in a subworkflow, which, after discussion, I will centralise to the nf-core modules repo.

Note that a lot of config is carried over from rnaseq that we may not use, and it may well be removed before the first release. I'd like to keep it in place until we resolve the next steps around alignment and quantification and see how much of it we'll reuse.

PR checklist

github-actions · 2023-12-22T17:01:09Z

`nf-core lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit aeffe21

✅ 158 tests passed
❔ 2 tests were ignored
❗ 19 tests had warnings

❗ Test warnings:

readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
pipeline_todos - TODO string in README.md: TODO nf-core:
pipeline_todos - TODO string in README.md: Include a figure that guides the user through the major workflow steps. Many nf-core
pipeline_todos - TODO string in README.md: Fill in short bullet-pointed list of the default steps in the pipeline
pipeline_todos - TODO string in README.md: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
pipeline_todos - TODO string in README.md: update the following command to include all required parameters for a minimal example
pipeline_todos - TODO string in README.md: If applicable, make list of people who have also contributed
pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
pipeline_todos - TODO string in README.md: Add bibliography of tools and data used in your pipeline
pipeline_todos - TODO string in usage.md: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website.
pipeline_todos - TODO string in output.md: Write this documentation describing your workflow's output
pipeline_todos - TODO string in WorkflowMain.groovy: Add Zenodo DOI for pipeline after first release
pipeline_todos - TODO string in WorkflowRiboseq.groovy: Optionally add in-text citation tools to this list.
pipeline_todos - TODO string in base.config: Check the defaults for all processes
pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required

❔ Tests ignored:

files_unchanged - File ignored due to lint config: .gitignore or .prettierignore or pyproject.toml
actions_ci - actions_ci

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-riboseq_logo_light.png
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-riboseq_logo_light.png
files_exist - File found: docs/images/nf-core-riboseq_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: lib/nfcore_external_java_deps.jar
files_exist - File found: lib/NfcoreTemplate.groovy
files_exist - File found: lib/Utils.groovy
files_exist - File found: lib/WorkflowMain.groovy
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: .github/workflows/awstest.yml
files_exist - File found: .github/workflows/awsfulltest.yml
files_exist - File found: lib/WorkflowRiboseq.groovy
files_exist - File found: modules.json
files_exist - File found: pyproject.toml
files_exist - File not found check: Singularity
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: docs/images/nf-core-riboseq_logo.png
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.validationShowHiddenParams
nextflow_config - Config variable found: params.validationSchemaIgnoreParams
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: 1.0dev
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/CONTRIBUTING.md matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/email_template.html matches the template
files_unchanged - assets/email_template.txt matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - assets/nf-core-riboseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-riboseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-riboseq_logo_dark.png matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - lib/nfcore_external_java_deps.jar matches the template
files_unchanged - lib/NfcoreTemplate.groovy matches the template
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml does not use -profile test
readme - README Nextflow minimum version badge matched config. Badge: 23.04.0, Config: 23.04.0
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (233 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: awsfulltest.yml
actions_schema_validation - Workflow validation passed: release-announcements.yml
actions_schema_validation - Workflow validation passed: awstest.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: release-announcments.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: ci.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - 'assets/multiqc_config.yml' contains report_section_order
multiqc_config - 'assets/multiqc_config.yml' contains export_plots
multiqc_config - 'assets/multiqc_config.yml' contains report_comment
multiqc_config - 'assets/multiqc_config.yml' follows the ordering scheme of the minimally required plugins.
multiqc_config - 'assets/multiqc_config.yml' contains a matching 'report_comment'.
multiqc_config - 'assets/multiqc_config.yml' contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'

Run details

nf-core/tools version 2.11.1
Run at 2024-01-26 11:28:49

pinin4fjords

Just some pointers to the important bits - most of the changed files are just pulling in the nf-core components.

pinin4fjords · 2024-01-23T20:56:12Z

tests/nextflow.config

+params {
+    test_data_base = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules'
+    modules_testdata_base_path = 's3://ngi-igenomes/testdata/nf-core/modules/'
+    hisat2_build_memory = '3.GB'


This makes HISAT2 behave the same as in the tests on nf-core/modules.

pinin4fjords · 2024-01-23T20:56:54Z

tests/nextflow.config

+
+    // Override modules.config so module snapshots match
+
+    withName: FQ_SUBSAMPLE {


These lines unset the workflow-level config, which was changing module behaviour relative to the tests in nf-core/modules.
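
The kind of override described here might look like the sketch below. The body of the real `FQ_SUBSAMPLE` block is not shown in this thread, so resetting `ext.args` is an assumption made for illustration. It relies on the test config being loaded after the pipeline's own config, so that its selector takes precedence.

```groovy
// Hypothetical excerpt in the style of tests/nextflow.config; not the actual file contents.
process {
    // Override modules.config so module snapshots match
    withName: 'FQ_SUBSAMPLE' {
        // Assumption: clearing ext.args drops the arguments set at workflow level,
        // so the module runs with the defaults its nf-core/modules tests expect.
        ext.args = ''
    }
}
```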

pinin4fjords · 2024-01-23T21:10:06Z

subworkflows/local/preprocess_rnaseq/main.nf

@@ -0,0 +1,252 @@
+import groovy.json.JsonSlurper


This is the most important thing to review: a subworkflow comprising the preprocessing from rnaseq.

Once reviewed and approved I'll PR it to modules so we can factor it out of both rnaseq and riboseq.

I would love that sooo much

adamrtalbot

I'm not a massive fan of the monolithic local subworkflow but I guess that's a problem with rnaseq.

adamrtalbot · 2024-01-26T08:55:27Z

bin/gtf2bed

There's a bedops/gtf2bed module PR open here but it's stalled out: nf-core/modules#4476

I think it's ok to merge as is, and we can then convert it to nf-test, but I'm unsure if the code would behave the same

adamrtalbot · 2024-01-26T09:04:23Z

modules/local/star_genomegenerate_igenomes/main.nf

Do we need this? It seems superfluous to the star_genomegenerate module.

Again, I'm deferring to the authority of rnaseq, and we should de-localise to share components (improving as we do so).

Aren't you the authority of rnaseq?

I mean 'good enough for rnaseq, good enough for this' - i.e. I want to be consistent but don't have time to fix it everywhere today :-)

adamrtalbot · 2024-01-26T09:07:12Z

subworkflows/local/prepare_genome/main.nf

hmmm a local shared subworkflow? Seems hard to keep in sync with rnaseq.

It's also huge! Maybe we should break it up a bit.

I'm trying not to re-engineer the rnaseq stuff, just re-use it. This should be de-localised and re-engineered as part of that process, but I don't want to derail progress on riboseq by doing that.

Fair enough

and hopefully we'll manage most of this with a proper nf-core/references

modules/local/gtf2bed/main.nf

Co-authored-by: Maxime U Garcia <[email protected]>

maxulysse · 2024-01-26T10:54:30Z

modules/local/cat_additional_fasta/main.nf

+    task.ext.when == null || task.ext.when
+
+    script:
+    def genome_name  = params.genome ? params.genome : fasta.getBaseName()


don't like the use of params.genome here

modules/local/gtf_filter/main.nf

nextflow_schema.json

maxulysse

Minor comments, looks good to me. I do love the RNA preprocessing subworkflow, which could be pushed to the modules repo.
I think the only blocker for that, for now, is the location of the config files.
I'd say everything related to that subworkflow should sit with that subworkflow.

Co-authored-by: Maxime U Garcia <[email protected]>

pinin4fjords · 2024-01-29T15:38:28Z

Thanks @maxulysse @adamrtalbot

pinin4fjords added 12 commits December 20, 2023 16:08

Start adding preprocessing components from rnaseq

a3db3ec

[skip ci] Unrestrict test resources while I'm mucking about

d3be5ed

Add gtf to schema

6d96c75

Reduce kmer size

c99d042

Trim before attempting strand detection

7131650

Add pass_trimmed_reads

a0f25d6

Fix lib calls

6c0f58e

Fix test profile

c8309d5

Update conf

0e243ee

Messy but working rnaseq-style proprocessing

601aed2

Encapsulated preprocessing

216ad1a

Borrow more config from rnaseq

2647cbc

pinin4fjords marked this pull request as draft December 22, 2023 17:01

pinin4fjords added 4 commits December 22, 2023 17:30

Reset test resources back

22922dd

Merge branch 'dev' into preprocessing_from_rnaseq

8c7b0dc

Add test_data_base default

5933b28

Fix multiqc config

58ce68c

pinin4fjords mentioned this pull request Jan 4, 2024

Too few assignation of fragments to transcripts in the index nf-core/rnaseq#1111

Closed

pinin4fjords and others added 11 commits January 11, 2024 15:28

Bump to fix subworkflow tests

9db1555

Bump sortmerna

81de6ed

Fix HISAT2 tests

81a01a7

Bump cat/fastq and update config to fix

a4cc7ed

Override workflow config for testing fq/subsample

5e33157

Bump fq/subsample

b560b6e

Set GFFREAD test args

0b963ec

Bump gffread

aae53f5

Totally suppress workflow ext.args for GFFREAD

d9c4795

Run prettier

08526e6

Arrange subworkflow better

086846e

pinin4fjords commented Jan 23, 2024

pinin4fjords changed the title ~~[WIP] Preprocessing from rnaseq~~ Preprocessing from rnaseq Jan 23, 2024

pinin4fjords marked this pull request as ready for review January 23, 2024 21:13

pinin4fjords added 2 commits January 23, 2024 21:16

Fix relative includes for new subworkflow location

8cffca8

Bump outdated modules

bc548a6

adamrtalbot reviewed Jan 26, 2024

maxulysse reviewed Jan 26, 2024

modules/local/gtf2bed/main.nf (outdated, resolved)

Update modules/local/gtf2bed/main.nf

4b88f15

Co-authored-by: Maxime U Garcia <[email protected]>

maxulysse reviewed Jan 26, 2024

modules/local/gtf_filter/main.nf (outdated, resolved)

maxulysse reviewed Jan 26, 2024

nextflow_schema.json (outdated, resolved)

maxulysse reviewed Jan 26, 2024

nextflow_schema.json (outdated, resolved)

maxulysse approved these changes Jan 26, 2024

Apply rnaseq -> riboseq fixes from code review

aeffe21

Co-authored-by: Maxime U Garcia <[email protected]>

adamrtalbot approved these changes Jan 26, 2024

pinin4fjords merged commit 4a0099f into dev Jan 29, 2024
53 checks passed

pinin4fjords deleted the preprocessing_from_rnaseq branch January 29, 2024 15:38

pinin4fjords linked an issue Feb 19, 2024 that may be closed by this pull request

Define preprocessing #13

Closed

pinin4fjords added this to the v1.0.0 milestone Feb 19, 2024
