Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preprocessing from rnaseq #8

Merged
merged 31 commits into from
Jan 29, 2024
Merged

Preprocessing from rnaseq #8

merged 31 commits into from
Jan 29, 2024

Conversation

pinin4fjords
Copy link
Member

@pinin4fjords pinin4fjords commented Dec 22, 2023

I'm proposing bringing in the preprocessing from the main rna-seq workflow. It had a couple of issues for this workflow, which I have resolved:

  • Strandedness inference needs to go after trimming so that the associated Salmon call works
  • We need to reduce kmer size in the Salmon index to allow for the short riboseq trimmed reads

I have wrapped up that preprocessing in a subworkflow, which, after discussion, I will centralise to the nf-core modules repo.

Note that there is a lot of config carried over from rnaseq we may not use, and may well be removed before the first release. But I'd like to keep it in place until we resolve the next steps surrounding alignment and quantification, and see how much of it we'll reuse.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/riboseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

Copy link

github-actions bot commented Dec 22, 2023

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit aeffe21

+| ✅ 158 tests passed       |+
#| ❔   2 tests were ignored |#
!| ❗  19 tests had warnings |!

❗ Test warnings:

  • readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
  • pipeline_todos - TODO string in README.md: TODO nf-core:
  • pipeline_todos - TODO string in README.md: Include a figure that guides the user through the major workflow steps. Many nf-core
  • pipeline_todos - TODO string in README.md: Fill in short bullet-pointed list of the default steps in the pipeline
  • pipeline_todos - TODO string in README.md: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
  • pipeline_todos - TODO string in README.md: update the following command to include all required parameters for a minimal example
  • pipeline_todos - TODO string in README.md: If applicable, make list of people who have also contributed
  • pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
  • pipeline_todos - TODO string in README.md: Add bibliography of tools and data used in your pipeline
  • pipeline_todos - TODO string in usage.md: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website.
  • pipeline_todos - TODO string in output.md: Write this documentation describing your workflow's output
  • pipeline_todos - TODO string in WorkflowMain.groovy: Add Zenodo DOI for pipeline after first release
  • pipeline_todos - TODO string in WorkflowRiboseq.groovy: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
  • pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
  • pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 2.11.1
  • Run at 2024-01-26 11:28:49

@pinin4fjords pinin4fjords marked this pull request as draft December 22, 2023 17:01
Copy link
Member Author

@pinin4fjords pinin4fjords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some pointers to the important bits - most of the changed files are just pulling in the nf-core components.

params {
test_data_base = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules'
modules_testdata_base_path = 's3://ngi-igenomes/testdata/nf-core/modules/'
hisat2_build_memory = '3.GB'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes hisat behave the same as in the tests on nf-core/modules


// Override modules.config so module snapshots match

withName: FQ_SUBSAMPLE {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines unset things to override workflow level config that was changing things wrt tests in nf-core/modules

@@ -0,0 +1,252 @@
import groovy.json.JsonSlurper
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the most important thing to review- a subworkflow comprising preprocessing from rnaseq.

Once reviewed and approved I'll PR it to modules so we can factor it out of both rnaseq and riboseq.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love that sooo much

@pinin4fjords pinin4fjords changed the title [WIP] Preprocessing from rnaseq Preprocessing from rnaseq Jan 23, 2024
@pinin4fjords pinin4fjords marked this pull request as ready for review January 23, 2024 21:13
Copy link

@adamrtalbot adamrtalbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a massive fan of the monolithic local subworkflow but I guess that's a problem with rnaseq.

bin/gtf2bed Outdated

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a bedops/gtf2bed module PR open here but it's stalled out: nf-core/modules#4476

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok to merge as is, and we can then convert it to nf-test, but I'm unsure if the code would behave the same

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this? It seems superfluous to the star_genomegenerate module.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I'm deferring to the authority of rnaseq, and we should de-localise to share components (improving as we do so).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't you the authority of rnaseq?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean 'good enough for rnaseq, good enough for this' - i.e. I want to be consistent but don't have time to fix it everywhere today :-)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm a local shared subworkflow? Seems hard to keep in sync with rnaseq.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also huge! Maybe we should break it up a bit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying not to re-engineer the rnaseq stuff, just re-use it. This should be de-localised and re-engineered as part of that process, but I don't want to derail progress on riboseq by doing that.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and hopefully we'll manage most of this with a proper nf-core/references

task.ext.when == null || task.ext.when

script:
def genome_name = params.genome ? params.genome : fasta.getBaseName()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't like the use of params.genome here

nextflow_schema.json Outdated Show resolved Hide resolved
nextflow_schema.json Outdated Show resolved Hide resolved
Copy link
Member

@maxulysse maxulysse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments, looks good to me, I do love the rna preprocess subworkflow that could be pushed to the modules repo.
I think the only blocker for that is for now the location of config files.
I'd say all that is related to that subworkflow should sit with that subworkflow

@pinin4fjords
Copy link
Member Author

Thanks @maxulysse @adamrtalbot

@pinin4fjords pinin4fjords merged commit 4a0099f into dev Jan 29, 2024
53 checks passed
@pinin4fjords pinin4fjords deleted the preprocessing_from_rnaseq branch January 29, 2024 15:38
@pinin4fjords pinin4fjords linked an issue Feb 19, 2024 that may be closed by this pull request
@pinin4fjords pinin4fjords added this to the v1.0.0 milestone Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Define preprocessing
3 participants