Healx starting differential abundance workflow #11

Merged
pinin4fjords merged 48 commits into nf-core:dev on Nov 14, 2022

pinin4fjords
Member

@pinin4fjords pinin4fjords commented Oct 28, 2022

As a demonstration and a basis for discussion, this WIP PR shows my take on a differential abundance workflow built from various recently developed modules.

One or two modules are still awaiting approval and others are awaiting PR approvals for fixes, but all should work fine as committed here. I run it locally like this:

nextflow run -resume -profile mamba main.nf \
    --input $(pwd)/testdata/SRP254919.samplesheet.csv \
    --gtf $(pwd)/testdata/genes.gtf.gz \
    --contrasts $(pwd)/testdata/SRP254919.contrasts.csv \
    --matrix $(pwd)/testdata/SRP254919.salmon.merged.gene_counts.top1000cov.tsv \
    --outdir $(pwd)/testdata/output

All test files are available in the test data repo, apart from the GTF, which I retrieved from iGenomes for mouse.

The steps are:

  • Make a feature annotation table from a GTF
  • Make a feature / observation / matrix composite from the features, samples and input matrix (I see this as being transferable to features other than genes in future).
  • Run a validation to check the internal consistency of features, samples, matrix and contrasts
  • Run differential expression analysis per contrast with DESeq2
  • Run an exploratory analysis on matrix outputs, with separate coloring for each unique variable used to define contrasts (see notes in workflow comment)
  • Generate volcano plots per contrast.
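
To make the validation step above more concrete, here is a minimal Python sketch of the kind of internal-consistency check it describes. It is not the module used in this PR. The function names, the column names (`sample`, `gene_id`, `variable`, `reference`, `target`) and the toy sample IDs are all assumptions made for illustration; only the `treatment`, `mCherry` and `hND6` values come from the test data naming seen in this PR.

```python
import csv
import io


def read_table(text, delimiter=","):
    """Parse delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def check_consistency(samples, features, matrix, contrasts,
                      sample_col="sample", feature_col="gene_id"):
    """Return a list of problems; an empty list means the inputs agree.

    Column names are hypothetical defaults, not the real module interface.
    """
    if not samples or not matrix:
        return ["sample sheet and matrix must each have at least one row"]
    problems = []

    # 1. Every sample in the sheet needs a column in the matrix.
    matrix_columns = set(matrix[0])
    for row in samples:
        if row[sample_col] not in matrix_columns:
            problems.append(f"sample {row[sample_col]!r} missing from matrix")

    # 2. Every matrix row needs an entry in the feature annotation.
    known_features = {row[feature_col] for row in features}
    for row in matrix:
        if row[feature_col] not in known_features:
            problems.append(f"feature {row[feature_col]!r} missing from annotation")

    # 3. Each contrast must name a sample sheet column and two of its values.
    for contrast in contrasts:
        variable = contrast["variable"]
        if variable not in samples[0]:
            problems.append(f"contrast variable {variable!r} not in sample sheet")
            continue
        levels = {row[variable] for row in samples}
        for role in ("reference", "target"):
            if contrast[role] not in levels:
                problems.append(
                    f"{role} level {contrast[role]!r} not found in {variable!r}"
                )
    return problems


# Toy inputs (hypothetical sample and gene IDs).
samples = read_table("sample,treatment\ns1,mCherry\ns2,hND6\n")
features = read_table("gene_id,gene_name\ng1,GeneA\n")
matrix = read_table("gene_id\ts1\ts2\ng1\t10\t12\n", delimiter="\t")
contrasts = read_table("variable,reference,target\ntreatment,mCherry,hND6\n")
print(check_consistency(samples, features, matrix, contrasts))
```

Returning a list of problems rather than raising on the first one means a single run can report every mismatch between the four inputs.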

The output file structure is like:

testdata/output
├── pipeline_info
│   ├── execution_report_2022-10-28_23-00-55.html
│   ├── execution_timeline_2022-10-28_23-00-55.html
│   ├── execution_trace_2022-10-28_23-00-55.txt
│   ├── pipeline_dag_2022-10-28_23-00-55.html
│   └── software_versions.yml
├── plots
│   ├── differential
│   │   ├── treatment_mCherry_hND6_
│   │   │   ├── html
│   │   │   └── png
│   │   ├── treatment_mCherry_hND6_sample_number
│   │   │   ├── html
│   │   │   └── png
│   │   └── versions.yml
│   └── exploratory
│       ├── treatment
│       │   ├── html
│       │   └── png
│       └── versions.yml
└── tables
    └── differential
        ├── treatment-mCherry-hND6-sample_number.R_sessionInfo.log
        ├── treatment-mCherry-hND6-sample_number.dds.rld.rds
        ├── treatment-mCherry-hND6-sample_number.deseq2.dispersion.png
        ├── treatment-mCherry-hND6-sample_number.deseq2.results.tsv
        ├── treatment-mCherry-hND6-sample_number.deseq2.sizefactors.tsv
        ├── treatment-mCherry-hND6-sample_number.normalised_counts.tsv
        ├── treatment-mCherry-hND6-sample_number.vst.tsv
        ├── treatment-mCherry-hND6.R_sessionInfo.log
        ├── treatment-mCherry-hND6.dds.rld.rds
        ├── treatment-mCherry-hND6.deseq2.dispersion.png
        ├── treatment-mCherry-hND6.deseq2.results.tsv
        ├── treatment-mCherry-hND6.deseq2.sizefactors.tsv
        ├── treatment-mCherry-hND6.normalised_counts.tsv
        ├── treatment-mCherry-hND6.vst.tsv
        └── versions.yml

15 directories, 22 files
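
Given the layout above, downstream code could locate the per-contrast results tables by their file suffix. The following is a small Python sketch under that assumption. The helper name `find_results_tables` is hypothetical, and the approach relies only on the `tables/differential/<contrast>.deseq2.results.tsv` naming visible in the tree, which may change as the workflow evolves.

```python
from pathlib import Path

# Suffix taken from the output tree above.
RESULTS_SUFFIX = ".deseq2.results.tsv"


def find_results_tables(outdir):
    """Map contrast name -> path of its DESeq2 results table.

    Returns an empty dict if the tables directory does not exist.
    """
    tables_dir = Path(outdir) / "tables" / "differential"
    if not tables_dir.is_dir():
        return {}
    return {
        path.name[: -len(RESULTS_SUFFIX)]: path
        for path in sorted(tables_dir.glob("*" + RESULTS_SUFFIX))
    }


if __name__ == "__main__":
    # Output directory as used in the example command in this PR.
    for contrast, path in find_results_tables("testdata/output").items():
        print(contrast, path)
```

For the tree shown here, the keys would be the two contrast prefixes, `treatment-mCherry-hND6` and `treatment-mCherry-hND6-sample_number`.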

To do

  • Gather feedback
  • Get the actual CI up and running using the available test data
  • Start to account for non-RNA-seq data
  • Better integrated reporting (MultiQC integration?)

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the nf-core/differentialabundance branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@pinin4fjords pinin4fjords marked this pull request as draft October 28, 2022 22:18
@@ -91,6 +93,7 @@ profiles {
params.enable_conda = true
conda.useMamba = true
docker.enabled = false
conda.enabled = true
Member Author

(note that this is because nf-core/tools#1952 won't have been in the release version of the tools Oskar used)

@github-actions

github-actions bot commented Oct 31, 2022

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 916de39

+| ✅ 154 tests passed       |+
#| ❔   3 tests were ignored |#
!| ❗  11 tests had warnings |!

❗ Test warnings:

  • pipeline_todos - TODO string in README.md: Add full-sized test dataset and amend the paragraph below if applicable
  • pipeline_todos - TODO string in README.md: If applicable, make list of people who have also contributed
  • pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
  • pipeline_todos - TODO string in README.md: Add bibliography of tools and data used in your pipeline
  • pipeline_todos - TODO string in WorkflowMain.groovy: Add Zenodo DOI for pipeline after first release
  • pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
  • pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your prefered methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in output.md: Write this documentation describing your workflow's output
  • pipeline_todos - TODO string in usage.md: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website.
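
To see which files carry the most outstanding TODOs, the warning lines above can be tallied per file. This is a rough Python sketch, not part of nf-core/tools. The function name `todos_per_file` is hypothetical, and the regular expression assumes the `pipeline_todos - TODO string in <file>:` wording shown in this comment.

```python
import re
from collections import Counter

# Matches the file name in lines such as
# "pipeline_todos - TODO string in README.md: Add bibliography ..."
TODO_PATTERN = re.compile(r"pipeline_todos - TODO string in (\S+?):")


def todos_per_file(lint_text):
    """Count pipeline_todos warnings per file in pasted lint output."""
    return Counter(TODO_PATTERN.findall(lint_text))


# Three of the warning lines from the lint comment above.
warnings = """\
pipeline_todos - TODO string in README.md: Add bibliography of tools and data used in your pipeline
pipeline_todos - TODO string in output.md: Write this documentation describing your workflow's output
pipeline_todos - TODO string in README.md: If applicable, make list of people who have also contributed
"""

for filename, count in todos_per_file(warnings).most_common():
    print(f"{filename}: {count}")
```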

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: assets/email_template.html
  • files_unchanged - File ignored due to lint config: assets/email_template.txt
  • files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy

✅ Tests passed:

Run details

  • nf-core/tools version 2.6
  • Run at 2022-11-11 17:06:49

@pinin4fjords pinin4fjords marked this pull request as ready for review November 1, 2022 10:25
@pinin4fjords
Member Author

The README, docs etc. are not quite complete, but this is working in minimal form, including the basic tests. I realise you have your own approach, @WackerO @ggabernet, so you may well prefer to do things differently, but hopefully some of what's here is useful.

The modules are all as in the nf-core modules repo, except for nf-core/modules#2399, which is still in review.

I did encounter nextflow-io/nextflow#3328 when using this workflow in Tower, just FYI in case you're also using Tower. Hopefully it will be fixed at some point - Paolo is aware.

Collaborator

@WackerO WackerO left a comment

Other than MultiQC, LGTM

workflows/differentialabundance.nf (outdated, resolved)
@pinin4fjords
Member Author

Thanks for the review, @WackerO

@pinin4fjords pinin4fjords merged commit 6f646e6 into nf-core:dev Nov 14, 2022
Development

Successfully merging this pull request may close these issues.

  • Add differential plotting
  • Add exploratory plotting
  • Add differential analysis
  • Add input checking
2 participants