-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preprocessing from rnaseq #8
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some pointers to the important bits - most of the changed files are just pulling in the nf-core components.
params { | ||
test_data_base = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules' | ||
modules_testdata_base_path = 's3://ngi-igenomes/testdata/nf-core/modules/' | ||
hisat2_build_memory = '3.GB' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes hisat behave the same as in the tests on nf-core/modules
|
||
// Override modules.config so module snapshots match | ||
|
||
withName: FQ_SUBSAMPLE { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These lines unset things to override workflow level config that was changing things wrt tests in nf-core/modules
@@ -0,0 +1,252 @@ | |||
import groovy.json.JsonSlurper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the most important thing to review- a subworkflow comprising preprocessing from rnaseq.
Once reviewed and approved I'll PR it to modules so we can factor it out of both rnaseq and riboseq.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would love that sooo much
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a massive fan of the monolithic local subworkflow but I guess that's a problem with rnaseq.
bin/gtf2bed
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a bedops/gtf2bed module PR open here but it's stalled out: nf-core/modules#4476
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's ok to merge as is, and we can then convert it to nf-test, but I'm unsure if the code would behave the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this? It seems superfluous to the star_genomegenerate module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I'm deferring to the authority of rnaseq, and we should de-localise to share components (improving as we do so).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't you the authority of rnaseq?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean 'good enough for rnaseq, good enough for this' - i.e. I want to be consistent but don't have time to fix it everywhere today :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm a local shared subworkflow? Seems hard to keep in sync with rnaseq.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also huge! Maybe we should break it up a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying not to re-engineer the rnaseq stuff, just re-use it. This should be de-localised and re-engineered as part of that process, but I don't want to derail progress on riboseq by doing that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and hopefully we'll manage most of this with a proper nf-core/references
Co-authored-by: Maxime U Garcia <[email protected]>
task.ext.when == null || task.ext.when | ||
|
||
script: | ||
def genome_name = params.genome ? params.genome : fasta.getBaseName() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't like the use of params.genome
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments, looks good to me, I do love the rna preprocess subworkflow that could be pushed to the modules repo.
I think the only blocker for that is for now the location of config files.
I'd say all that is related to that subworkflow should sit with that subworkflow
Co-authored-by: Maxime U Garcia <[email protected]>
Thanks @maxulysse @adamrtalbot |
I'm proposing bringing in the preprocessing from the main rna-seq workflow. It had a couple of issues for this workflow, which I have resolved:
I have wrapped up that preprocessing in a subworkflow, which, after discussion, I will centralise to the nf-core modules repo.
Note that there is a lot of config carried over from rnaseq we may not use, and may well be removed before the first release. But I'd like to keep it in place until we resolve the next steps surrounding alignment and quantification, and see how much of it we'll reuse.
PR checklist
nf-core lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).