-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
updated TrimGalore for multi-core support #65
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect! Thanks @drewjbeh . Just some very pedantic changes so I can sleep at night 😄
CHANGELOG.md
Outdated
@@ -13,6 +13,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0. | |||
|
|||
* [#118](https://github.com/nf-core/chipseq/issues/118) - Running on with SGE | |||
* Make executables in `bin/` compatible with Python 3 | |||
* [#63](https://github.com/nf-core/atacseq/issues/63) - Added multicore support for Trim Galore! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe put this in the Added
section instead?
Also, if you feel inspired at any point would be great if you can make a similar PR to |
Co-Authored-By: Harshil Patel <[email protected]>
Co-Authored-By: Harshil Patel <[email protected]>
Co-Authored-By: Harshil Patel <[email protected]>
All done! Thanks for your help - and it was a pleasure. Will look at |
Great. Ill wait for the Travis CI tests to pass and then merge 👍 |
Hi chaps, I think that this change may break things on strict systems...
|
Hmmm...yes, I breezed over that hoping for the best! Maybe we need to do some funky maths based on the value of |
God forbid we specify an even number of cores 😅 |
Haha, yes.. It’s the same for the Bismark suite in the methylseq pipeline too. Once you know how many cpus are used per core you should just be able to copy and paste from there. |
Bit confused by the difference in usage between SE and PE. Aside from the 2 extra e.g. Had a look at the @FelixKrueger any advice before we construct the logic would be peachy 😄 |
Hi Harshil, I would agree that SE and PE should have the same requirements in terms of CPU. And memory usage should indeed not be any issue. I could check next time I run it but I don't think it will ever exceed more than 1 or 2GB. Have never tested this thoroughly in multicore mode though. Does this help? |
Thanks for the rapid response @FelixKrueger 😎 I have updated my representative example for 3 cores for clarity. Is it still correct? If so, then I think that should be enough to work out |
@FelixKrueger but your documentation says that PE needs more cores?
|
Yes, it seems that either the documentation or the logic isnt right. Assumed it was the documentation based on @FelixKrueger comment. |
Sorry I was stuck in a meeting. As a very detailed breakdown I would say the core usage is the following: [process: cores / load]: Trimming process:
So in default mode ( For paired-end only, there is a validation process:
So if you are extremely strict then this would amount to 5 cores, although in terms of load this is hardly fair as it is really reading/writing 2 files as a single pass. (If the user chooses to also output unpaired reads there might be another 2 write streams on low load in addition to that). Maybe a very strict the formula you would want to be using would then be for I have just run a test run on 2 million paired-end reads, the memory consumption was 551M. |
Thanks @FelixKrueger Based on that breakdown maybe the formula I used isnt correct then:
We will have to come up with an alternative such as the below? def cores = 1
if (task.cpus) {
def tcores = ((task.cpus as int) - 3) as int
if (!params.single_end) {
tcores = ((task.cpus as int) - 4) as int
}
if (tcores > 1) {
cores = tcores
}
} In any case, Im wondering whether we should cap it so that the max value of |
I think limiting the number of cores to 4 is a good idea, as in my tests I found that:
(from |
I just had a go at doing this in nf-core/methylseq#137 - I used a slightly refactored version of your code @drpatelh: cores = 1
if(task.cpus){
cores = (task.cpus as int) - 4
if (params.single_end) cores = (task.cpus as int) - 3
if (cores < 1) cores = 1
if (cores > 4) cores = 4
} |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@FelixKrueger I just want to double-check the implementation below is ok before I roll it out to the Thank you! |
Hi Harshil, That looks exactly like what you want to be doing. Good luck! |
Great! Thanks @FelixKrueger 👍 |
Many thanks to contributing to nf-core/atacseq!
Please fill in the appropriate checklist below (delete whatever is not relevant). These are the most common things requested on pull requests (PRs).
PR checklist
nextflow run . -profile test,docker
).docs
is updatedCHANGELOG.md
is updatedLearn more about contributing: https://github.com/nf-core/atacseq/tree/master/.github/CONTRIBUTING.md