-
Notifications
You must be signed in to change notification settings - Fork 150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multithreaded support from Cutadapt in trim_galore #16
Comments
Sorry just saw the closed issues with questions of multithreaded support - sorry for opening the new issue! Another thought I had was it might be possible to use GNU parallel to run multiple instances of trim_galore if you have many samples. |
Hi Andrew, |
I'd be very interested in being able to use the new multi-threading features of cutadapt via TrimGalore. Please consider this a +1 for demand! |
Noted, thanks. |
+1 for this feature, google brought me here with the same request in mind! |
If some nice person were to create a PR implementing this would you consider merging it? |
I would definitely consider that. I suppose we, as in 'the nice person', would have to make sure that it still runs fine on machines that don't have Python 3 installed etc though. While we are at it, did you manage to finish the MultiQC module for SNPsplit reports? Maybe I could find some time to get that done at some point? Cheers, Felix |
[Devon looks away embarrassed] Right, that, it's still on my to do list |
Don't be, it was super cool of you to make a start on it anyways. I was considering to try out how to add something to MultiQC for some time, so I could try to take a look at this if you didn't get round to it. But regarding Trim Galore, I assume getting the multicore trimming to work would require a check for the version of python that is currently running, as well as whether the parallel zipping is installed. It might get more icky for downstream tests that are performed in Trim Galore itself, such as the length validation, N-trimming etc, which all require the reads to be present in a synchronised fashion between R1 and R2... |
Yeah, it'll require some testing before I would want to make a PR, though I can tell you that cutadapt returns R1 and R2 synchronized, since it trims both at once. Feel free to play around with my current SNPsplit PR and improve it. We're not using it heavily at the moment so there's not a lot of motivation. |
That is good because it will make a lot of things easier. It will mean though that we need to move the paired-end validation testing from two consecutive trimming processes into this new trimming mode. This might be a bit of work but was probably overdue anyway.... But yea it would be great if you wanted to start on that, I'm of course happy to assist if possible at all.
Same here, but I am always looking for things to broaden my horizon... |
Upon further inspection it looks like implementing this would require an almost complete rewrite of how Trim Galore is internally structured, which is rather more involved than I have time for at the moment. For us the most useful feature of Trim Galore is the automatic detection of the most appropriate adapter sequence, I'll instead check and see if I can implement that in |
That's pretty much the same conclusion I came to. Given that we never really are in need of having to speed up any single trimming event (as we can run hundreds of them in parallel), adding multithread support didn't seem to be justified, or at least high up on a priority list. Thanks for looking anyway. Cheers, Felix |
Another vote for making Trim Galore/Cutadapt multi-threaded, if possible..... |
Thanks for the vote Wolfgang. In my preliminary tests (see issue #39) I found that multi-core support doesn't exactly look worthwhile (certainly not specifying beyond 2 cores...): Maybe indeed splitting the input into junks would be another option to speed things up? |
@FelixKrueger, were these tests ran with pigz installed? In my tests, running Trim-galore with more than two cores (using code from #39) provided a speed up compared to two cores alone. Maybe you are be limited by how fast gzip can process the fastq files? |
@FelixKrueger, I provided some benchmarks from my system, as well as updated the code to do some checks. Please check out the updates in #39. |
No, these tests were run without |
* test(#6505): Add pigz to versions.yml * feat: Add pigz to trimgalore nf-core/atacseq#394 FelixKrueger/TrimGalore#16 (comment) https://github.com/FelixKrueger/TrimGalore/blob/master/CHANGELOG.md#version-060-release-on-1-mar-2019 * style: Run lsp format * Update modules/nf-core/trimgalore/main.nf --------- Co-authored-by: Simon Pearce <[email protected]>
Dear Felix,
I really like this piece of software and have been testing it for my research, but I was wondering how difficult it would be to implement multithreaded support that is available in basic cutadapt? Is it possible to pass the option or arguement from trim_galore to cutadapt?
My current loop I am using is as follows:
for file in *all.fastq ; do
trim_galore -fastqc_args "-t 60" -o /NGS_Data/Andrew/test/test2/ ${file}
done
Thanks again,
Andrew
The text was updated successfully, but these errors were encountered: