pairtools parse, dedup and sort use only a few cores per process #214
-
In your command there is no dedup... But anyway, basically, no. The 96 cores you set here are only used by the reading/writing (de)compression processes, and those don't need that many cores to be fast. With very large datasets, the typical approach, if you have access to a large server or a cluster, is to split the FASTQ files into chunks from the very beginning and run every step up to dedup on each chunk separately; this can speed things up a lot. For dedup you have to merge the chunks first, otherwise duplicates that land in different chunks will be missed, and that step just takes time. You can check out our pipeline, which implements the chunking and all the other steps: https://github.com/open2c/distiller-nf
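A minimal Python sketch of the two ideas in that reply, not taken from pairtools or distiller-nf: `split_paired_fastq` (a hypothetical helper) cuts uncompressed R1/R2 FASTQ files into matching chunks that could each be mapped, parsed and sorted in parallel, and `dedup_exact` (also hypothetical) is a toy exact-match deduplicator showing why chunks must be merged before dedup. The real `pairtools dedup` logic is more involved; this only illustrates the counting argument.

```python
import itertools
import tempfile
from pathlib import Path


def split_paired_fastq(r1_path, r2_path, out_dir, reads_per_chunk):
    """Split R1/R2 FASTQ files into matching chunks of `reads_per_chunk` reads.

    Each FASTQ record is 4 lines, so both mates are read in lockstep,
    4 * reads_per_chunk lines at a time, which keeps read pairs aligned.
    Returns a list of (chunk_R1_path, chunk_R2_path) tuples.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    lines_per_chunk = 4 * reads_per_chunk
    chunks = []
    with open(r1_path) as f1, open(r2_path) as f2:
        for i in itertools.count():
            block1 = list(itertools.islice(f1, lines_per_chunk))
            block2 = list(itertools.islice(f2, lines_per_chunk))
            if not block1 and not block2:
                break  # both files exhausted
            if len(block1) != len(block2):
                raise ValueError("R1 and R2 contain different numbers of reads")
            c1 = out_dir / f"chunk{i:04d}_R1.fastq"
            c2 = out_dir / f"chunk{i:04d}_R2.fastq"
            c1.write_text("".join(block1))
            c2.write_text("".join(block2))
            chunks.append((c1, c2))
    return chunks


def dedup_exact(pairs):
    """Toy dedup: keep the first occurrence of each identical pair tuple."""
    seen = set()
    kept = []
    for p in pairs:
        if p not in seen:
            seen.add(p)
            kept.append(p)
    return kept


# Usage: five toy read pairs split into chunks of two reads each.
demo_dir = Path(tempfile.mkdtemp())
for mate in ("R1", "R2"):
    records = "".join(f"@read{i}/{mate}\nACGT\n+\nIIII\n" for i in range(5))
    (demo_dir / f"toy_{mate}.fastq").write_text(records)
demo_chunks = split_paired_fastq(
    demo_dir / "toy_R1.fastq", demo_dir / "toy_R2.fastq",
    demo_dir / "chunks", reads_per_chunk=2,
)

# Why dedup needs merged input: the same pair occurs in two different chunks.
# Pairs are (chrom1, pos1, chrom2, pos2, strand1, strand2).
chunk_a = [("chr1", 100, "chr2", 500, "+", "-"), ("chr1", 300, "chr1", 900, "+", "+")]
chunk_b = [("chr1", 100, "chr2", 500, "+", "-")]
per_chunk_total = sum(len(dedup_exact(c)) for c in (chunk_a, chunk_b))
merged_total = len(dedup_exact(chunk_a + chunk_b))
print(len(demo_chunks), per_chunk_total, merged_total)
```

Deduplicating each chunk on its own keeps the cross-chunk duplicate (3 pairs), while deduplicating the merged input removes it (2 pairs). In a real run, the per-chunk sorted outputs would be combined first (for instance with `pairtools merge`) and only then passed to `pairtools dedup`.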
-
in case of...
-
All four files were processed with the same pipeline, and one sample was downsampled.
(base) root@ray-m5-15-bc17-head-f5cb7de2-compute:/home/WT# ls -lh
-
Hi all,
I am using pairtools to process some Hi-C experimental files. I set the number of cores to 96 for the parse and dedup commands, but only 1-2 cores are actually being used! Is there a way to really speed up the processing? Thanks!