Step 4 taking too long #23
Hello, Hope it helps. Thanks,
Thanks, I already ran the demo data and everything worked fine. For splitting by chromosome, can I just specify a single chromosome in the chromosome-list parameter and then run the step separately for each chromosome (instead of splitting the input file)? I ran the previous steps at a resolution of 10 kb. Would it be OK if I now run step 4 at a coarser resolution, or do I have to start from the beginning? Also, are you recommending 1 Mb or 10 Mb just for testing, or for the actual analysis? Thanks
Yes, that should also work!
Yeah, you have to go back to step 3, as the bin index is assigned in that step. Actually, most duplicate-removal procedures have nothing to do with binning: basically, they sort the alignment positions (chrom-start-end) and remove reads with identical genomic coordinates. In other words, you probably won't see much acceleration until the normalization step. You may monitor your memory usage to see whether that is the bottleneck limiting the sorting procedure, or you may further consider splitting each chromosome into small chunks of files ordered by starting position, or similar strategies.

Choosing a resolution depends on the nature of your study. If you want to focus on more general structures like compartments or TADs, 1 Mb or 100 kb may be sufficient. If your study requires much finer structure, or your target is a small region of the genome, then a much higher resolution is needed. Best,
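To illustrate why binning resolution doesn't affect this step: coordinate-based duplicate removal sorts read pairs by genomic position and keeps one read per identical coordinate tuple. The sketch below is a hypothetical minimal version of that idea (the function name and the list-of-tuples representation are illustrative, not the pipeline's actual code):

```python
# Hypothetical sketch of coordinate-based duplicate removal, NOT the
# pipeline's real implementation: sort read pairs by genomic coordinate
# and keep one representative per identical (chrom, start, end) tuple.
def dedup_pairs(pairs):
    """pairs: list of (chrom, start, end, read_id) tuples."""
    kept = []
    seen_last = None
    # The sort dominates the cost; bin size never enters the picture,
    # which is why coarsening the resolution doesn't speed this step up.
    for p in sorted(pairs, key=lambda r: (r[0], r[1], r[2])):
        coord = p[:3]
        if coord != seen_last:  # first read seen at this coordinate
            kept.append(p)
            seen_last = coord
    return kept

reads = [
    ("chr1", 100, 250, "r1"),
    ("chr1", 100, 250, "r2"),  # same coordinates as r1 -> duplicate
    ("chr2", 500, 700, "r3"),
]
print(len(dedup_pairs(reads)))  # -> 2
```

Since the cost is dominated by sorting (chrom, start, end) keys, memory pressure during the sort is the usual bottleneck, consistent with the advice above.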
Thanks, I am now running it in parallel after splitting by chromosome. It seems the bottleneck is the duplicate-removal step, since coarsening the resolution doesn't speed up the process. Just wanted to ask how I should merge the per-chromosome output files for the next steps. Should I just use the `cat` command to concatenate them, like this: `cat *.MULTI.binPair.multi > allchroms.MULTI.binPair.multi`
Yes, you are right!
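The per-chromosome fan-out and merge described above can be sketched generically. Everything here is an assumption about the workflow, not pipeline code: `process_chromosome` is a hypothetical placeholder for one single-chromosome step-4 run, and `merge_outputs` does the same thing as the `cat` command in the previous comment:

```python
# Generic sketch of the split-by-chromosome workflow discussed above.
# `process_chromosome` is a HYPOTHETICAL placeholder; the merge is
# equivalent to: cat *.MULTI.binPair.multi > allchroms.MULTI.binPair.multi
import glob
from multiprocessing import Pool

def process_chromosome(chrom):
    """Placeholder: invoke step 4 with `chrom` as the only entry in the
    chromosome-list parameter (actual invocation depends on the pipeline)."""
    return chrom

def merge_outputs(pattern, merged_path):
    """Concatenate per-chromosome output files into one merged file."""
    with open(merged_path, "w") as out:
        # Sort paths so the merged file has a deterministic order.
        for path in sorted(glob.glob(pattern)):
            with open(path) as f:
                out.write(f.read())

def run_all(chroms):
    """Run one worker per chromosome in parallel, then merge."""
    with Pool() as pool:
        pool.map(process_chromosome, chroms)
    merge_outputs("*.MULTI.binPair.multi", "allchroms.MULTI.binPair.multi")
```

Plain `cat` is fine here because the downstream steps consume the bin pairs record by record; no coordinate re-sorting across chromosomes is implied by the replies above.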
Hi, I am getting this error when running step 6 on some of my chromosomes:

Traceback (most recent call last):

This only happens on a few chromosomes; most run fine. I ran step 5 with normalization set to 'None' (because I was having some other errors using KR normalization). Any idea why this might happen? Thanks
It seems that the spline is not fitted correctly. Maybe it is because the data is too sparse for some chromosomes. Maybe try larger bins? Thanks,
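The "try larger bins" suggestion can be made concrete with a toy example. The positions and chromosome length below are made up, and the binning helper is a hypothetical sketch, not pipeline code; the point is only that re-binning the same contacts at a coarser resolution leaves fewer empty bins, which gives a spline more informative points to fit:

```python
# Toy illustration (made-up data, hypothetical helper): coarser bins
# reduce the number of empty bins, which is what the sparsity advice
# above is about.
from collections import Counter

def bin_counts(positions, chrom_len, resolution):
    """Count contacts per fixed-width bin along one chromosome."""
    n_bins = -(-chrom_len // resolution)  # ceiling division
    counts = Counter(p // resolution for p in positions)
    return [counts.get(i, 0) for i in range(n_bins)]

positions = [5_000, 12_000, 13_500, 95_000]  # toy contact midpoints
fine = bin_counts(positions, 100_000, 10_000)    # 10 kb bins
coarse = bin_counts(positions, 100_000, 50_000)  # 50 kb bins
print(sum(1 for c in fine if c == 0))    # -> 7 empty bins at 10 kb
print(sum(1 for c in coarse if c == 0))  # -> 0 empty bins at 50 kb
```

The total contact count is unchanged by re-binning; only its distribution over bins becomes denser, which is why coarsening can rescue a fit that fails on sparse chromosomes.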
Hello,
I am running some human Hi-C data through the pipeline, and step 4 is taking an extremely long time. At the end of step 3 I have around 150M uniquely mapping pairs and 35M multimapping pairs. So far, step 4 has been running for around 24 hours, and the only outputs are the validPairs.MULTI and validPairs.UNI files for chromosomes 1 and 2. At this rate the step would take more than 10 days to run. Is this normal, and are there any parameters I can tune to make it faster? I am currently using all default parameters from the demo file (10 kb binning resolution with KR normalization).
I can use up to 24 CPUs and 100 GB of memory, but this step doesn't seem to have any option to use multiple CPUs. I would appreciate any feedback.
Thanks