sampling from bam (plotEnrichment) #530
Comments
Before going down this route, let's do some performance profiling first.
The slowness in this case comes from needing a few minutes to read in the BED file in each chunk sent for processing. One option would be to make the chunk size variable and to set it to something like 20 megabases in the case you're experiencing. Note that this won't work if --region is specified, since then whatever is passed in will get ignored and overwritten. I'll profile the C code to see if there's anything that can be improved there, but I'm not holding my breath, since it makes no assumption of sort order (and that's not something I'll change).
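To illustrate why a larger chunk size helps: if the BED file is re-read once per chunk, the total overhead scales with the number of chunks. The following is a minimal sketch, not the deepTools implementation; `make_chunks`, `chrom_sizes`, and `chunk_length` are hypothetical names used only for illustration.

```python
def make_chunks(chrom_sizes, chunk_length=20_000_000):
    """Split each chromosome into half-open (chrom, start, end) chunks.

    chrom_sizes: dict mapping chromosome name -> length in bases (assumed input).
    chunk_length: chunk size in bases; 20 Mb mirrors the value suggested above.
    """
    chunks = []
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, chunk_length):
            # The last chunk of a chromosome may be shorter than chunk_length.
            chunks.append((chrom, start, min(start + chunk_length, size)))
    return chunks


if __name__ == "__main__":
    sizes = {"chr1": 45_000_000}
    # Fewer chunks means the per-chunk BED loading cost is paid fewer times.
    print(len(make_chunks(sizes, chunk_length=1_000_000)))
    print(len(make_chunks(sizes, chunk_length=20_000_000)))
```

With a fixed per-chunk loading cost, going from 1 Mb to 20 Mb chunks in this toy case cuts the number of loads from 45 to 3, at the price of coarser work units for the worker pool.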
Since the BED file is not modified, it can be shared among processes. Maybe
there is a chance here for some optimization.
On Tue, May 16, 2017 at 2:54 PM, Devon Ryan wrote:
The slowness in this case comes from needing a few minutes to read in the
BED file in each chunk sent for processing. One option would be to make
that variable and to set it to something like 20 megabases in the case
you're experiencing. Note that that won't work if --region is specified,
since then whatever is passed in will get ignored and overwritten.
I'll profile the C code to see if there's anything that can be improved
there, but I'm not holding my breath, since it makes no assumption of sort
order (and that's not something I'll change).
--
Fidel Ramirez
Is there a way to share things that can't be pickled (I look forward to @thomasmanke asking what the heck this means :) )?
Relatedly and courtesy of Fidel, see here: https://github.com/maxplanck-ie/HiCExplorer/blob/develop/hicexplorer/hicBuildMatrix.py#L13-L15
Ugh, thank goodness this is only relevant for […]. I really hate Python's global interpreter lock, it just creates headaches.
Hmm, the path of least resistance for this is to just read in the files before forking and make the […]
That turns out to work reasonably well as a solution. I still hate how much memory Python is wasting, but that's at least largely unchanged. This is now implemented in the […]
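One reading of "read in the files before forking" can be sketched as follows: the parent parses the regions once into a module-level variable, and forked children inherit it without pickling or re-reading. This is an assumption-laden illustration, not the code that was merged; `_REGIONS`, `load_regions`, `count_regions_in_chunk`, and `run` are hypothetical names, and the `fork` start method is only available on POSIX systems.

```python
import multiprocessing as mp

# Hypothetical module-level cache, filled once in the parent before forking.
_REGIONS = None


def load_regions(lines):
    """Parse BED-like lines into a dict: chrom -> list of (start, end)."""
    regions = {}
    for line in lines:
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def count_regions_in_chunk(chunk, out_queue):
    """Worker: uses the inherited global instead of re-parsing the BED data."""
    chrom, chunk_start, chunk_end = chunk
    n = sum(
        1
        for start, end in _REGIONS.get(chrom, [])
        if start < chunk_end and end > chunk_start  # half-open interval overlap
    )
    out_queue.put((chunk, n))


def run(lines, chunks):
    global _REGIONS
    _REGIONS = load_regions(lines)   # read once, before any fork
    ctx = mp.get_context("fork")     # children get a copy-on-write view of _REGIONS
    queue = ctx.SimpleQueue()
    procs = [
        ctx.Process(target=count_regions_in_chunk, args=(chunk, queue))
        for chunk in chunks
    ]
    for p in procs:
        p.start()
    results = dict(queue.get() for _ in procs)  # drain results before joining
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    bed = ["chr1 0 10", "chr1 50 60", "chr2 5 15"]
    print(run(bed, [("chr1", 0, 20), ("chr1", 40, 100), ("chr2", 0, 100)]))
```

Because nothing is pickled on the way into the children, this also works for objects that cannot be pickled. The memory is shared copy-on-write at the OS level, although Python's reference counting can still cause pages to be copied over time.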
Use case: I need to estimate the fraction of reads overlapping many regions (restriction sites).
Current solution:
plotEnrichment --BED RS.bed -b Input.bam --Offset 1 --outRawCounts RS.freq -p 10
is very slow for a BED file with 43M entries. This might be improved by sampling from Input.bam.
Currently --region could be used, but I would prefer to sample independently of chromosomes, i.e., a certain fraction or a given number of reads. Such a sampling parameter could become an extra filter for all tools.
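The kind of chromosome-independent sampling requested here could look roughly like the sketch below, which keeps a fixed fraction of reads by hashing the read name (similar in spirit to subsampling with `samtools view -s`). It is not an existing deepTools option; `keep_read`, `sample_reads`, `fraction`, and `seed` are hypothetical names, and plain strings stand in for BAM records.

```python
import hashlib


def keep_read(read_name, fraction, seed=0):
    """Decide deterministically whether to keep a read.

    Hashing the read name (rather than drawing a random number per record)
    means both mates of a pair get the same decision.
    """
    digest = hashlib.md5(f"{seed}:{read_name}".encode()).digest()
    value = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return value < fraction * 2**64


def sample_reads(read_names, fraction, seed=0):
    """Return the subset of read names kept at the given sampling fraction."""
    return [name for name in read_names if keep_read(name, fraction, seed)]


if __name__ == "__main__":
    names = [f"read{i}" for i in range(10_000)]
    kept = sample_reads(names, fraction=0.1, seed=42)
    print(f"kept {len(kept)} of {len(names)} reads")
```

Counts obtained from such a subsample would estimate the fraction of reads overlapping the regions, with precision depending on how many reads are kept. Sampling a given number of reads instead of a fraction would need a different scheme, such as reservoir sampling.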