Downsampling of reads #213
Comments
There isn't a built-in option for this, but if you configure your tracks programmatically, you can do this via a plugin. Something like:

```
function readDownsampler(featureSets) {
    const reads = featureSets[0];
    const sampledReads = [];
    // Keep every 10th read.
    for (let i = 0; i < reads.length; i += 10) {
        sampledReads.push(reads[i]);
    }
    return sampledReads;
}
```

...then configure your source with...

```
{
    name: 'Downsample test',
    bamURI: '/path/to/data.bam',
    merge: readDownsampler
}
```
A slight downside is that if you return to almost but not quite the same region of the genome, you'll end up seeing a different subset of the reads. If this matters, it might be better to sample based on, e.g., MD5 of the read ID instead.
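To illustrate the hash-based idea, here is a minimal sketch of a sampler whose keep-or-drop decision depends only on the read ID, so the same read is treated the same way whichever region is loaded. It uses a small FNV-1a string hash in place of MD5, purely to avoid a library dependency. The names `makeHashDownsampler`, `keepOneIn` and `getReadId` are made up for this example, and the field that actually holds the read ID is an assumption you would need to check against your data.

```javascript
// FNV-1a 32-bit string hash. A stand-in for MD5 here; any stable hash works.
function fnv1a(str) {
    let hash = 0x811c9dc5;
    for (let i = 0; i < str.length; i++) {
        hash ^= str.charCodeAt(i);
        // Multiply by the FNV prime and keep the result as an unsigned 32-bit value.
        hash = Math.imul(hash, 0x01000193) >>> 0;
    }
    return hash;
}

// Builds a function with the same shape as readDownsampler.
// keepOneIn: keep roughly one read in this many (illustrative parameter).
// getReadId: hypothetical accessor returning a read's ID.
function makeHashDownsampler(keepOneIn, getReadId) {
    return function (featureSets) {
        const reads = featureSets[0];
        return reads.filter(function (read) {
            return fnv1a(String(getReadId(read))) % keepOneIn === 0;
        });
    };
}
```

Because the decision is made per read, the number of reads kept is only approximately `reads.length / keepOneIn`, unlike the fixed stride in `readDownsampler`.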
Having said all that, I think it's a great idea to have something along these lines built into the core -- so will leave this issue open for now. |
Thank you for this great solution!! |
And another question: Is it possible to access the user-defined variable |
Sorry for the confusion -- the example I sent is something that really ought to work, but currently doesn't because of the way two features (combining multiple data sources in one track, and applying arbitrary filters to data) are coupled together. The following version is actually tested :-)
(The readDownsampler function itself is fine). I'm going to tweak things so that the example as I originally wrote it does actually work -- but that might not happen right away. |
Great! This solution does indeed work for filtering the read data. But... sorry that I have to ask questions again... We have used this in combination with
I have not yet been able to figure out where to place the style sheet information so that it works correctly with the overlay command. When we put it below the merge command, the checkbox "Highlight mismatches and strands" gets checked, but the corresponding style does not seem to be applied. Even after manually unchecking and re-checking the checkbox, the style is not applied. Additionally, is it possible to read out the user-configured read limit in order to use it in the downsampling function? |
Re: mismatch colouring... Thanks for spotting this. It sounds like your config is fine, but some logic that's used to determine whether reference sequence data needs to be threaded through to a given track's renderer was failing when your custom filter was applied. This has been fixed in the git-latest version. Re: user-configurability of the custom filter. Do you want to be able to configure this at run time (via a custom field in the track editor)? There is currently no way of doing this, but I'd certainly agree it would be nice! |
Thanks again! Re: mismatch colouring... Re: custom filter |
I'm concerned about what you say regarding the "cursor" (do you mean the vertical position indicator in the middle of the browser area?). Could you send a screenshot or two to illustrate this? (Offline to [email protected] is fine if you prefer.) Regarding the "bumping limit", it can be configured as a top-level (not stylesheet) option on a track configuration:
|
We are using the Biodalliance genome browser with high-coverage BAM files (up to 10,000 reads per base pair).
By default, the limit of reads to be displayed is set to 100. (And there has to be a limit, because otherwise it gets terribly slow.)
The problem is that the genome browser then seems to just take the first 100 reads. In a recent case, not a single read was displayed for the locus in question, only reads which started right from the current position. In other cases, only wild-type reads may be displayed while the mutated ones get clipped.
Could you implement some form of statistical downsampling, e.g. selecting the reads to be displayed at random, or just taking every 100th read or something like that?
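As a rough sketch of the "every Nth read" variant, the function below picks evenly spaced reads so that at most a fixed number are returned, instead of the first 100. It has the same shape as the `readDownsampler` plugin function discussed in the comments. The name `makeStrideDownsampler` and its `limit` parameter are made up for this example; `limit` is passed in explicitly and is not read from the browser's own setting.

```javascript
// Builds a downsampling function that returns at most `limit` reads,
// spread evenly across the input instead of taken from the start.
// limit: assumed to be a positive integer (illustrative parameter).
function makeStrideDownsampler(limit) {
    return function (featureSets) {
        const reads = featureSets[0];
        if (reads.length <= limit) {
            // Nothing to drop: return a copy of the input.
            return reads.slice();
        }
        // Fractional step so the picks cover the whole array.
        const step = reads.length / limit;
        const sampled = [];
        for (let k = 0; k < limit; k++) {
            sampled.push(reads[Math.floor(k * step)]);
        }
        return sampled;
    };
}
```

For example, `makeStrideDownsampler(100)` applied to 1,000 reads keeps the reads at indices 0, 10, 20 and so on. The selection depends on which reads happen to be loaded, so it can change as you scroll.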