Downsampling of reads #213

Open
cwuensch opened this issue Feb 7, 2017 · 9 comments

@cwuensch

cwuensch commented Feb 7, 2017

We are using the Biodalliance genome browser with high-coverage BAM files (up to 10,000 reads per base pair).
By default, the number of reads displayed is limited to 100. (And there has to be a limit, because rendering gets terribly slow otherwise.)
The problem is that the genome browser seems to just take the first 100 reads. In a recent case, not a single read was displayed for the locus in question, only reads that started right at the current position. In other cases, only wild-type reads may be displayed while the mutated ones get clipped.
Could you implement some form of statistical downsampling, e.g. selecting the reads to display at random, or just taking every 100th read?

@dasmoth
Owner

dasmoth commented Feb 7, 2017

There isn't a built-in option for this, but if you configure your tracks programmatically, you can do this via a plugin.

Something like:

function readDownsampler(featureSets) {
    // Reads from the first (here, only) data source of the track.
    const reads = featureSets[0];
    const sampledReads = [];
    // Keep every 10th read.
    for (let i = 0; i < reads.length; i += 10) {
        sampledReads.push(reads[i]);
    }
    return sampledReads;
}

...then configure your source with...

{
    name: 'Downsample test',
    bamURI: '/path/to/data.bam',
    merge: readDownsampler
}

A slight downside is that if you return to almost but not quite the same region of the genome, you'll end up seeing a different subset of the reads, because the sampling depends on each read's position in the returned list. If this matters, it might be better to sample based on, e.g., the MD5 of the read ID instead.
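A minimal sketch of the ID-based idea follows. It swaps MD5 for FNV-1a, a small non-cryptographic hash (computed here over UTF-16 code units), because MD5 is not built into browser JavaScript. The names `fnv1a`, `makeHashDownsampler` and `hashDownsampler` are hypothetical, and the code assumes each read feature carries its read name in an `id` property, which is not confirmed in this thread.

```javascript
// FNV-1a: a small non-cryptographic 32-bit string hash, used here in place of MD5.
function fnv1a(str) {
    let hash = 0x811c9dc5;                   // FNV offset basis
    for (let i = 0; i < str.length; i++) {
        hash ^= str.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193);  // 32-bit multiply by the FNV prime
    }
    return hash >>> 0;                       // convert to an unsigned 32-bit value
}

// Returns a merge function that keeps roughly one read in `keepOneIn`.
// ASSUMPTION: each read feature exposes its read name as `id`.
function makeHashDownsampler(keepOneIn) {
    return function (featureSets) {
        const reads = featureSets[0];
        // A read is kept or dropped based only on its own ID, so the same
        // read gets the same decision regardless of the region being viewed.
        return reads.filter(read => fnv1a(String(read.id)) % keepOneIn === 0);
    };
}

const hashDownsampler = makeHashDownsampler(10);
```

This could then be passed as `merge: hashDownsampler` in place of `readDownsampler`. Unlike the stride-based version, the number of reads kept is only approximately one tenth of the input.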

Having said all that, I think it's a great idea to have something along these lines built into the core, so I will leave this issue open for now.

@cwuensch
Author

Thank you for this great solution!!
Unfortunately, something goes wrong here...
When I copy this code exactly as described here, the function readDownsampler never gets called (I inserted some debug log output, which never gets printed).
When I write it with parentheses, i.e. merge: readDownsampler(), the function does get called, but featureSets is undefined.
What can I do about this?

@cwuensch
Author

And another question: Is it possible to access the user-defined variable limit from within this function?
With this the downsampling could be adapted to the limit of reads to be shown, as defined by the user in the config dialog.

@dasmoth
Owner

dasmoth commented Feb 15, 2017

Sorry for the confusion -- the example I sent is something that really ought to work, but currently doesn't because of the way two features (combining multiple data sources in one track, and applying arbitrary filters to data) are coupled together.

The following version is actually tested :-)

{
    name: 'Downsample test',
    overlay: [{bamURI: '/path/to/data.bam'}],
    merge: readDownsampler
}

(The readDownsampler function itself is fine.) I'm going to tweak things so that the example as I originally wrote it does actually work, but that might not happen right away.

@cwuensch
Author

Great! This solution does indeed work for filtering the read data.

But... sorry that I have to ask questions again...

We have to use this in combination with
(a) a bam index file (bai)
(b) a style sheet configuration that enables "Highlight mismatches and strands" by default
(c) a readDownsampler() function that considers the user configured read limit

So far I could not figure out where to place the style sheet information so that it works correctly with the overlay command. When we put it below the merge command, the checkbox "Highlight mismatches and strands" gets checked, but the corresponding style does not seem to be applied. Even after manually unchecking and re-checking the checkbox, the style is not applied.
Do you have an idea how to solve this?

Additionally, is it possible to read the user-configured read limit in order to use it in the downsampling function?

@dasmoth
Owner

dasmoth commented Feb 18, 2017

Re: mismatch colouring...

Thanks for spotting this. It sounds like your config is fine, but some logic that's used to determine whether reference sequence data needs to be threaded through to a given track's renderer was failing when your custom filter was applied. This has been fixed in the git-latest version.

Re: user-configurability of the custom filter.

Do you want to be able to configure this at run time (via a custom field in the track editor)? There is currently no way of doing this, but I'd certainly agree it would be nice!
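As a stopgap, the limit could be fixed at configuration time by building the filter from a closure. The following is a minimal sketch, not a built-in feature: `makeLimitDownsampler` is a hypothetical helper name, and the value passed to it is chosen in code and does not follow the "limit" field in the track editor.

```javascript
// Returns a merge function that keeps at most `limit` reads,
// spread evenly across the reads it receives.
// `limit` is fixed when the track is configured; it is NOT read
// from the track editor.
function makeLimitDownsampler(limit) {
    return function (featureSets) {
        const reads = featureSets[0];
        if (reads.length <= limit) {
            return reads;                    // nothing to thin out
        }
        const stride = reads.length / limit; // fractional step, greater than 1
        const sampled = [];
        for (let k = 0; k < limit; k++) {
            sampled.push(reads[Math.floor(k * stride)]);
        }
        return sampled;
    };
}

const limitDownsampler = makeLimitDownsampler(500);
```

Passing `merge: limitDownsampler` would then cap the track at 500 reads per fetched region while still sampling across the whole list rather than taking the first 500.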

@cwuensch
Author

Thanks again!

Re: mismatch colouring...
After building the latest version from git, the mismatch colouring works fine.
BUT a new issue has appeared with the latest version...
The "cursor" indicating the middle position is not positioned correctly after calling the API function SetLocation().
Furthermore, the cursor "jumps" back and forth when the user opens the configuration dialog. That is kind of ... weird.

Re: custom filter
Actually, I do not really need to let the user configure the read downsampling limit in the track editor.
BUT there already IS a "limit" field in the track editor, preconfigured with 100, which will have no function if we cannot read its value from the downsampling function.
Furthermore, the limit of 100 is not very suitable for us. If it is a hard limit that the user can no longer change, it would be nice if we could at least preconfigure it with a higher value, such as 500.
Is there a style option for changing this value?

@dasmoth
Owner

dasmoth commented Feb 22, 2017

I'm concerned about what you say regarding the "cursor" (do you mean the vertical position indicator in the middle of the browser area?). Could you send a screenshot or two to illustrate this? (Offline to [email protected] is fine if you prefer.)

Regarding the "bumping limit", it can be configured as a top level (not stylesheet) option on a track configuration:

{
    name: 'my track',
    bamURI: '...',
    subtierMax: 500
}
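
Putting the pieces from this thread together, a combined track configuration might look like the sketch below. Whether `subtierMax` takes effect on a track defined via `overlay` has not been confirmed in this thread, so that line is an assumption. The function shown is only a compact stand-in for the readDownsampler posted earlier, and `trackConfig` is a hypothetical variable name.

```javascript
// Compact stand-in for the readDownsampler function from earlier in the thread:
// keeps every 10th read of the first data source.
function readDownsampler(featureSets) {
    return featureSets[0].filter((read, i) => i % 10 === 0);
}

const trackConfig = {
    name: 'Downsample test',
    overlay: [{bamURI: '/path/to/data.bam'}],
    merge: readDownsampler,  // pass the function itself, without parentheses
    subtierMax: 500          // ASSUMPTION: honoured on overlay tracks as well
};
```

Note that `merge: readDownsampler()` would instead call the function immediately with no arguments, which is why `featureSets` showed up as undefined in the earlier attempt.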

@cwuensch
Author

cwuensch commented Feb 28, 2017

Thanks for the solution to increase the bumping limit!

Regarding the "cursor":
Right, I am talking about the position indicator in the middle of the browsing area.

1.) When I change the displayed position via the API SetLocation(), the cursor is displayed at the wrong location (and, I think, with the wrong width):
[Screenshot: wrongcursor]

2.) Opening the configuration panel causes the cursor to jump to the left (it seems the panel's width gets subtracted from the browser's width, and the cursor is rendered in the middle of the reduced width):
[Screenshot: panelopen]

3.) Closing the config panel causes the cursor to finally be displayed at the correct position:
[Screenshot: panelclosed]
