Filter stats option. #172
base: master
Ran this on one of my real projects. Here are the lines of two stats files for the same library, up to the by-distance cis counts: the old file and the new mapq_30-filtered one.
Some strange differences: a different total number and a different total_unmapped. And 0 total_single_sided_mapped in the new file! Very strange. Relatedly, I guess, a lot of pair types are missing... I suppose it makes sense with the mapq_30 filter if you think about it, but it's a little confusing that even the total number is affected.
@Phlya Can you provide the no_filter stats and the full stats file, please?
I suggest comparing G1_DMSO_rep2.no_filter.hg38.stats.txt against G1_DMSO_rep2.hg38.stats.txt, not against G1_DMSO_rep2.mapq_30.hg38.stats.txt.
Ah, I didn't have no_filter in this pipeline, actually! Running it now.
By the way, the total number of reads is inevitably reduced with the mapq30 filter, because
Yes. Technically it makes sense, but that's not what you'd expect... or at least not what I would expect.
Can you think of a script or pairtools command that would produce what you expect to see in the filtered stats, then?
I guess, just for stats purposes, we could modify the filter to also let through all unmapped and single-sided mapped pairs? Then I think everything above total_mapped would be the same as without the filter, and that is the most confusing part for me. Marking the pairs would be best, though. Alternatively, stats could be modified to apply filters on the fly and accumulate stats at several mapq thresholds at once. But all of this requires modifying pairtools.
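A minimal Python sketch contrasting the strict mapq filter with the "let unmapped and single-sided pairs through" idea. It assumes each pair is reduced to `chrom1`, `chrom2`, `mapq1` and `mapq2`, and that an unmapped side carries the chromosome placeholder `!`. It also assumes the totals are split roughly the way `pairtools stats` does it: both sides `!` means unmapped, one side means single-sided mapped, neither side means mapped. `Pair`, `summary_stats` and both filter functions are hypothetical illustrations, not pairtools API.

```python
from collections import Counter
from typing import Callable, Iterable, NamedTuple


class Pair(NamedTuple):
    """Hypothetical minimal pair record; '!' marks an unmapped side."""
    chrom1: str
    chrom2: str
    mapq1: int
    mapq2: int


def summary_stats(pairs: Iterable[Pair]) -> Counter:
    """Count the top-level totals, split by how many sides are mapped (assumed scheme)."""
    counts = Counter()
    for p in pairs:
        counts["total"] += 1
        unmapped1, unmapped2 = p.chrom1 == "!", p.chrom2 == "!"
        if unmapped1 and unmapped2:
            counts["total_unmapped"] += 1
        elif unmapped1 or unmapped2:
            counts["total_single_sided_mapped"] += 1
        else:
            counts["total_mapped"] += 1
    return counts


def mapq_filter(min_mapq: int) -> Callable[[Pair], bool]:
    """Strict filter: both sides must pass, so unmapped sides (mapq 0) drop the pair."""
    return lambda p: p.mapq1 >= min_mapq and p.mapq2 >= min_mapq


def mapq_filter_passthrough(min_mapq: int) -> Callable[[Pair], bool]:
    """Hypothetical variant: apply the mapq cut only to pairs mapped on both sides."""
    strict = mapq_filter(min_mapq)
    return lambda p: p.chrom1 == "!" or p.chrom2 == "!" or strict(p)


pairs = [
    Pair("chr1", "chr1", 60, 60),  # mapped, passes mapq 30
    Pair("chr1", "chr2", 60, 10),  # mapped, fails mapq 30
    Pair("chr1", "!", 60, 0),      # single-sided mapped
    Pair("!", "!", 0, 0),          # unmapped
]

for name, keep in [("mapq_30", mapq_filter(30)),
                   ("mapq_30_passthrough", mapq_filter_passthrough(30))]:
    print(name, dict(summary_stats(p for p in pairs if keep(p))))
```

With the strict filter, only the first pair survives, so `total`, `total_unmapped` and `total_single_sided_mapped` all shrink. With the pass-through variant, `total_unmapped` and `total_single_sided_mapped` match the unfiltered counts. `total` still drops by the number of mapped pairs the mapq cut removes.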
So, the cluster was super busy, but here, finally, is an example of no_filter vs mapq_30.
Hi @Phlya, thanks!
The stats without a filter and the no_filter stats are identical.
Thanks, cool! So, with filtered stats, one should only interpret the pair types that are mapped and actually end up as meaningful contact pairs in the coolers. Any other thoughts?
For individual libraries and library groups, the file names are inconsistent: one puts the assembly name before the filter name, the other puts the filter name before the assembly.
I didn't realize WW counted as unmapped, by the way! That's a little misleading, actually, if one wanted to compare different pipelines, for example... Perhaps any pair types that are not rescued should be counted in a separate category, "unresolved" or something like that. That would also make the output of the new filtered stats a bit more sensible. Not suggesting it as part of this PR, of course, just a thought...
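A rough Python sketch of the "unresolved" category suggested above. It assumes each record is `(chrom1, chrom2, pair_type)` with `!` marking an unmapped side. The contents of `UNRESOLVED_TYPES` are a hypothetical choice: only `WW` is named in the discussion, so the set is kept to that. This is not how pairtools currently counts pairs.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical: pair types whose walks were not rescued. Only "WW" is named
# in the discussion, so the set is deliberately minimal.
UNRESOLVED_TYPES = frozenset({"WW"})


def classify(chrom1: str, chrom2: str, pair_type: str) -> str:
    """Assign a pair to a summary category, splitting unresolved walks out of 'unmapped'."""
    if pair_type in UNRESOLVED_TYPES:
        return "total_unresolved"
    if chrom1 == "!" and chrom2 == "!":
        return "total_unmapped"
    if chrom1 == "!" or chrom2 == "!":
        return "total_single_sided_mapped"
    return "total_mapped"


def summarize(records: Iterable[Tuple[str, str, str]]) -> Counter:
    """Tally categories; 'total' is the sum of all categories, computed before it is stored."""
    counts = Counter(classify(*r) for r in records)
    counts["total"] = sum(counts.values())
    return counts


records = [
    ("chr1", "chr2", "UU"),  # mapped on both sides
    ("!", "!", "NN"),        # genuinely unmapped
    ("!", "!", "WW"),        # unrescued walk: no longer lumped in with unmapped
    ("chr1", "!", "UN"),     # single-sided mapped
]
print(dict(summarize(records)))
```

Keeping `total` as the sum of all categories means the new category does not change the overall count. It only moves unrescued walks out of `total_unmapped`.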
Can this be removed now, @Phlya?
When binning coolers, the user can provide a set of filters that select which pairs go into the final matrices. Until now, there was no equivalent option for stats. This PR adds two distiller processes that collect and merge the stats for the same set of filters.
Note: this option does not affect the rest of distiller. The two filter+stats processes are added only when params.stats.use_filters is True; it is False by default.
The addon is backward-compatible with previous versions of params files.
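The gating and backward-compatibility logic described above can be modelled in a few lines. This is a Python sketch of the behaviour, not the Nextflow code in the PR. It assumes the params mirror a distiller project.yml, with the binning filters under `bin.filters` and the new switch under `stats.use_filters`. The function name `filters_for_stats` and the example filter expression are illustrative.

```python
from typing import Any, Dict, List, Tuple


def filters_for_stats(params: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return the (name, expression) filters to compute stats for, or [] if disabled.

    Hypothetical model: a missing 'stats' section or 'use_filters' key falls back
    to False, so older params files keep the previous behaviour.
    """
    use_filters = params.get("stats", {}).get("use_filters", False)
    if not use_filters:
        return []
    # Reuse the same filters that are applied when binning coolers.
    return list(params.get("bin", {}).get("filters", {}).items())


# An "old-style" params dict without a stats section (illustrative values).
old_params = {
    "bin": {"filters": {"no_filter": "", "mapq_30": "(mapq1>=30) and (mapq2>=30)"}}
}
# The same params with the new switch turned on.
new_params = {**old_params, "stats": {"use_filters": True}}

print(filters_for_stats(old_params))  # [] -> no extra filter+stats processes
print(filters_for_stats(new_params))
```

Defaulting the switch to False at lookup time, rather than requiring the key, is what keeps older params files valid without edits.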