Add summaries #105

Phlya · 2021-06-15T11:53:16Z

Adding first support for summary statistics: for now just fraction of cis reads (at all currently calculated minimal separations) and library complexity estimate.
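As a rough illustration of the two summaries described above, here is a minimal sketch. It is not the code from this PR: the function names, the input key names such as `cis_4kb+`, and the choice of non-duplicate pairs as the denominator are assumptions. The complexity estimator solves the Lander-Waterman relation `unique / X = 1 - exp(-total / X)` by bisection, which is one common way to estimate library size; whether the PR uses exactly this estimator is not stated in the conversation.

```python
import math


def frac_cis_summaries(cis_counts, total_nodups):
    """Turn cis pair counts at several minimal separations into fractions.

    cis_counts: dict such as {"cis_4kb+": 25} (hypothetical key names).
    total_nodups: number of non-duplicate mapped pairs (assumed denominator).
    """
    if total_nodups == 0:
        return {f"summary/frac_{key}": 0.0 for key in cis_counts}
    return {f"summary/frac_{key}": n / total_nodups for key, n in cis_counts.items()}


def estimate_complexity(n_total, n_unique):
    """Estimate library size X from n_unique / X = 1 - exp(-n_total / X)."""
    if n_total <= 0 or n_unique <= 0:
        return 0.0
    if n_unique >= n_total:
        # No duplicates observed: the library size cannot be bounded from above.
        return math.inf

    def f(x):
        # Expected number of unique pairs for library size x, minus the observed one.
        return x * (1.0 - math.exp(-n_total / x)) - n_unique

    lo = hi = float(n_unique)  # f(n_unique) < 0 always holds
    while f(hi) < 0:
        hi *= 2.0  # expand the bracket until the root is enclosed
    for _ in range(200):  # bisection; f is increasing in x
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

The output keys follow the `summary/frac_cis_4kb+` pattern used in the tests of this PR.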

Phlya · 2022-04-08T11:34:29Z

What would make sense to do with complexity estimates after merging? I think it depends on whether the merged files are separate "lanes" from the same library (then the estimate should just be recalculated from the merged duplicates and total mapped reads), or whether the pairs come from different replicates (then the estimates should be summed up, perhaps?). So do we need an argument for that?
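The two options in the question above could be sketched as follows. This is only an illustration of the idea, not the behaviour of `pairtools merge`: the `DedupCounts` container, the `same_library` argument and the `estimate` callable are hypothetical names introduced here.

```python
from typing import Callable, NamedTuple, Sequence


class DedupCounts(NamedTuple):
    """Per-file numbers needed to combine complexity estimates (hypothetical)."""
    total_mapped: int
    total_dups: int
    complexity: float


def merged_complexity(
    parts: Sequence[DedupCounts],
    same_library: bool,
    estimate: Callable[[int, int], float],
) -> float:
    """Combine complexity estimates of merged files.

    same_library=True: the files are lanes of one library, so counts are
    summed and the estimate is recalculated from the merged totals.
    same_library=False: the files are different replicates (libraries),
    so the per-file estimates are summed.
    """
    if same_library:
        total = sum(p.total_mapped for p in parts)
        dups = sum(p.total_dups for p in parts)
        # estimate(total pairs, unique pairs) is supplied by the caller.
        return estimate(total, total - dups)
    return sum(p.complexity for p in parts)
```

Passing the estimator as an argument keeps the sketch independent of any particular complexity formula.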

agalitsyna · 2022-04-08T13:04:47Z

tests/test_stats.py

+    assert stats['summary/frac_cis_4kb+'] == 0
+    assert stats['summary/frac_cis_10kb+'] == 0
+    assert stats['summary/frac_cis_20kb+'] == 0
+    assert stats['summary/frac_cis_40kb+'] == 0


Wow, these are great tests! Is it possible also to test the deduplication stats?
It's not my idea, though, I found it in #5

Yeah, that's why I didn't notice the recent error in stats with the new dataframe method: the test file doesn't have any duplicated pairs... Should be easy to fix.
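A test along the lines discussed above needs mock data with at least one duplicate. The sketch below is an assumption-laden toy: the `MOCK_PAIRS` records and the `count_dups` helper are made up for illustration, and it relies only on the convention that duplicates carry the pair type `DD` in the eighth column of a .pairs file.

```python
# Hypothetical mock data: three pairs, one of them marked as a duplicate (DD).
MOCK_PAIRS = [
    "## pairs format v1.0",
    "#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type",
    "r1\tchr1\t100\tchr1\t5000\t+\t-\tUU",
    "r2\tchr1\t100\tchr1\t5000\t+\t-\tDD",
    "r3\tchr1\t200\tchr2\t300\t+\t+\tUU",
]


def count_dups(pairs_lines, pair_type_col=7):
    """Return (total pairs, duplicated pairs) from .pairs text lines."""
    n_total = 0
    n_dups = 0
    for line in pairs_lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        n_total += 1
        if fields[pair_type_col] == "DD":
            n_dups += 1
    return n_total, n_dups


def test_mock_has_duplicates():
    n_total, n_dups = count_dups(MOCK_PAIRS)
    assert n_total == 3
    assert n_dups == 1
```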

agalitsyna · 2022-04-08T13:07:04Z

@Phlya How about throwing a warning if "merge" command is in the header?
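The suggested warning could look roughly like this. It is a sketch under assumptions: `warn_if_merged` is a hypothetical helper, the header is taken to be a list of strings, and a merge is assumed to be recorded as a `#samheader: @PG` line mentioning `pairtools_merge` or `pairtools merge`.

```python
import warnings


def warn_if_merged(header_lines):
    """Warn when the header suggests the file was produced by a merge.

    Returns True if a merge record was found, False otherwise.
    """
    merged = any(
        line.startswith("#samheader: @PG")
        and ("pairtools_merge" in line or "pairtools merge" in line)
        for line in header_lines
    )
    if merged:
        warnings.warn(
            "Input looks like the result of a merge; "
            "the complexity estimate may not be meaningful.",
            UserWarning,
        )
    return merged
```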

agalitsyna

Thanks, Ilya! This is cool and important. I only suggest some minor improvements and a warning to introduce; I also suggest merging into the pre 1.0.0 branch rather than into master. I am still testing 1.0.0, so I will additionally test these options while preparing the release of the new version.

pairtools/pairtools_stats.py

tests/data/mock.4stats.pairs

pairtools/pairtools_stats.py

So tests run


* Separate cli and lib
* pairtools flip fix for unannotated chromosomes, resolving #91
* handle empty chromosomes, resolved #76
* fixed rfrags indexing and first rfrag omission, resolved #73
* resolved or deprecated suggestions in #16
* merge improvements, header merge fixed
  - resolved merge without arguments: #61
  - option to add only the first header in merge, resolved #18
* in merge, added option to concatenate instead of merge sorted inputs, resolving: #23
* merge now checks that columns of inputs are the same
* I/O improvements
  - auto_open defaults to stdin/stdout when path evaluates to False. resolved #48
  - auto_open defaults to stdin/stdout when the path is "-"
  - if the stream is optional, it's controlled by the module itself
* Parse2 update (#99) (#109)
  Improved version of parse2 with resolved comments from the previous PR: #96
  - Separation of parse and parse2 modules. Parse has an option --walks-policy all, which parses long walks, but always reporting pair orientation and outer positions of 5'-ends, as if each pair was read in paired-end mode independently. Parse2 is specifically designed for long walks, and has options --report-position and --report-orientation, which might be used to report junctions, or reads, or walks.
  - Parse2 has an option to parse single-end reads, --single-end option, tested on minimap2 output for MC-3C.
  - Parse2 has max_fragment_size instead of parse's max_molecule_size, which helps to determine the overlapping ends of forward and reverse reads.
  - Recent update simplifies the code:
    - single _parse library used by both parse and parse2,
    - a number of functions that reduce repetitive code, e.g. push_pair function,
    - docstrings and documented structure of _parse library.
  - Both parse and parse2 have the options to report 5' or 3' ends; to flip alignments according to chromosome coordinate.
  - Both parse and parse2 have the pysam backend
  - Improvements of the tests for parse and parse2
  - Documentation includes description of various --report-orientation and --report-position cases.
* Merge pairlib into pairtools.lib.
* CLI for scalings added.
* stats output in yaml format
* Header CLI (#121)
  - new module called by `pairtools header`
  - submodules:
    - generate : Generate the header
    - set-columns : Add the columns to the .pairs/pairsam file
    - transfer : Transfer the header from one pairs file to another
    - validate-columns : Validate the columns of the .pairs/pairsam file
  - resolves #119
  - option remove-columns for `pairtools select`: Remove the columns from .pairs/pairsam file
* pairtools phase critical update (#114)
* important fixes:
  - cython dedup with no-parent id forgotten counter reset;
  - sphinx doc update (added pysam);
  - header warning if empty and error if try to add a field to empty one
* Add summaries (#105)
* Add functions for duplication tile and complexity
* Make dedup stats!
* Benchmarks finalization
* [WIP] Stats split by filters (#132)
* Markasdup lib removed; markasdup CLI explanation improved
* dedup filter stats added and tested

Co-authored-by: Aleksandra Galitsyna <[email protected]>
Co-authored-by: Ilya Flyamer <[email protected]>

Add summaries, also some f-strings and black

538cd0c

Phlya requested a review from golobor June 15, 2021 11:53

Phlya added 2 commits June 15, 2021 13:09

Fix stats test

99678bb

Fix merge

6e202cb

agalitsyna mentioned this pull request Apr 6, 2022

pairtools v1.0.0 roadmap #116

Closed

31 tasks

Merge branch 'master' into stats-summaries

87ebf79

Phlya requested a review from agalitsyna April 8, 2022 11:34

agalitsyna reviewed Apr 8, 2022

View reviewed changes

Phlya mentioned this pull request Apr 22, 2022

add non-additive stat summaries to dedup/stats: complexity estimation, cis/total, ... #54

Closed

Phlya added 5 commits April 22, 2022 15:46

Add functions for duplication tile and complexity

96c2e48

Towards duplicate statistics

c05e438

Fix by-chrom stats to ignore dups

a857a1b

Rename mock4stats

3001438

Make dedup stats!

1a614e7

agalitsyna previously requested changes Apr 26, 2022

View reviewed changes

Phlya changed the base branch from master to pre0.4.0 April 26, 2022 19:21

Phlya added 2 commits April 27, 2022 10:35

Address comments

0a02079

FIx argument name

f0bb8ca

Phlya changed the title ~~[WIP] Add summaries~~ Add summaries Apr 27, 2022

Phlya added 3 commits April 27, 2022 14:13

Split dedup and stats into lib and cli for 1.0

9a378fc

Merge

1ad4fc6

Merge branch 'pre0.4.0' into stats-summaries

6ff42b3

Phlya removed the request for review from golobor April 27, 2022 13:13

Phlya added 4 commits April 27, 2022 15:17

Test always

034b8d0

fix testing

bc647ba

Avoid double testing in PRs

c356d74

Fix stats

1654e8c

Phlya and others added 5 commits April 27, 2022 15:33

Add missing pieces in stats

7cd0c64

Fix missing imports

09cfab0

FIx order in stats test

6154e28

important fixes; tests are back.

7face1d

compute_summaries is now part of saving the stats, if not done explic…

c762fd5

…itly

agalitsyna merged commit 2bdac9a into pre0.4.0 Apr 27, 2022

agalitsyna mentioned this pull request Apr 27, 2022

optical dedup stats #59

Closed
