pairtools dedup
- update the default chunksize to 10,000 to prevent memory overflow on datasets with a high duplication rate
pairtools select
- regex update (string substitutions failed when a column name was a substring of another)
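The kind of failure this entry describes can be illustrated with a short Python sketch. It assumes, purely for illustration, that a selection condition is rewritten by replacing column names with lookups into a row dictionary `COLS`. The names `naive_rewrite`, `regex_rewrite`, `COLS`, and the column `pos10` are hypothetical, and this is not pairtools' actual `select` code.

```python
import re

# Hypothetical column names for illustration: "pos1" is a substring of "pos10".
COLUMNS = ["pos1", "pos10"]


def naive_rewrite(expr, columns):
    """Replace each column name with a row lookup via plain substring replacement."""
    for col in columns:
        # Bug: "pos1" also matches the first four characters of "pos10".
        expr = expr.replace(col, f'COLS["{col}"]')
    return expr


def regex_rewrite(expr, columns):
    """Replace whole column names only, in a single regex pass."""
    # Longest names first in the alternation; \b restricts matches to whole identifiers.
    names = sorted(columns, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    # One pass, so text inserted by a substitution is never rewritten again.
    return pattern.sub(lambda m: f'COLS["{m.group(1)}"]', expr)


expr = "pos10 > pos1"
print(naive_rewrite(expr, COLUMNS))  # COLS["pos1"]0 > COLS["pos1"]
print(regex_rewrite(expr, COLUMNS))  # COLS["pos10"] > COLS["pos1"]

# The rewritten condition can then be evaluated against a row.
COLS = {"pos1": 100, "pos10": 250}
print(eval(regex_rewrite(expr, COLUMNS)))  # True
```

Matching on word boundaries in one pass avoids both the substring problem and re-substituting text that an earlier replacement inserted. `eval` appears here only to show that the rewritten condition is valid Python.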
Warnings capture in dedup: pairs lines are always split after the trailing newline is stripped (rstrip)
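Why splitting must come after newline stripping can be shown in a few lines of Python. This is a generic sketch, not pairtools' dedup code; `split_pairs_line` and the sample line are hypothetical.

```python
def split_pairs_line(line):
    """Split one tab-separated .pairs line into fields, dropping the trailing newline first."""
    return line.rstrip("\n").split("\t")


# A hypothetical data line as read from a file, newline included.
line = "read1\tchr1\t100\tchr2\t200\t+\t-\tUU\n"

# Splitting before stripping leaves the newline glued to the last field.
print(repr(line.split("\t")[-1]))  # 'UU\n'
print(repr(split_pairs_line(line)[-1]))  # 'UU'
```

Without the strip, a comparison such as `fields[-1] == "UU"` silently fails for the last column.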
Important fixes to the splitting schema
Dedup: comment handling removed (it failed when read qualities contained "#")
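A likely reason for this failure is sketched below, as an assumption: if everything after a `#` is treated as a comment, a data line whose extra columns carry Phred+33 base qualities (where `#` encodes quality 2) gets truncated. Checking only whether a line starts with `#` avoids this. The helper names and the data line are hypothetical.

```python
def strip_comment_naive(line):
    """Treat everything after the first '#' as a comment (unsafe for pairs data)."""
    return line.split("#", 1)[0]


def is_header_line(line):
    """In .pairs files, header lines start with '#'; data lines are left intact."""
    return line.startswith("#")


# Hypothetical data line whose last column holds Phred+33 qualities;
# '#' encodes quality 2, so it is a legitimate character here.
data = "read1\tchr1\t100\tchr2\t200\t+\t-\tUU\tII#I\n"

print(repr(strip_comment_naive(data)))  # truncated right after 'II'
print(is_header_line(data))  # False: the line is kept whole
print(is_header_line("#columns: readID chrom1 pos1\n"))  # True
```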
Remove dbist build out of wheel
pairtools scaling: fixed an issue with the maximum range value (#150)
Fixed an issue with pysam dependencies on pip and conda
Switched the test engine from nose to pytest
Small fixes in the docs and scaling
This is a major release of pairtools, the first since the previous release (April 2019!)
- sphinx docs update with incorporated walkthroughs
- parse2 module with CLI for parsing complex walks
- scaling and header modules with CLI
pairtools dedup
- finalize detection of optical duplicates #106 and #59, also related to #54
- chunked dedup by @Phlya
- improvement of dedup to include reporting of the parent readID by @Phlya and @agalitsyna
pairtools stats/scaling
- split dedup stats and regular stats
- output chromosome size to the stats output #83
- YAML output for pairtools stats: #111 and #79
- pairtools scaling tool which takes into account chromosome sizes: #81, #56?
pairtools parse
- parse complex walks engine and tools: #109
- stdin and stdout reporting defaults: #48
- flipping issue: #91
pairtools phase
- make it work with both the pip and GitHub versions of bwa: #114
pairtools restrict
- Handle empty pairs with "!" chromosomes: #76
- Problem with restriction sites header/first rfrag: #73
- Suggestions by @golobor: #16
pairtools merge
Headers maintenance
- allow adding a header to a headerless file (#119), or, more broadly, add the header module (draft: #121)
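Adding a header to a headerless file can be sketched as prepending a minimal header that follows the 4DN .pairs format: a `## pairs format v1.0` first line, then `#chromsize:` and `#columns:` lines. This is an illustration under that assumption, not the interface of the pairtools header module; `make_pairs_header`, `add_header`, and the inputs are hypothetical, and the column names follow the pairtools convention (`chrom1`, `pos1`, ...).

```python
def make_pairs_header(chromsizes, columns):
    """Build minimal .pairs header lines (format line, chromosome sizes, column names)."""
    header = ["## pairs format v1.0"]
    for chrom, length in chromsizes:
        header.append(f"#chromsize: {chrom} {length}")
    header.append("#columns: " + " ".join(columns))
    return [h + "\n" for h in header]


def add_header(body_lines, chromsizes, columns):
    """Prepend a header to headerless pairs lines; refuse if a header already exists."""
    body_lines = list(body_lines)
    if body_lines and body_lines[0].startswith("#"):
        raise ValueError("input already starts with a header line")
    return make_pairs_header(chromsizes, columns) + body_lines


# Hypothetical inputs for illustration.
chromsizes = [("chr1", 248956422), ("chr2", 242193529)]
columns = ["readID", "chrom1", "pos1", "chrom2", "pos2", "strand1", "strand2", "pair_type"]
body = ["read1\tchr1\t100\tchr2\t200\t+\t-\tUU\n"]

for line in add_header(body, chromsizes, columns):
    print(line, end="")
```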
Code maintenance
- transfer pairlib into sandbox of pairtools lib
- separate cli and lib
- Remove OrderedDict: #113
- Clean up deprecation warnings, e.g. #71
- Fix input errors without explanations, e.g. #61
Docs improvements
- pairtools walkthrough
- phasing walkthrough
- parse docs update
Tests proposals
- add summaries: #105
- support for bwa-mem2, which is 2-3 times faster than the usual bwa mem: #118
- a single I/O utility instead of repetitive code in each module
- sample: a new tool to select a random subset of pairs
- parse: add --readid-transform to edit readID
- parse: add experimental --walk-policy all (note: it will be moved to a separate tool in future!)
- all tools: use bgzip if pbgzip is not available
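The new `sample` tool selects a random subset of pairs. One simple way to do this, sketched below as an assumption rather than a description of pairtools' implementation, is to keep each data line independently with probability `fraction` while passing header lines through unchanged. The helper name `sample_pairs` and its parameters are hypothetical.

```python
import random


def sample_pairs(lines, fraction, seed=None):
    """Yield header lines unchanged and each data line with probability `fraction`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # a seeded generator makes the subset reproducible
    for line in lines:
        if line.startswith("#"):
            yield line  # keep the header intact
        elif rng.random() < fraction:
            yield line


# Hypothetical input: a one-line header and 1,000 data lines.
lines = ["## pairs format v1.0\n"] + [
    f"read{i}\tchr1\t{i}\tchr1\t{i + 500}\t+\t-\tUU\n" for i in range(1000)
]
subset = list(sample_pairs(lines, fraction=0.1, seed=42))
print(len(subset) - 1)  # roughly 100 data lines
```

Keeping each line independently streams in constant memory but yields only an approximate subset size; reservoir sampling would be the usual alternative when an exact number of pairs is required.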
Internal changes:
- parse: move most code to a separate _parse module
- _headerops: add extract_chromosomes(header)
- all tools: drop py3.5 support
- switch from travis CI to github actions
- parse: tag pairs with missing FASTQ/SAM on one side as corrupt, pair type "XX"
- sort: enable lz4c compression of sorted chunks by default
- automatically convert mapq1 and mapq2 to int in
- add the
- Bugfix: include _dedup.pyx in the Python package
- First release.