Commit
Bug fix: correct sorting for bedtools intersect when applying blacklist mask

Background
- To reduce memory usage when applying the blacklist mask (rule repeat_mask), the option -sorted was added to the bedtools intersect command in commit 2703961.
- bedtools intersect with the -sorted option requires that all input files be position-sorted.

Problem: the input files were sorted, but not in the way bedtools intersect expected them to be
- Input 1 = chromatin alignment BAM file
  - This is produced by Bowtie 2 followed by samtools sort (rule bowtie2_align), then the chromosomes are renamed in the rule rename_and_filter_chr.
  - The chromosome renaming step is essential to ensure that the chromosome names in the BAM file (which originally come from the Bowtie 2 index) match the chromosome names in the blacklist mask file.
  - This BAM file is coordinate sorted using the order of chromosomes in the BAM header, which uses the order of chromosomes in the chromosome name map file. Importantly, this order is user-defined and is not necessarily lexicographic!
- Input 2 = blacklist mask file
  - In the original upstream merge_mask rule, this mask was coordinate sorted using sort -k1,1 -k2,2n, which orders the chromosome names lexicographically.

Fix: The fix consisted of 2 changes
1. Use the same chromosome sorting order for the mask as for the BAM alignment file. Specifically, sort the mask file using the order of chromosomes in the chromosome name map file.
2. bedtools intersect with the -sorted flag alone appears to expect chromosomes sorted lexicographically. Consequently, a "genome file" (bedtools terminology) that specifies the chromosome order taken from the chromosome name map needs to be passed using the -g option. See also issues 1116 and 1117 on the bedtools GitHub repository (arq5x/bedtools2#1116, arq5x/bedtools2#1117).

Other changes
- Removed documentation in README.md and code comments regarding requirement for Python 3.9+.
  - Commit 85f3574 removed the use of the Python 3.9+-specific dict | dict operation in rename_and_filter_chr.py:reheader()
- helpers.py:parse_chrom_map() now uses file_open() instead of open() for reading a chromosome name map file
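
The two parts of the fix can be illustrated with a minimal Python sketch. This is not the code from the commit: the function names `sort_bed_by_chrom_order` and `genome_file_lines`, the example chromosome order, and the chromosome sizes are all hypothetical, and the sketch assumes the desired order has already been read from the chromosome name map into a list. Dropping mask intervals on chromosomes that are absent from that list is likewise an assumption.

```python
from typing import Dict, Iterable, List


def sort_bed_by_chrom_order(bed_lines: Iterable[str], chrom_order: List[str]) -> List[str]:
    """Sort BED lines by a user-defined chromosome order, then by start and end.

    Hypothetical helper: intervals on chromosomes missing from chrom_order
    are dropped (an assumption, not confirmed behavior of the pipeline).
    """
    rank = {chrom: i for i, chrom in enumerate(chrom_order)}
    records = []
    for line in bed_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom not in rank:
            continue
        records.append((rank[chrom], start, end, line))
    # Compare only (chromosome rank, start, end), never the raw text.
    records.sort(key=lambda r: r[:3])
    return [r[3] for r in records]


def genome_file_lines(chrom_order: List[str], chrom_sizes: Dict[str, int]) -> List[str]:
    """Build "chrom<TAB>size" lines, in the desired order, for a bedtools genome file."""
    return [f"{chrom}\t{chrom_sizes[chrom]}" for chrom in chrom_order]


# Illustrative inputs only; the sizes are made up, not real chromosome lengths.
chrom_order = ["chr2", "chr10", "chr1"]
chrom_sizes = {"chr1": 1000, "chr2": 2000, "chr10": 3000}
mask_lines = ["chr1\t500\t600", "chr10\t5\t50", "chr2\t100\t200", "chr2\t20\t30"]

sorted_mask = sort_bed_by_chrom_order(mask_lines, chrom_order)
genome_lines = genome_file_lines(chrom_order, chrom_sizes)
print("\n".join(sorted_mask))
print("\n".join(genome_lines))
```

With this input, a lexicographic sort would place chr1 before chr10 before chr2, whereas the sketch keeps the user-defined order chr2, chr10, chr1. The genome file lines, written to disk, are what would be passed to bedtools intersect through -g together with -sorted.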
Showing 4 changed files with 139 additions and 27 deletions.