account for the case when FASTQ is missing on one side; introduce the "XX" corrupt pair type
golobor committed Apr 23, 2019
1 parent cb4b3db commit 1579c6f
Showing 3 changed files with 18 additions and 6 deletions.
14 changes: 9 additions & 5 deletions doc/formats.rst
@@ -81,22 +81,26 @@ and which side has a "better" alignment and find the corresponding row in the ta
>2 alignments Mapped Unique Mapped Unique Pair type Code Sidedness
|check| |cross| |cross| |cross| |cross| walk-walk WW 0 [1]_
|cross| |cross| |cross| null NN 0
|cross| |cross| |cross| corrupt XX 0 [2]_
|cross| |cross| |check| |cross| null-multi NM 0
|check| |cross| |check| |check| null-rescued NR 1 [2]_
|check| |cross| |check| |check| null-rescued NR 1 [3]_
|cross| |cross| |check| |check| null-unique NU 1
|cross| |check| |cross| |check| |cross| multi-multi MM 0
|check| |check| |cross| |check| |check| multi-rescued MR 1 [2]_
|check| |check| |cross| |check| |check| multi-rescued MR 1 [3]_
|cross| |check| |cross| |check| |check| multi-unique MU 1
|check| |check| |check| |check| |check| rescued-unique RU 2 [2]_
|check| |check| |check| |check| |check| unique-rescued UR 2 [2]_
|check| |check| |check| |check| |check| rescued-unique RU 2 [3]_
|check| |check| |check| |check| |check| unique-rescued UR 2 [3]_
|cross| |check| |check| |check| |check| unique-unique UU 2
|cross| |check| |check| |check| |check| duplicate DD 2 [3]_
|cross| |check| |check| |check| |check| duplicate DD 2 [4]_
=============== ========= ================== ========= ================== ======================== ====== ===========

.. [1] "walks", or, `C-walks <https://www.nature.com/articles/nature20158>`_ are
Hi-C molecules formed via multiple ligation events which cannot be reported
as a single pair.
.. [2] "corrupt" pairs are those with technical issues - e.g. missing a
FASTQ sequence/SAM entry from one side of the molecule.
.. [2] "rescued" pairs have two non-overlapping alignments on one of the sides
(referred below as the chimeric side/read), but the inner (3'-) one extends the
only alignment on the other side (referred as the non-chimeric side/read).
9 changes: 8 additions & 1 deletion pairtools/pairtools_parse.py
@@ -298,7 +298,6 @@ def empty_alignment():
'clip5_ref': 0,
'read_len': 0,
'type':'N'

}


@@ -549,6 +548,14 @@ def parse_sams_into_pair(sams1,
    """

    # Check if there is at least one SAM entry per side:
    if (len(sams1) == 0) or (len(sams2) == 0):
        algns1 = [empty_alignment()]
        algns2 = [empty_alignment()]
        algns1[0]['type'] = 'X'
        algns2[0]['type'] = 'X'
        return algns1[0], algns2[0], algns1, algns2

    # Generate a sorted, gap-filled list of all alignments
    algns1 = [parse_algn(sam.rstrip().split('\t'), min_mapq,
                         report_3_alignment_end, sam_tags, store_seq)
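
The early return added in this hunk can be tried in isolation with a minimal sketch. The `empty_alignment` below is a trimmed-down stand-in that keeps only the three keys visible in this diff, `corrupt_pair_if_side_missing` is a hypothetical wrapper name, and building the pair code by concatenating the two side types is an assumption made for illustration, not something this commit shows.

```python
def empty_alignment():
    # Stand-in for the helper in pairtools_parse.py; only the keys
    # visible in this diff are reproduced here.
    return {'clip5_ref': 0, 'read_len': 0, 'type': 'N'}


def corrupt_pair_if_side_missing(sams1, sams2):
    # Hypothetical wrapper around the check added in this commit:
    # if either side has no SAM entries, both sides get type 'X'.
    if (len(sams1) == 0) or (len(sams2) == 0):
        algns1 = [empty_alignment()]
        algns2 = [empty_alignment()]
        algns1[0]['type'] = 'X'
        algns2[0]['type'] = 'X'
        return algns1[0], algns2[0], algns1, algns2
    # Both sides present: the caller would continue with normal parsing.
    return None


# Side 1 has no entries, side 2 has one (placeholder) SAM line.
algn1, algn2, algns1, algns2 = corrupt_pair_if_side_missing([], ['placeholder SAM line'])

# Assumption: the two-letter pair code is the two side types joined together.
pair_type = algn1['type'] + algn2['type']
print(pair_type)
```

Note that both sides are marked `'X'` even when only one of them is missing, which matches the single "corrupt" row added to the table in doc/formats.rst.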
1 change: 1 addition & 0 deletions tests/data/mock.sam
@@ -53,3 +53,4 @@ readid21 141 * 0 0 * chr1 10 0 SEQ PHRED FLAG1 FLAG2 SIMULATED:!,0,!,0,-,-,WW
readid22 65 chr1 10 60 25M25S chr1 200 0 SEQ PHRED FLAG1 SA:Z:chr1,5300,-,25M25H,60,0; SIMULATED:!,0,!,0,-,-,WW
readid22 2129 chr1 5300 60 25M25H chr1 200 0 SEQ PHRED FLAG1 SA:Z:chr1,10,+,25M25S,60,0; SIMULATED:!,0,!,0,-,-,WW
readid22 129 chr1 200 0 50M chr1 10 0 SEQ PHRED FLAG1 FLAG2 SIMULATED:!,0,!,0,-,-,WW
readid23 129 chr1 200 0 50M chr1 10 0 SEQ PHRED FLAG1 FLAG2 SIMULATED:!,0,!,0,-,-,XX
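
The new `readid23` record carries flag 129 (0x1 paired + 0x80 second in pair), so the mock file holds no first-in-pair entry for that read. A small sketch of how such a line could be inspected follows. It assumes the real file is tab-separated and that the last comma-separated field of the `SIMULATED:` tag is the expected pair type, which is inferred from the data shown here. `sam_side` and `expected_pair_type` are hypothetical helper names, not pairtools functions.

```python
def sam_side(flag):
    # SAM flag bits: 0x40 = first read in pair, 0x80 = second read in pair.
    if flag & 0x40:
        return 1
    if flag & 0x80:
        return 2
    return None


def expected_pair_type(sam_line):
    # Assumption: the last comma-separated value of the SIMULATED tag
    # is the pair type the test expects for this read.
    for field in sam_line.rstrip().split('\t'):
        if field.startswith('SIMULATED:'):
            return field[len('SIMULATED:'):].split(',')[-1]
    return None


# The readid23 line from this diff, rebuilt with tab separators.
line = '\t'.join([
    'readid23', '129', 'chr1', '200', '0', '50M', 'chr1', '10', '0',
    'SEQ', 'PHRED', 'FLAG1', 'FLAG2', 'SIMULATED:!,0,!,0,-,-,XX',
])

side = sam_side(int(line.split('\t')[1]))
print(side, expected_pair_type(line))
```

Since the only record for `readid23` is on side 2, the list of side-1 SAM entries is empty, which is the case the new check in `parse_sams_into_pair` handles.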