different number of reads reported and in bam file #50
Comments
Hello,
Looking more closely at this issue, it appears that the BAM file produced by Longshot also contains supplementary and secondary alignments, even though these are filtered out before SNV discovery. After filtering these out, I get 29240 unphased reads, but I should be getting 26476, so there are ~3 thousand additional reads. What else could I be missing?
Cheers,
Juan D. Montenegro
Longshot outputs all alignments, so the total number of reads in the output should be exactly the same as in the input BAM. The reported statistics cover only the reads that pass the filters. Can you confirm whether the extra reads are duplicates?
Hello,
Any help selecting the appropriate set of unphased reads would be very helpful.
Cheers,
Juan D.
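Longshot filters out reads with low mapping quality in addition to secondary/supplementary alignments. I have copied the list of filters from the code below:
record.is_quality_check_failed()
|| record.is_duplicate()
|| record.is_secondary()
|| record.is_unmapped()
|| record.mapq() < min_mapq
|| record.is_supplementary()
All these reads will be output as 'unphased'.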
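For anyone trying to reproduce the counts, the filter quoted in this thread can be approximated from the SAM FLAG and MAPQ fields alone. The following is only a rough sketch, not Longshot's code: the flag bit values come from the SAM specification, while the function names, the `(flag, mapq, hp)` tuple layout, and the category labels are made up for illustration. `min_mapq` must be whatever value was passed to Longshot.

```python
from collections import Counter

# SAM FLAG bits, as defined in the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_QC_FAIL = 0x200
FLAG_DUPLICATE = 0x400
FLAG_SUPPLEMENTARY = 0x800

# Any of these bits being set excludes the record (4+256+512+1024+2048 = 3844).
EXCLUDE_MASK = (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_QC_FAIL
                | FLAG_DUPLICATE | FLAG_SUPPLEMENTARY)


def is_filtered(flag, mapq, min_mapq):
    """Return True if a record matches any condition in the quoted filter list."""
    return bool(flag & EXCLUDE_MASK) or mapq < min_mapq


def tally(records, min_mapq):
    """Count records per category.

    `records` is an iterable of (flag, mapq, hp) tuples (hypothetical layout),
    where hp is the HP tag value, or None when the record has no HP tag.
    """
    counts = Counter()
    for flag, mapq, hp in records:
        if hp is not None:
            counts["phased"] += 1
        elif is_filtered(flag, mapq, min_mapq):
            counts["unphased_filtered"] += 1
        else:
            counts["unphased_passing"] += 1
    return counts
```

Under these assumptions, only the `unphased_passing` count would be expected to line up with the unphased number in the reported statistics; the `unphased_filtered` records are in the output BAM but were never considered for phasing.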
Thank you for your reply.
So once I filter all of these out, the remaining reads without the "HP" tag were not phased because either 1) they did not have enough variants to be phased, 2) their assigned variants conflicted with other, more abundant reads, or 3) the locus is actually (mostly) homozygous. Would that be correct?
Cheers,
Juan D. Montenegro
On Thu, Sep 17, 2020 at 12:15, Bansal Lab wrote:
… Longshot filters out reads with low mapping quality in addition to
secondary/supp. alignments. I have copied the list of filters from the code
below:
record.is_quality_check_failed()
|| record.is_duplicate()
|| record.is_secondary()
|| record.is_unmapped()
|| record.mapq() < min_mapq
|| record.is_supplementary()
All these reads will be output as 'unphased'.
Hello,
I recently aligned reads from a heterozygous individual to the species' reference genome. After identifying alignment breakpoints, I ran Longshot on the conserved regions. Out of 1900 targets, 1780 were successfully split into haplotypes. However, I have noticed a few things:
that is 49720 reads phased and unphased, but
the bam produced by longshot contains 109628 unique reads. That does not add up. Do you know what is going on here? Or am I reading it wrong?
BTW, the number of phased reads (HP:i:1 and HP:i:2) is correct, so the problem is from the unphased reads.
Cheers,
Juan D. Montenegro