-m and -M options don't make a difference #82

Open
alj1983 opened this issue Mar 22, 2018 · 7 comments
Comments

alj1983 commented Mar 22, 2018

I have mapped reads with the following commands:

  1. bowtie --phred33-quals -q -n 2 -l 10 --best --strata -y -S -k 1 -m 1 --al aligned.fastq --un unaligned.fastq Reference Query.fq > Mapping.sam

  2. bowtie --phred33-quals -q -n 2 -l 10 --best --strata -y -S -k 1 -M 1 --al aligned.fastq --un unaligned.fastq Reference Query.fq > Mapping.sam

The first command should report only uniquely mapped reads, while the second should also report reads that map to multiple locations (at only one of those locations).

However, when I count the reads in the aligned.fastq files, I get exactly the same number for both mappings. What is going wrong here?
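
For reference, the count described above can be taken as the number of lines divided by four. This is only a sketch: it assumes a standard FASTQ file with exactly four lines per read (no wrapped sequence lines), and `count_fastq_reads` is a hypothetical helper name, not part of bowtie.

```shell
# Count records in a FASTQ file, assuming exactly four lines per read.
count_fastq_reads() {
    awk 'END { printf "%d\n", NR / 4 }' "$1"
}

# Usage (file name taken from the commands above):
#   count_fastq_reads aligned.fastq
```

When two runs are compared this way, each run needs its own `--al` output file; otherwise the second run overwrites the first and the same file is counted twice.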

Collaborator

ch4rr0 commented May 29, 2019

We pushed a fix for this issue that should correct the behavior of -M and also cause bowtie to report the reads that were sampled (-M) or suppressed (-m). For example:

$ ./bowtie indexes/e_coli reads/e_coli_1000.fq -M1 --best -S > /dev/null
# reads processed: 1022
# reads with at least one reported alignment: 699 (68.40%)
# reads that failed to align: 301 (29.45%)
# reads with alignments sampled due to -M: 22 (2.15%)
Reported 699 alignments

$ ./bowtie indexes/e_coli reads/e_coli_1000.fq -m1 --best -S > /dev/null
# reads processed: 1000
# reads with at least one reported alignment: 677 (67.70%)
# reads that failed to align: 301 (30.10%)
# reads with alignments suppressed due to -m: 22 (2.20%)
Reported 677 alignments

If you are able to build bowtie from source, please let me know if this commit fixes this issue for you as well.

Collaborator

ch4rr0 commented Jul 6, 2019

This change has been included in our latest version.

@karaulanov

It seems that the new Bowtie release (1.2.3) misreports the number of "reads processed" in '-M 1' mode by counting multi-mapping reads twice (in your example, the 1000 original reads are reported as 1022 reads processed), so all the percentage estimates change as well. Apart from that reporting bug (or feature), I find no apparent differences between the actual alignments produced by Bowtie 1.2.3 and Bowtie 1.2.2 in '-M 1' mode.
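
The effect on the percentages can be reproduced from the numbers in the quoted summary alone. The snippet below is a small arithmetic sketch; `pct` is a hypothetical helper, not a bowtie command.

```shell
# Percentage of `part` out of `total`, printed with two decimals.
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 * part / total }'
}

pct 699 1022   # 68.40, the figure in the summary (1022 = 1000 reads + 22 counted twice)
pct 699 1000   # 69.90, the figure relative to the 1000 input reads
```

The same shift applies to the other lines: 301 unaligned reads are 29.45% of 1022 but 30.10% of 1000.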

ch4rr0 added a commit that referenced this issue Aug 28, 2019
Collaborator

ch4rr0 commented Aug 28, 2019

Thank you for picking up on this. It helped me realize that I never committed my complete changes for this issue. With all the changes in place here are the new outputs from the above commands:

$ ./bowtie-align-s indexes/e_coli reads/e_coli_1000.fq -M1 --best --sam-nohead  -S | wc -l
# reads processed: 1000
# reads with at least one reported alignment: 699 (69.90%)
# reads that failed to align: 301 (30.10%)
# reads with alignments sampled due to -M: 22 (2.20%)
Reported 699 alignments
    1000

$ ./bowtie-align-s indexes/e_coli reads/e_coli_1000.fq -m1 --best --sam-nohead  -S | wc -l
# reads processed: 978
# reads with at least one reported alignment: 677 (69.22%)
# reads that failed to align: 301 (30.78%)
# reads with alignments suppressed due to -m: 22 (2.25%)
Reported 677 alignments
     978

I think this should be in line with your expectations.

[EDIT: updated output with the results of including --sam-nohead and -S flags]

Collaborator

ch4rr0 commented Aug 28, 2019

There's still one bug that needs fixing: the reads processed for -m1 should still be 1000. I will look into a fix for that issue.

Collaborator

ch4rr0 commented Aug 28, 2019

OK, I fixed the issue and changed the summary slightly so that, in my opinion, it makes more sense.

$ ./bowtie-align-s indexes/e_coli reads/e_coli_1000.fq -m1 --best --sam-nohead  -S --threads 3 | wc -l
# reads processed: 1000
# reads with at least one alignment: 699 (69.90%)
# reads that failed to align: 301 (30.10%)
# reads with alignments suppressed due to -m: 22 (2.20%)
Reported 677 alignments
     978

$ ./bowtie-align-s indexes/e_coli reads/e_coli_1000.fq -M1 --best --sam-nohead  -S --threads 3 | wc -l
# reads processed: 1000
# reads with at least one alignment: 699 (69.90%)
# reads that failed to align: 301 (30.10%)
# reads with alignments sampled due to -M: 22 (2.20%)
Reported 699 alignments
    1000

Let me know your thoughts on this.
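
The numbers in these two summaries can be cross-checked with a few lines of shell. The relationships below are inferred from the output shown here, not taken from bowtie's source: they assume one reported alignment per read and a headerless SAM stream (`--sam-nohead`), and `check_summary` is a hypothetical helper.

```shell
# Arguments: processed aligned failed omitted reported sam_lines
# `omitted` is the number of reads suppressed by -m (0 for -M, where
# sampled reads are still reported).
check_summary() {
    processed=$1 aligned=$2 failed=$3 omitted=$4 reported=$5 sam_lines=$6
    [ "$processed" -eq $((aligned + failed)) ] &&
    [ "$reported" -eq $((aligned - omitted)) ] &&
    [ "$sam_lines" -eq $((reported + failed)) ]
}

check_summary 1000 699 301 22 677 978  && echo "-m1 summary is consistent"
check_summary 1000 699 301 0  699 1000 && echo "-M1 summary is consistent"
```

Under these assumptions, the 22 lines missing from the -m1 SAM output correspond exactly to the suppressed reads.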

@karaulanov

Thanks a lot for the quick feedback. The modified reporting looks good to me, except that the reads with suppressed alignments are lost from the SAM file, which may cause problems in some cases. Ideally, all input reads should be reported in the SAM file by default.

On a different note, and probably not worth opening a separate issue: I noticed that while older Bowtie versions delete all non-ACGTN bases before sequence alignment, since version 1.2.2 such bases are all converted to "A" during alignment and appear as "A" in the SAM files. This becomes relevant when, for example, one uses Bowtie "off label" to align miRNA datasets from miRBase (containing Us instead of Ts) to newly assembled genomes, which results in spurious and misleading alignments of the modified sequences. It would be good to have some documentation on this and perhaps also to implement different rules, e.g. converting Us into Ts and other atypical bases into Ns (instead of As), plus warning messages to make people aware of the modifications.
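
Until something like that exists, the proposed rule can be applied as a preprocessing step before alignment. The following is a minimal sketch of that workaround, not bowtie's behavior: it assumes FASTA input with `>` header lines, and `normalize_fasta` is a hypothetical helper name.

```shell
# Normalize FASTA sequence lines: upper-case, U -> T, any other
# character outside ACGTN -> N. Header lines (starting with ">") pass through.
normalize_fasta() {
    awk '/^>/ { print; next }
         { s = toupper($0); gsub(/U/, "T", s); gsub(/[^ACGTN]/, "N", s); print s }' "$@"
}

# Usage (file names are placeholders):
#   normalize_fasta mirna.fa > mirna.dna.fa
```

Running this first means every substitution is explicit in the input file rather than introduced silently during alignment.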
