Majority rules consensus base calling? #87

nick-youngblut · 2023-01-31T00:17:09Z

It appears that sangeranalyseR does not support majority-rules base calling, which could also include a "use the highest-quality base" option. The attached alignment shows that the consensus sequence calls ambiguous bases instead: if even one of many reads shows a different base at a position, the consensus base at that position is ambiguous (e.g., "A" in Read1 and "G" in Read2 gives "R" in the consensus).
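To make the behavior concrete, here is a minimal sketch of "any disagreement gives an IUPAC ambiguity code" consensus calling over already-aligned reads. This is not sangeranalyseR's code; the function names are hypothetical, and it assumes gaps are written as '-' and ignored when other reads have a base.

```python
# IUPAC nucleotide code for every non-empty set of observed bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}

def ambiguity_column(column):
    """Return the IUPAC code covering every base seen in one alignment column.

    Gaps and non-ACGT characters are ignored; an all-gap column returns '-'.
    """
    bases = frozenset(b.upper() for b in column if b.upper() in "ACGT")
    if not bases:
        return "-"
    return IUPAC[bases]

def ambiguity_consensus(aligned_reads):
    """Build a consensus from equal-length aligned reads, one column at a time."""
    return "".join(ambiguity_column(col) for col in zip(*aligned_reads))
```

With this rule a single disagreeing read is enough to make a position ambiguous: `ambiguity_consensus(["ACGT", "ACGT", "ACGT", "ACAT"])` gives `"ACRT"`, even though three of four reads agree on "G".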

If one does not want ambiguous bases in the consensus, then one must use very strict read filtering/trimming. Another approach would be to allow for majority-rules base calling, with "majority" weighted by the chromatogram signal intensity/quality at the target position (e.g., use "A" for the consensus at the target position because Read1 has a much "better" signal than Read2, which shows "G" for that same position).
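A rough sketch of the quality-weighted majority idea, assuming per-base Phred scores are used as a stand-in for chromatogram signal quality and that each read's quality list is already aligned to its gapped sequence (same length). Summing Phred scores per base is just one possible weighting and is an assumption here; the function names are hypothetical. Exact ties are reported with a placeholder character (here "N") rather than guessed.

```python
from collections import defaultdict

def weighted_majority_column(bases, quals):
    """Pick the base with the highest summed Phred quality in one column.

    bases: the character each read has at this column ('-' for a gap)
    quals: quality score for each of those characters (gaps are skipped)
    Returns '-' if every read has a gap, or None if the top score is tied.
    """
    scores = defaultdict(float)
    for base, q in zip(bases, quals):
        b = base.upper()
        if b in "ACGT":
            scores[b] += q
    if not scores:
        return "-"
    best = max(scores.values())
    winners = [b for b, s in scores.items() if s == best]
    return winners[0] if len(winners) == 1 else None

def weighted_consensus(reads, qualities, tie_char="N"):
    """reads: equal-length aligned strings; qualities: per-read lists of scores."""
    out = []
    for i in range(len(reads[0])):
        call = weighted_majority_column([r[i] for r in reads],
                                        [q[i] for q in qualities])
        out.append(tie_char if call is None else call)
    return "".join(out)
```

For the case above, Read1 = "ACAT" with qualities `[40, 40, 40, 40]` and Read2 = "ACGT" with `[40, 40, 12, 40]` gives `"ACAT"`: the high-quality "A" wins over the weak "G" instead of producing "R". One design caveat: because scores are summed, several mediocre reads can still outvote one excellent read, which may or may not be what you want.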

nick-youngblut · 2023-01-31T00:31:23Z

It would also greatly help to have a maxLength parameter, so that any read longer than this length is trimmed to it. This is useful when one knows that a Sanger read will definitely be poor quality after N bases (e.g., trim all reads to a maximum of 600 bp).
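The requested behavior is simple to express. The sketch below is only an illustration of what a `maxLength`-style cap could do; `trim_to_max_length` is a hypothetical helper, not an existing sangeranalyseR function. It truncates a read and its per-base qualities together so they stay in sync, and presumably would run after the usual quality trimming.

```python
def trim_to_max_length(seq, quals, max_length=600):
    """Truncate a read and its per-base qualities to at most max_length bases.

    max_length=None disables the cap; negative values are rejected.
    """
    if max_length is None:
        return seq, quals
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    return seq[:max_length], quals[:max_length]
```

For example, an 800 bp read comes back as 600 bp, while a 450 bp read is returned unchanged.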

Kuanhao-Chao self-assigned this Aug 21, 2023

Kuanhao-Chao added the enhancement label Aug 21, 2023
