Majority rules consensus base calling? #87

Open
nick-youngblut opened this issue Jan 31, 2023 · 1 comment
@nick-youngblut

It appears that sangeranalyseR does not support majority-rules base calling, which could also include a "use the highest-quality base" rule. Instead, it calls ambiguous bases in the consensus sequence, as the attached alignment shows. If even one of many reads shows a different base at a position, the consensus base at that position is ambiguous (e.g., "A" on Read1 and "G" on Read2 gives "R" in the consensus).
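To make the behavior described above concrete, here is a minimal Python sketch of a strict consensus rule in which any disagreement in an alignment column produces an IUPAC ambiguity code. It is an illustration only, not sangeranalyseR code: the function name `strict_consensus` is hypothetical, and the reads are assumed to be already aligned and of equal length.

```python
# Hypothetical illustration of "any disagreement gives an ambiguity code".
# Not sangeranalyseR code; reads are assumed pre-aligned and equal length.

# Standard IUPAC nucleotide codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def strict_consensus(reads):
    """Return one IUPAC code per column covering every base observed there."""
    consensus = []
    for column in zip(*reads):
        # Ignore gaps and anything that is not a plain A/C/G/T call.
        bases = frozenset(b for b in column if b in "ACGT")
        # A column with no usable base falls back to "N".
        consensus.append(IUPAC.get(bases, "N"))
    return "".join(consensus)


# Three reads agree on "A" at the last position, one read shows "G".
reads = ["ACGTA", "ACGTA", "ACGTA", "ACGTG"]
print(strict_consensus(reads))  # ACGTR
```

A single dissenting read out of four is enough to turn the last position into "R", which is the situation shown in the attached alignment.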

If one does not want ambiguous bases in the consensus, then one must use very strict read filtering/trimming. Another approach would be to allow majority-rules base calling, with each read's vote weighted by the chromatogram signal intensity/quality at the target position (e.g., use "A" for the consensus at the target position because Read1 has a much "better" signal there than Read2, which shows "G" for that same position).
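A rough sketch of the quality-weighted majority rule proposed above, written in Python for illustration. This is not an existing sangeranalyseR feature or API. The function name `weighted_majority_consensus` is hypothetical, and using per-base Phred scores as the weights is an assumption; signal intensity could be substituted. Reads are assumed to be aligned and of equal length.

```python
from collections import defaultdict


def weighted_majority_consensus(reads, quals, gap="-"):
    """Pick, per column, the base with the highest summed quality.

    reads: aligned, equal-length strings.
    quals: one list of per-base quality scores per read (same shape as reads).
    Columns with no base, or with an exact tie, are reported as "N".
    """
    consensus = []
    for col in range(len(reads[0])):
        support = defaultdict(float)
        for seq, q in zip(reads, quals):
            base = seq[col]
            if base != gap:
                # Each read votes for its base with its quality as the weight.
                support[base] += q[col]
        if not support:
            consensus.append("N")
            continue
        best = max(support.values())
        winners = [b for b, s in support.items() if s == best]
        consensus.append(winners[0] if len(winners) == 1 else "N")
    return "".join(consensus)


# Read1 has a strong "A" at the last position; Read2 has a weak "G".
reads = ["ACGA", "ACGG"]
quals = [[40, 40, 40, 50], [40, 40, 40, 12]]
print(weighted_majority_consensus(reads, quals))  # ACGA
```

With equal weights for every base this reduces to a plain majority vote, so the same routine could cover both variants behind one option.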

Screen Shot 2023-01-30 at 4 10 48 PM

@nick-youngblut
Author

It would also greatly help to have a maxLength parameter, so that any read longer than this value is trimmed down to it. This is useful when one knows that the Sanger read will definitely be poor quality after N bases (e.g., trim all reads to a max of 600 bp).
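For illustration, the requested behavior amounts to something like the following Python sketch. The name `trim_to_max_length` and its `max_length` argument are hypothetical stand-ins for the proposed maxLength parameter; nothing here is an existing sangeranalyseR function.

```python
def trim_to_max_length(seq, quals, max_length=600):
    """Cut a read and its quality scores to at most max_length bases.

    Reads that are already short enough are returned unchanged.
    """
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    # Slice the sequence and its qualities together so they stay in sync.
    return seq[:max_length], quals[:max_length]


# An 800 bp read is cut to 600 bp; a 300 bp read is left as is.
long_seq, long_quals = trim_to_max_length("ACGT" * 200, [30] * 800)
short_seq, short_quals = trim_to_max_length("ACGT" * 75, [30] * 300)
print(len(long_seq), len(short_seq))  # 600 300
```

Applying this cap before any quality-based trimming would keep the known low-quality tail out of the consensus step entirely.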
