Hello, I was trying out the most recent version of the pipeline using bakta and compared the results to a run with prokka on the same set of CRAB sequences. Similar to a previous issue I brought up a few months ago, when the default aligner was changed from roary to panaroo, I found that the choice of gene annotation tool had a significant impact on the resulting SNP matrix and its interpretation.
Please find attached an Excel file that includes a comparison of the output matrices and core genome metrics.
Matrix Comparison.xlsx
Up to this point I have been using prokka and roary, so the first matrix is essentially the status quo from my point of view. To focus on one part of the matrix, S19-S23 are all within two SNPs of each other, fewer than 10 SNPs from a few other sequences included in the analysis, and no more than 51 SNPs from any other sequence.
In the second matrix (bakta/roary), S19-S23 now look to be split into two subclusters and, more surprising to me, are now >1000 SNPs apart from all other sequences.
In the third matrix, since the default annotator/aligner is bakta/panaroo, I ran the same analysis this way as well. This gives another slightly different interpretation. S19-S23 are no longer drastically different from the others, as they were with bakta/roary, but there are other differences, such as S22 no longer clustering with S19-S21 and S23.
The final matrix is generated by BugSeq’s refMLST method and appears to most closely resemble the prokka/roary matrix.
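In case it helps with reproducing the comparison, here is a minimal sketch of how the cluster assignments from two SNP matrices could be compared pair by pair. It is not part of the pipeline: the function names, the single-linkage grouping, the 10-SNP threshold, and the toy distances are all assumptions for illustration, and none of the numbers come from the attached spreadsheet.

```python
from itertools import combinations


def single_linkage_clusters(samples, dist, threshold):
    """Group samples linked by any chain of pairs within `threshold` SNPs."""
    parent = {s: s for s in samples}

    def find(s):
        # Union-find lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(samples, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for s in samples:
        clusters.setdefault(find(s), set()).add(s)
    return {frozenset(c) for c in clusters.values()}


def discordant_pairs(samples, dist_a, dist_b, threshold):
    """Return pairs that cluster together under one matrix but not the other."""
    def membership(dist):
        lookup = {}
        for cluster in single_linkage_clusters(samples, dist, threshold):
            for s in cluster:
                lookup[s] = cluster
        return lookup

    in_a, in_b = membership(dist_a), membership(dist_b)
    return sorted(
        (a, b)
        for a, b in combinations(samples, 2)
        if (in_a[a] == in_a[b]) != (in_b[a] == in_b[b])
    )


# Toy data only (hypothetical samples and distances, not real results).
samples = ["A", "B", "C", "D"]
matrix_x = {
    frozenset(("A", "B")): 1, frozenset(("A", "C")): 2, frozenset(("B", "C")): 1,
    frozenset(("A", "D")): 40, frozenset(("B", "D")): 41, frozenset(("C", "D")): 40,
}
matrix_y = {
    frozenset(("A", "B")): 1, frozenset(("A", "C")): 15, frozenset(("B", "C")): 14,
    frozenset(("A", "D")): 40, frozenset(("B", "D")): 41, frozenset(("C", "D")): 40,
}

THRESHOLD = 10  # assumed SNP cutoff, chosen only for this example
changed = discordant_pairs(samples, matrix_x, matrix_y, THRESHOLD)
print(changed)
```

Listing the discordant pairs this way makes it easier to say exactly which samples change cluster membership between annotator/aligner combinations, rather than comparing the matrices by eye.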
I can share the FASTQ files if you are interested.
Thanks,
Wes