Bakta Prokka SNP Comparison #257

whottel · 2025-02-12T20:11:14Z

Hello, I was trying out the most recent version of the pipeline with bakta and compared it to running with prokka on a set of CRAB sequences. This is similar to an issue I brought up a few months ago, when the default pangenome tool was changed from roary to panaroo: I found that the choice of gene annotation tool had a significant impact on the resulting SNP matrix and its interpretation.

Please find attached an excel file that includes a comparison of output matrices and core genome metrics.

Matrix Comparison.xlsx

Up to this point I have been using prokka and roary, so the first matrix is essentially the status quo from my point of view. To focus on one part of the matrix: S19-S23 are all within two SNPs of each other, fewer than 10 SNPs from a few other sequences in the analysis, and no more than 51 SNPs from any other sequence.

In the second matrix (bakta/roary), S19-S23 now appear to split into two subclusters and, more surprising to me, are now >1000 SNPs apart from all other sequences.

In the third matrix, since the default annotator/pangenome tool is now bakta/panaroo, I ran the same analysis that way as well. This gives yet another slightly different interpretation: S19-S23 are no longer drastically different from the others as they were with bakta/roary, but there are other differences, such as S22 no longer clustering with S19-S21 and S23.

The final matrix is generated by BugSeq’s refMLST method and appears to most closely resemble the prokka/roary matrix.
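To make comparisons like these easier to reproduce, here is a minimal sketch (not part of the pipeline) that groups samples by single linkage at a SNP threshold and lists the pairs whose "linked" status flips between two matrices. It assumes both matrices are already loaded as square nested lists in the same sample order; the function names, the 10-SNP threshold, and the toy matrices are hypothetical and are not the values from the attached spreadsheet.

```python
from itertools import combinations


def snp_clusters(names, matrix, threshold):
    """Single-linkage clusters: two samples are joined if their SNP distance <= threshold.

    Assumes `matrix` is a square list of lists whose row/column order matches `names`.
    """
    parent = {n: n for n in names}  # union-find forest, one root per cluster

    def find(x):
        # Follow parents to the root, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(names)), 2):
        if matrix[i][j] <= threshold:
            ri, rj = find(names[i]), find(names[j])
            if ri != rj:
                parent[ri] = rj  # merge the two clusters

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    # Sort members and clusters so results from two matrices can be compared directly
    return sorted(sorted(g) for g in groups.values())


def changed_pairs(names, m1, m2, threshold):
    """Return (a, b, dist_in_m1, dist_in_m2) for pairs linked in one matrix but not the other."""
    out = []
    for i, j in combinations(range(len(names)), 2):
        if (m1[i][j] <= threshold) != (m2[i][j] <= threshold):
            out.append((names[i], names[j], m1[i][j], m2[i][j]))
    return out


if __name__ == "__main__":
    # Hypothetical toy data for illustration only
    names = ["A", "B", "C", "D"]
    m_run1 = [[0, 1, 2, 40], [1, 0, 2, 41], [2, 2, 0, 39], [40, 41, 39, 0]]
    m_run2 = [[0, 1, 12, 1040], [1, 0, 13, 1041], [12, 13, 0, 1039], [1040, 1041, 1039, 0]]
    print(snp_clusters(names, m_run1, 10))
    print(snp_clusters(names, m_run2, 10))
    print(changed_pairs(names, m_run1, m_run2, 10))
```

Single linkage is used here only because it mirrors the informal "within N SNPs of something in the cluster" reading of a matrix; other clustering rules could group the same samples differently.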

I can share the FASTQ files if you are interested.

Thanks,
Wes

erinyoung · 2025-02-12T20:48:42Z

I think we should write a paper together.

I also want to compare using annotations from pgap in addition to prokka and roary.

For core genome comparison, I'd like to add in pirate, ppanggolin, ksnp4, poppunk, fastani, skani, and mash.

We could add in bugseq too.

And then I really want to throw a wrench into the analysis by using only the chromosomal sequence.

The focus would be on the utility of these tools for public health outbreak investigations. (Something that expands on https://pubmed.ncbi.nlm.nih.gov/31682222/)

Wanna collaborate?!?!?!?

whottel · 2025-02-12T20:51:42Z

That sounds great!
I would need to get the okay from my lab leadership, especially if we want to include BugSeq.

whottel · 2025-02-21T16:50:10Z

Hi Erin,

In case you did not see the email I sent to your utah.gov address, could we set up a call to discuss this collaboration?

Thanks,
Wes
