Panaroo vs. Roary #188
Roary is no longer maintained (https://github.com/sanger-pathogens/Roary). I ran into many issues with Roary that several other people also hit, such as samples being randomly dropped from the analysis. When it came time to update Grandeur, I knew I needed to replace Roary with something that is actively maintained.

The options I chose to experiment with were:

Admittedly, I didn't have a lot of time for this comparison, so in addition to my own sample set (a cluster of seven Serratia samples), I relied on the literature. The two most influential papers were file:///Users/eriny/Downloads/s13059-020-02090-4.pdf and https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000690. All of these tools tout themselves as Roary replacements, but each one differs slightly from Roary and from the others.

Here's a figure that may or may not be helpful:

In summary, for core genome analysis, Panaroo identifies more pairwise SNPs than Roary. The default arguments for Panaroo in Grandeur are

This, unfortunately, is going to be a similar story with both PPanGGOLiN and PIRATE. Panaroo was chosen for Grandeur because its results were basically the same in my sample set and because it is easy to use. I had issues running PIRATE with Singularity (perl issues ftw!), and PPanGGOLiN is much more complicated to use. I think there is a way to tune Panaroo to behave more like Roary (https://gthlab.au/panaroo/#/gettingstarted/params), but the sample set I was using doesn't look like a good fit for testing this. Are the samples in your comparison on the SRA? They look like they would be perfect for fine-tuning some arguments.
Hi Erin,

Thanks for providing those details.
I have good news!
Yay?
These results were from using the contigs.
This designates 3612 "persistent" genes, which is more than both Roary and Panaroo. The number of SNPs identified, though, was much smaller. Suspiciously so.
Now for some bad news. I ran your samples while adjusting several of Panaroo's parameters listed at https://gtonkinhill.github.io/panaroo/#/gettingstarted/params and didn't see a big difference in the core genome size. The snp-dists results don't show a real difference either. Here are the SNP matrices from my exploration today:
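For anyone trying to follow why the SNP counts move around: snp-dists reports, for each pair of samples, how many columns of the core genome alignment differ. Below is a minimal sketch of that counting, assuming (as I understand snp-dists' default behaviour) that only A/C/G/T mismatches are counted and gaps or N bases are skipped. `pairwise_snps` and `snp_matrix` are hypothetical helper names, not part of snp-dists or Panaroo.

```python
from itertools import combinations

# Assumption: mirror snp-dists' default of only counting A/C/G/T mismatches;
# gaps ("-") and ambiguous bases (e.g. "N") are skipped.
NUCLEOTIDES = set("ACGT")


def pairwise_snps(seq_a: str, seq_b: str) -> int:
    """Count alignment columns where both bases are A/C/G/T and they differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    count = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in NUCLEOTIDES and b in NUCLEOTIDES and a != b:
            count += 1
    return count


def snp_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Return SNP distances for every unordered pair of samples (keys sorted)."""
    return {
        (x, y): pairwise_snps(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


if __name__ == "__main__":
    # Toy alignment for illustration only.
    toy = {
        "sample1": "ACGTACGT",
        "sample2": "ACGTACGA",
        "sample3": "ACG-ACNA",
    }
    for pair, dist in snp_matrix(toy).items():
        print(pair, dist)
    # ('sample1', 'sample2') 1
    # ('sample1', 'sample3') 1
    # ('sample2', 'sample3') 0
```

Because the count is taken over whatever columns end up in the core alignment, a tool that puts more or different genes into the core can report different distances for the same isolates.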
Looks like I can't use BugSeq, so I can't compare refMLST. It looks like it uses minimap2 to identify genes of interest and then maps reads onto those genes.
Thanks for running all those tests, but there is still not a clear picture of what is going on here.
My apologies for the late response. I've copied the

Roary uses BLAST to decide whether genes are the same between organisms, whereas Panaroo uses CD-HIT. As a result, the two tools can disagree about which genes are shared between the samples. For example, Roary determines

The gene presence/absence files:

I have not found a combination of Panaroo parameters that makes its output match Roary's. Pathogen Surveillance uses PIRATE (https://github.com/nf-core/pathogensurveillance/tree/dev), which is also BLAST-based, so that workflow might be helpful to you. I would like to add PIRATE to my comparison, but I have not gotten it to work in my local environment (perl issues with Singularity). The next version of Grandeur (#191) will support using Roary instead of Panaroo by setting param.aligner to roary (--aligner roary on the command line), so you can continue to use Roary.
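Since the disagreement comes down to which clusters each tool calls "core", one way to compare runs is to count core genes directly from each tool's presence/absence table. This is a sketch, assuming the tab-delimited `gene_presence_absence.Rtab` layout (a gene/cluster name column followed by one 0/1 column per sample); check your own output files before relying on it. `core_genes` is a hypothetical helper, and the 0.99 default fraction is an illustrative choice, not either tool's exact core definition.

```python
import csv
import io
import math


def core_genes(rtab_text: str, core_fraction: float = 0.99) -> set[str]:
    """Return gene clusters present in at least core_fraction of samples.

    Assumes a tab-delimited table: the first column is the gene/cluster name,
    the remaining columns are samples with 1 (present) or 0 (absent).
    """
    reader = csv.reader(io.StringIO(rtab_text), delimiter="\t")
    header = next(reader)
    n_samples = len(header) - 1
    # Minimum number of samples a cluster must appear in to count as core.
    needed = math.ceil(core_fraction * n_samples)
    core = set()
    for row in reader:
        if not row:
            continue  # skip blank lines
        present = sum(int(v) > 0 for v in row[1:])
        if present >= needed:
            core.add(row[0])
    return core


if __name__ == "__main__":
    # Toy table for illustration only.
    toy = "Gene\ts1\ts2\ts3\ngeneA\t1\t1\t1\ngeneB\t1\t0\t1\ngroup_7\t1\t1\t1\n"
    print(sorted(core_genes(toy)))  # ['geneA', 'group_7']
```

Cluster names such as group_7 are assigned independently by each tool, so comparing core sizes (len of the returned set) is safer than intersecting gene names across Roary and Panaroo output.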
I think Bactopia also has a PIRATE option for pangenome analysis.
Also, I JUST FOUND THIS PAPER/TOOL (https://github.com/maxgmarin/panqc). It should add some additional QC on top of Panaroo. I'm going to try it tomorrow (or sometime soon).
Hello,
I went back through an old cluster analysis to see what differences in interpretation could arise from using an older Grandeur version that uses Roary versus newer versions that use Panaroo.
Please find a summary excel file with this comparison:
Grandeur_Version_Matrix_Comparisons_Issue.xlsx
This comparison includes sequences that were part of a 2017 P. mirabilis outbreak. Grandeur with Roary identified 0-3 SNPs among these outbreak sequences. With the versions of Grandeur that use Panaroo, two sequences seem to form their own subcluster and differ by >200 SNPs from the other members of the outbreak cluster. Surprisingly, 2017-A and 2017-D come from the same sample, yet with Panaroo they differ by hundreds of SNPs. Also included is a third-party refMLST analysis, which generally agrees with the Roary SNP matrix in terms of interpretation.
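For comparisons like the spreadsheet above, a quick check is to list the sample pairs whose linkage flips between two SNP matrices at a chosen threshold. This sketch assumes each matrix is a dict keyed by (sample, sample) tuples with the two names in sorted order. `discordant_pairs` and the threshold of 10 are hypothetical, not a recommended outbreak cutoff, and the toy values are invented for illustration; they are not the outbreak data.

```python
def discordant_pairs(
    old: dict[tuple[str, str], int],
    new: dict[tuple[str, str], int],
    threshold: int,
) -> list[tuple[tuple[str, str], int, int]]:
    """List pairs linked (<= threshold SNPs) in one matrix but not the other.

    Only pairs present in both matrices are compared; keys are assumed to be
    (sample, sample) tuples in sorted order in both dicts.
    """
    flagged = []
    for pair in sorted(old.keys() & new.keys()):
        before, after = old[pair], new[pair]
        if (before <= threshold) != (after <= threshold):
            flagged.append((pair, before, after))
    return flagged


if __name__ == "__main__":
    # Toy values for illustration only.
    matrix_a = {("s1", "s2"): 1, ("s1", "s3"): 2, ("s2", "s3"): 3}
    matrix_b = {("s1", "s2"): 2, ("s1", "s3"): 50, ("s2", "s3"): 55}
    for pair, before, after in discordant_pairs(matrix_a, matrix_b, threshold=10):
        print(pair, before, after)
    # ('s1', 's3') 2 50
    # ('s2', 's3') 3 55
```

Flagging flips rather than raw differences keeps the focus on pairs where the choice of pangenome tool would actually change the cluster interpretation.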