5.2 tips and recs for gene cluster neighborhood visualization
Several great options exist for visualization of gene clusters/neighborhoods, a.k.a. "microsynteny plots". Here, we will highlight a few of them and how fai and zol can fit into your workflows for creating such plots. For information on using cgc or cgcg within the zol suite to create "collapsed" plots visualizing summary information from thousands of gene cluster instances, please check out this other wiki page.
One obvious use of prepTG and fai is to easily identify homologous gene clusters. These are automatically written, in GenBank format, to a dedicated sub-directory within the fai results:
fai_Results/Final_Results/Homologous_Gene_Cluster_GenBanks/
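Before passing these files to a visualization tool, it can help to confirm that the sub-directory exists and contains GenBank files. The snippet below is a minimal sketch, not part of fai: `count_gbks` is a hypothetical helper, and the sketch assumes fai was run with `fai_Results/` as its output directory and that the GenBank files use the `.gbk` extension.

```shell
# Hypothetical helper: print the number of *.gbk files directly inside a directory.
count_gbks() {
    n=0
    for f in "$1"/*.gbk; do
        # If the glob matches nothing it stays literal, so test for existence.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Assumption: fai_Results/ is the output directory given to fai.
gbk_dir="fai_Results/Final_Results/Homologous_Gene_Cluster_GenBanks"
if [ -d "$gbk_dir" ]; then
    echo "Homologous gene cluster GenBanks found: $(count_gbks "$gbk_dir")"
else
    echo "Directory not found: $gbk_dir" >&2
fi
```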
Other options for finding homologous sets of gene clusters include: cblaster (and CAGECAT, a web-application for running cblaster), BiG-SCAPE / CORASON, fast.genomics, and more.
The gene clusters found can be provided as input to a variety of visualization software. We provide a listing here:
- clinker: clinker is one of the most popular tools here because it is very easy to install and run, and it provides an interactive experience. clinker is also available on the web via CAGECAT.
# running clinker on all homologous gene cluster instances identified by fai:
clinker fai_Results/Final_Results/Homologous_Gene_Cluster_GenBanks/*.gbk
- AnnoView: AnnoView is another great option and is available as a web application. This is especially nice if you are on Windows and don't want to set up a UNIX-based virtual environment to use command-line tools. Users can simply upload GenBank files at: http://annoview.uwaterloo.ca/annoview/upload
- pyGenomeViz: pyGenomeViz is a great way to do more custom visualization and is available as both a Python library and an interactive application. The application takes GenBank files as input, similar to clinker and AnnoView.
- CORASON: CORASON is also a great tool, which can additionally provide a phylogenetic perspective for a core gene of the gene cluster in question. It also takes homologous gene clusters in GenBank format as input.
- Easyfig: Easyfig is one of the older stand-alone tools for gene cluster visualization, offering both Windows and macOS support and a graphical user interface.
- genoPlotR: Similar to Easyfig, genoPlotR is one of the classic frameworks for creating microsynteny plots. It is an R library, so familiarity with R is required.
- LoVis4u: LoVis4u is a new tool for creating gene cluster visualizations, with support for functional annotation and coloring.
- TidyLocalSynteny: TidyLocalSynteny is an R framework for creating custom synteny figures.
zol can dereplicate input gene clusters based on average nucleotide identity (ANI) and coverage using skani, with adjustable options for how representative gene clusters are selected.
Note that skani estimates of ANI and aligned fraction (AF) become less reliable for contigs shorter than 10 kb, so zol-based dereplication should only be used for gene clusters of 10 kb or larger.
# Run zol with dereplication requested
zol -i GenBanks_Directory/ -o zol_Results/ -d
# Reference dereplicated representative GenBanks/gene clusters as input for clinker analysis
clinker zol_Results/Dereplicated_GenBanks/*.gbk -p clinker_visualization.html
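To follow the 10 kb guideline above, one option is to pre-filter the input GenBank files by length before requesting dereplication. The sketch below is not part of zol: `gbk_length` and `filter_gbks_by_length` are hypothetical helpers, and `GenBanks_10kb/` is an illustrative directory name. It assumes each file's LOCUS line reports the sequence length as a number followed by `bp`, as in standard GenBank files, and that files use the `.gbk` extension.

```shell
# Hypothetical helper: print the total length (bp) declared on the LOCUS
# line(s) of a GenBank file; prints 0 if no LOCUS length is found.
gbk_length() {
    awk '$1 == "LOCUS" {
             for (i = 2; i <= NF; i++)
                 if ($i == "bp") { total += $(i - 1); break }
         }
         END { print total + 0 }' "$1"
}

# Hypothetical helper: copy GenBank files of at least min_bp (default 10000)
# from in_dir into out_dir. The body runs in a subshell, so its variables
# do not leak into the calling shell.
filter_gbks_by_length() (
    in_dir="$1"; out_dir="$2"; min_bp="${3:-10000}"
    mkdir -p "$out_dir"
    for gbk in "$in_dir"/*.gbk; do
        [ -e "$gbk" ] || continue   # glob matched nothing
        if [ "$(gbk_length "$gbk")" -ge "$min_bp" ]; then
            cp "$gbk" "$out_dir"/
        fi
    done
)

# Example usage (directory names are illustrative):
#   filter_gbks_by_length GenBanks_Directory/ GenBanks_10kb/
#   zol -i GenBanks_10kb/ -o zol_Results/ -d
```

Files are copied rather than moved, so the original directory is left untouched and shorter gene clusters remain available for visualization without dereplication.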
zol aims to infer highly reliable protein ortholog groups between homologous/orthologous gene clusters. Some visualization software, such as clinker, allows orthologous protein relations between gene cluster instances to be defined by the user. In the future, we might have zol export orthology relationships as files for use in select third-party visualization software described on this page.