SynNet Build

Tao edited this page Feb 19, 2021 · 4 revisions

Once the pep files and bed files are prepared for the genomes of interest, we can use SynetBuild to construct the synteny network. This is a bash script that runs pairwise genome comparisons (using Diamond), detects synteny blocks (MCScanX), and integrates the synteny block information into an edgelist. After this step, you will have an edgelist of the entire synteny network of the genomes you used.

Steps:

1. Pre-install Diamond and MCScanX

2. Fill in the array at line 70 with the species abbreviations for the genomes you have.

3. Usage example: bash SynetBuild-X.sh 6 5 25 10

    The four positional arguments are, in order: tophits, -s, -m, -threads

    tophits (k or b): number of target sequences for each query sequence to keep from Diamond blast.

    -s: number of anchors needed to call a synteny block in MCScanX. The default is 5, which is okay for angiosperms.

    -m: number of genes upstream and downstream to search for anchors, default is 25. Lower this value for a stricter detection.

    -threads: this is used for both Diamond and MCScanX. No need to set this super high, 10-30 is already quite good, depending on your CPU.
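
To make the argument order concrete, here is a dry-run sketch that only prints the kind of commands such a pipeline would issue. It is not copied from SynetBuild-X.sh: the `species` array name, the `Atha` abbreviation, the file names, and the choice to compare every species pair (including self) are all assumptions for illustration. The Diamond options `-k` (max target sequences) and `-p` (threads) and the MCScanX options `-s` and `-m` are real flags of those tools.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints commands, runs nothing.
# Assumption: array name, file names, and pairing scheme are hypothetical
# and may differ from what SynetBuild-X.sh actually does.

species=(Alyr Atha)   # hypothetical species abbreviations

plan_commands() {
  # Positional arguments in the same order as the usage example.
  local tophits=$1 s=$2 m=$3 threads=$4
  local a b
  for a in "${species[@]}"; do
    for b in "${species[@]}"; do
      # Diamond: -k = target sequences kept per query, -p = CPU threads
      echo "diamond blastp -q ${a}.pep -d ${b} -o ${a}_${b}.blast -k ${tophits} -p ${threads}"
      # MCScanX: -s = anchors per block, -m = search window for anchors
      echo "MCScanX -s ${s} -m ${m} ${a}_${b}"
    done
  done
}

plan_commands 6 5 25 10
```

Printing the commands first is a cheap way to check that the parameters land where you expect before starting a long run.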

If you are interested, below are parameter tests for mammalian (A, B) and angiosperm (C, D) genomes.

[Figure: parameter testing]

Once it's done, you will have a final file with a name like "SynNet-k5s5m25" in the folder the pipeline created.

This file is a 4-column text file, which is the edgelist of the constructed network. I often treat this as the synteny network database. The first few lines may look like this:

Alyr0-0b 936.0 AlyrAL1G19310 AlyrAL1G65100
Alyr0-1b 936.0 AlyrAL1G19350 AlyrAL1G65150
Alyr0-2b 936.0 AlyrAL1G19400 AlyrAL1G65290
Alyr0-3b 936.0 AlyrAL1G19420 AlyrAL1G65430
Alyr0-4b 936.0 AlyrAL1G19430 AlyrAL1G65470
......

 

In the 1st column, Alyr0 stands for the Block ID, and Alyr0-0b is the first anchor pair within this block. The 2nd column, 936.0, is the block score. Rows from the same Block ID will have the same block score. The 3rd and 4th columns are the two genes of the syntenic anchor pair.
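
The column layout above can be checked with a short awk sketch that strips the per-pair suffix from the 1st column and counts anchor pairs per block. The function name `block_sizes` is made up for this example, and it assumes the Block ID is everything before the last hyphen in column 1.

```shell
#!/usr/bin/env bash
# Sketch: summarize an edgelist as "BlockID score number_of_anchor_pairs".
# Assumption: the Block ID is column 1 with the trailing "-<n>b" part removed.

block_sizes() {
  awk '{
         sub(/-[^-]*$/, "", $1)   # Alyr0-3b -> Alyr0
         pairs[$1]++              # one row = one anchor pair
         score[$1] = $2           # same score for every row of a block
       }
       END { for (b in pairs) print b, score[b], pairs[b] }' "$@"
}

# Sample rows taken from the example above, fed through stdin.
block_sizes <<'EOF'
Alyr0-0b 936.0 AlyrAL1G19310 AlyrAL1G65100
Alyr0-1b 936.0 AlyrAL1G19350 AlyrAL1G65150
Alyr0-2b 936.0 AlyrAL1G19400 AlyrAL1G65290
Alyr0-3b 936.0 AlyrAL1G19420 AlyrAL1G65430
Alyr0-4b 936.0 AlyrAL1G19430 AlyrAL1G65470
EOF
```

On a real file you would call it as `block_sizes SynNet-k5s5m25`; note that awk does not guarantee the output order of the blocks.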

 

At this point, synteny network construction is done, which is the upstream part of the analysis. Downstream analysis and applications can be highly dynamic. For example, you can:

1. Filter the edgelist for your genes or gene family of interest, and analyze the sub-networks.

2. Cluster the entire network, and analyze the patterns of the resulting clusters.

3. Build a phylogeny based on synteny clusters.

 

We'll touch upon these in the next chapters.
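
As a small preview of the first item in the list above, filtering the edgelist for genes of interest can be sketched with awk. This is only one possible approach, not necessarily the one used in the later chapters; the function name `filter_edges` and the two gene IDs picked as the "family" are just for illustration.

```shell
#!/usr/bin/env bash
# Sketch: keep edgelist rows where either anchor gene is in a list of genes.
# Usage: filter_edges EDGELIST GENE [GENE ...]   (use "-" to read stdin)
# Assumption: gene IDs contain no whitespace or backslashes.

filter_edges() {
  local file=$1
  shift
  awk -v genes="$*" '
    BEGIN {
      n = split(genes, g, " ")
      for (i = 1; i <= n; i++) want[g[i]]   # build a lookup set
    }
    ($3 in want) || ($4 in want)            # print rows touching a wanted gene
  ' "$file"
}

# Sample rows taken from the example above; two genes chosen arbitrarily.
filter_edges - AlyrAL1G19350 AlyrAL1G65470 <<'EOF'
Alyr0-0b 936.0 AlyrAL1G19310 AlyrAL1G65100
Alyr0-1b 936.0 AlyrAL1G19350 AlyrAL1G65150
Alyr0-2b 936.0 AlyrAL1G19400 AlyrAL1G65290
Alyr0-3b 936.0 AlyrAL1G19420 AlyrAL1G65430
Alyr0-4b 936.0 AlyrAL1G19430 AlyrAL1G65470
EOF
```

The resulting rows form the sub-network for those genes and can be saved to a new file for further analysis.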