GenomeAnnotation

This repository contains the EBP-Nor genome annotation pipeline. It is implemented in Snakemake and set up to run on our infrastructure, using SLURM as the job manager. As currently written, it is tightly coupled to that specific environment, but we hope to untangle this over time.

The workflow is as follows: For each genome assembly, we map three different sets of proteins. The first is from a model species of that particular group or lineage, for instance human for mammals. We create the model protein dataset by running agat_sp_keep_longest_isoform.pl and agat_sp_extract_sequences.pl (both from AGAT) on that species' genome assembly and annotation to generate one protein (the longest isoform) per gene. The second set is a release of UniProtKB/Swiss-Prot, and the third is the relevant part of OrthoDB v11. These three datasets are aligned separately to the genome assembly using miniprot.

To mask repeats we use Red via redmask. GALBA is run with the model proteins and the masked assembly. To combine these alignments and predicted genes, we use EvidenceModeler, but via the funannotate-runEVM.py script from Funannotate (trying not to reinvent the wheel).

The resulting predicted proteins are compared to the protein repeats that Funannotate distributes using DIAMOND blastp, and the predicted genes are filtered based on this comparison using AGAT. The filtered proteins are compared to the UniProtKB/Swiss-Prot release using DIAMOND blastp to find gene names, and InterProScan is used to discover functional domains. AGAT's agat_sp_manage_functional_annotation.pl is used to attach the gene names and functional annotations to the predicted genes. EMBLmyGFF3 is used to combine the FASTA and GFF3 files into an EMBL-format file for submission to ENA.
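The order of the steps described above can be sketched as a small dependency graph. This is only an illustration of how the steps feed into each other, not the pipeline's actual Snakemake code: the step labels below are made up for this example and do not correspond to rule names in the repository, and the dependencies are inferred from the description above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is a step; its value is the set of steps it depends on.
# Labels are hypothetical and chosen for readability only.
STEPS = {
    "model_proteins": set(),                  # AGAT: longest isoform per gene
    "miniprot_model": {"model_proteins"},     # align model proteins
    "miniprot_swissprot": set(),              # align UniProtKB/Swiss-Prot
    "miniprot_orthodb": set(),                # align OrthoDB v11 subset
    "repeat_mask": set(),                     # Red via redmask
    "galba": {"model_proteins", "repeat_mask"},
    "evm": {                                  # EvidenceModeler via Funannotate
        "miniprot_model",
        "miniprot_swissprot",
        "miniprot_orthodb",
        "galba",
    },
    "diamond_repeats": {"evm"},               # compare to protein repeats
    "agat_filter": {"diamond_repeats"},       # drop repeat-like genes
    "diamond_swissprot": {"agat_filter"},     # gene names
    "interproscan": {"agat_filter"},          # functional domains
    "agat_functional": {"diamond_swissprot", "interproscan"},
    "emblmygff3": {"agat_functional"},        # EMBL file for ENA
}


def execution_order(steps):
    """Return one valid order in which the steps could run."""
    return list(TopologicalSorter(steps).static_order())


if __name__ == "__main__":
    for number, step in enumerate(execution_order(STEPS), start=1):
        print(f"{number:2d}. {step}")
```

Steps with no dependency on each other, such as the three miniprot alignments and the repeat masking, can run in parallel. Snakemake derives this kind of ordering itself from the input and output files declared in each rule.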
