forked from sahammond/gnavigator
-
Notifications
You must be signed in to change notification settings - Fork 1
Use cDNA alignments with or without genetic map information to evaluate the completeness and correctness of a de novo genome or transcriptome assembly.
License
lcoombe/gnavigator
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
# gnavigator Gnavigator uses GMAP to align cDNAs to an assembly and evaluates the results. Alignments are categorized based on their percent identity, percent coverage, and whether or not the cDNA aligned to a single scaffold in the assembly. cDNA Alignment Categories Complete: 95% of sequence is aligned with 95% identity Duplicated: as Complete, but at multiple locations Partial: less than 95% of sequence is aligned to a single scaffold, other 5% unaligned Fragmented: 95% of a sequence is aligned, but over multiple scaffolds Poorly mapped: as complete, partial, or fragmented, but with less than 95% identity Missing: not aligned by gmap If a genetic map is provided, scaffolds with two or more aligned cDNAs that are classified as "Complete" are further assessed. The genetic map must be a tab-separated file with three columns, like so: linkage group position in cM cDNA ID LG01 0.000 cDNA1 LG01 0.010 cDNA2 ... ... ... LG14 153.10 cDNA144 Genetic Map Categories Same LG, right order: all Complete cDNAs are from the same linkage group and their positions on the scaffold are concordant with the genetic map Same LG, wrong order: all Complete cDNAs are from the same linkage group but their positions on the scaffold do not agree with the genetic map. This may indicate a misassembly or a lineage-specific genomic rearrangement, depending on context. Different LG: the Complete cDNAs that aligned to this scaffold are from different linkage groups. This may indicate a misassembly or a lineage-specific genomic rearrangement, depending on context. usage: gnavigator.py [-h] [-r] [-p PREFIX] [-d DB_DIR] [-n DB_NAME] [-t THREADS] [-m GENETIC_MAP] [-i IDENTITY] [-c COVERAGE] cDNA genome Assess assembly quality & completeness using cDNA sequences positional arguments: cDNA FASTA file of cDNA sequences to align to assembly genome FASTA file of genome assembly to assess optional arguments: -h, --help show this help message and exit -r, --transcriptome Transcriptome assessment mode. See manual for details. [off] -p PREFIX, --prefix PREFIX Prefix to use for intermediate and output files [gnavigator] -d DB_DIR, --db_dir DB_DIR Path to directory containing prebuilt GMAP index [optional] -n DB_NAME, --db_name DB_NAME Name of prebuilt GMAP index [optional] -t THREADS, --threads THREADS Number of threads for GMAP alignment [1] -m GENETIC_MAP, --genetic_map GENETIC_MAP Genetic map file as tsv with LG:cDNA pairs [optional] -i IDENTITY, --identity IDENTITY Minimum identity threshold [0.95] -c COVERAGE, --coverage COVERAGE Minimum coverage threshold [0.95] Other notes Gnavigator will use the first GMAP executable on your PATH. If you would prefer a different executable, you may specify it in "gmap_config.txt".
About
Use cDNA alignments with or without genetic map information to evaluate the completeness and correctness of a de novo genome or transcriptome assembly.
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published
Languages
- Python 95.5%
- Shell 4.5%