Skip to content

Use cDNA alignments with or without genetic map information to evaluate the completeness and correctness of a de novo genome or transcriptome assembly.

License

Notifications You must be signed in to change notification settings

lcoombe/gnavigator

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

46 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

# gnavigator
Gnavigator uses GMAP to align cDNAs to an assembly and evaluates the results.
Alignments are categorized based on their percent identity, percent coverage,
and whether or not the cDNA aligned to a single scaffold in the assembly.

cDNA Alignment Categories
Complete: 95% of sequence is aligned with 95% identity
Duplicated: as Complete, but at multiple locations
Partial: less than 95% of sequence is aligned to a single scaffold, other 5% unaligned
Fragmented: 95% of a sequence is aligned, but over multiple scaffolds
Poorly mapped: as complete, partial, or fragmented, but with less than 95% identity
Missing: not aligned by gmap

If a genetic map is provided, scaffolds with two or more aligned cDNAs that
are classified as "Complete" are further assessed. The genetic map must be a tab-separated
file with three columns, like so:
linkage group	position in cM	cDNA ID
LG01	0.000	cDNA1
LG01	0.010	cDNA2
...	...	...
LG14	153.10	cDNA144

Genetic Map Categories
Same LG, right order: all Complete cDNAs are from the same linkage group and their positions
    on the scaffold are concordant with the genetic map
Same LG, wrong order: all Complete cDNAs are from the same linkage group but their positions
    on the scaffold do not agree with the genetic map. This may indicate a misassembly or a
    lineage-specific genomic rearrangement, depending on context.
Different LG: the Complete cDNAs that aligned to this scaffold are from different linkage groups.
    This may indicate a misassembly or a lineage-specific genomic rearrangement, depending on context.

usage: gnavigator.py [-h] [-r] [-p PREFIX] [-d DB_DIR] [-n DB_NAME]
                     [-t THREADS] [-m GENETIC_MAP] [-i IDENTITY] [-c COVERAGE]
                     cDNA genome

Assess assembly quality & completeness using cDNA sequences

positional arguments:
  cDNA                  FASTA file of cDNA sequences to align to assembly
  genome                FASTA file of genome assembly to assess

optional arguments:
  -h, --help            show this help message and exit
  -r, --transcriptome   Transcriptome assessment mode. See manual for details.
                        [off]
  -p PREFIX, --prefix PREFIX
                        Prefix to use for intermediate and output files
                        [gnavigator]
  -d DB_DIR, --db_dir DB_DIR
                        Path to directory containing prebuilt GMAP index
                        [optional]
  -n DB_NAME, --db_name DB_NAME
                        Name of prebuilt GMAP index [optional]
  -t THREADS, --threads THREADS
                        Number of threads for GMAP alignment [1]
  -m GENETIC_MAP, --genetic_map GENETIC_MAP
                        Genetic map file as tsv with LG:cDNA pairs [optional]
  -i IDENTITY, --identity IDENTITY
                        Minimum identity threshold [0.95]
  -c COVERAGE, --coverage COVERAGE
                        Minimum coverage threshold [0.95]

Other notes
Gnavigator will use the first GMAP executable on your PATH. If you would prefer a different
executable, you may specify it in "gmap_config.txt".

About

Use cDNA alignments with or without genetic map information to evaluate the completeness and correctness of a de novo genome or transcriptome assembly.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 95.5%
  • Shell 4.5%