
Genome size estimation

Ryan Wick edited this page Dec 3, 2024 · 11 revisions

Both the Autocycler subsample command and the Generating input assemblies step require a genome size estimate. If you don't already know the size of your genome, here are two reliable methods to estimate it:

Methods

  1. Assemble your reads and measure the assembly size
    Raven is a fast assembler well-suited for this purpose. To streamline the process, use the genome_size_raven.sh helper script, located in Autocycler's scripts directory.

  2. Use the LRGE tool
    LRGE estimates genome size directly from reads and is straightforward to run without requiring a helper script.

Example commands

reads=ont.fastq.gz  # your read set goes here
threads=16  # set as appropriate for your system

genome_size=$(genome_size_raven.sh "$reads" "$threads")
# OR
genome_size=$(lrge -t "$threads" "$reads")
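Because the estimate is captured from a command's output, it can be worth checking the value before passing it to later steps. The following is a minimal sketch, assuming the captured value is meant to be a plain integer number of bases; `is_valid_genome_size` is a hypothetical helper written for illustration and is not part of Autocycler, Raven or LRGE.

```shell
# Hypothetical helper (not part of Autocycler): succeeds only if the
# argument is a non-empty string of digits with no leading zero.
is_valid_genome_size() {
    case "$1" in
        ''|0*|*[!0-9]*) return 1 ;;  # empty, leading zero, or non-digit character
        *) return 0 ;;
    esac
}

# Placeholder value for illustration; normally this would be captured
# from one of the commands above.
genome_size=5500000

if is_valid_genome_size "$genome_size"; then
    echo "genome size estimate: $genome_size bp"
else
    echo "unexpected genome size value: '$genome_size'" >&2
fi
```

A failed check usually means the estimation command itself failed (for example, a missing read file), so this catches the problem early instead of carrying an empty variable into later commands.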

Comparison of methods

Method   Time to run    Error rate
Raven    1-10 minutes   <3%
LRGE     <1 minute      <10%

Raven tends to produce more accurate results, while LRGE is faster. Both methods are accurate enough for Autocycler's requirements. For more details on genome size estimation and comparative performance, see the LRGE paper.
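If you happen to run both methods on the same read set, a quick way to see how closely they agree is to express the gap between the two estimates as a percentage of their mean. This is only an illustrative sketch: `percent_diff` is a hypothetical helper, the sizes are placeholder values rather than real results, and the calculation uses integer arithmetic, so the result is rounded down.

```shell
# Hypothetical helper: prints the absolute difference between two
# estimates as a whole-number percentage of their mean (rounded down).
percent_diff() {
    a=$1
    b=$2
    if [ "$a" -ge "$b" ]; then
        diff=$((a - b))
    else
        diff=$((b - a))
    fi
    # diff / ((a + b) / 2) * 100, rearranged to stay in integer arithmetic
    echo $(( diff * 200 / (a + b) ))
}

raven_size=5000000  # placeholder value, not a real result
lrge_size=5200000   # placeholder value, not a real result

echo "estimates differ by about $(percent_diff "$raven_size" "$lrge_size")%"
```

A difference within the error rates listed in the table is unsurprising; a much larger one may be a reason to look more closely at the read set.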
