Genome size estimation
Both Autocycler subsample and Generating input assemblies require a genome size estimate. If you don't already know the size of your genome, here are two reliable methods to estimate it:
- Assemble your reads and measure the assembly size. Raven is a fast assembler well-suited for this purpose. To streamline the process, use the genome_size_raven.sh helper script, located in Autocycler's scripts directory.
- Use the LRGE tool. LRGE estimates genome size directly from reads and is straightforward to run without requiring a helper script.
reads=ont.fastq.gz # your read set goes here
threads=16 # set as appropriate for your system
genome_size=$(genome_size_raven.sh "$reads" "$threads")
# OR
genome_size=$(lrge -t "$threads" "$reads")
Method | Time to run | Error rate |
---|---|---|
Raven | 1-10 minutes | <3% |
LRGE | <1 minute | <10% |
Raven tends to produce more accurate results, while LRGE is faster. Both methods are accurate enough for Autocycler's requirements. For more details on genome size estimation and comparative performance, see the LRGE paper.
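Since either command stores its result in a shell variable, it can help to confirm the value looks sensible before using it in later steps. The following is a minimal sketch, not part of Autocycler: `check_genome_size` is a hypothetical helper name, and it assumes the estimator prints a plain whole number of base pairs to stdout, as the command substitutions above imply.

```shell
# Hypothetical helper (not shipped with Autocycler or LRGE).
# Assumption: the estimate is a plain whole number of base pairs, e.g. 5500000.
# Prints the value unchanged if it is a positive integer; otherwise prints an
# error to stderr and returns a non-zero status.
check_genome_size() {
    case "$1" in
        # Reject empty strings, anything containing a non-digit, and values
        # with a leading zero (which also covers a bare 0).
        ''|*[!0-9]*|0*)
            echo "Error: genome size estimate '$1' is not a positive whole number" >&2
            return 1
            ;;
    esac
    echo "$1"
}
```

For example, `genome_size=$(check_genome_size "$genome_size") || exit 1` stops a script early if the estimator failed and left the variable empty, instead of passing a bad value to Autocycler subsample.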