- Adds support for reorienting contigs where the gene of interest spans the contig ends - fixes this issue. Thanks @marade @oschwengers.
- Specifically, this is done by rotating each contig in the input by half the genome length, then running `MMseqs2` for both the original and rotated contigs. The `MMseqs2` hit with the highest bitscore across the original and rotated contigs will be chosen as the top hit to rotate by, therefore enabling detection of partial hits (on the original contig) that span the contig ends.
- This has only been implemented for `dnaapler all` (this should be the command used by 99% of users).
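The rotation idea above can be illustrated with a minimal sketch. This is not dnaapler's actual code: the function names, the toy sequence, and the bitscores are all hypothetical, and the contig is rotated by half its own length for simplicity.

```python
def rotate(seq: str, offset: int) -> str:
    """Rotate a circular sequence so that position `offset` becomes position 0."""
    if not seq:
        return seq
    offset %= len(seq)
    return seq[offset:] + seq[:offset]


def to_original_coord(pos_in_rotated: int, offset: int, length: int) -> int:
    """Map a 0-based position on the rotated contig back to the original contig."""
    return (pos_in_rotated + offset) % length


# Toy circular contig: the "gene" GGGG spans the contig ends
# (GG at the very end, GG at the very start).
contig = "GGAAAAAAAACCCCCCCCGG"
half = len(contig) // 2
rotated = rotate(contig, half)

# After rotating by half the length, the gene is contiguous in the rotated contig.
gene_start_rotated = rotated.find("GGGG")
gene_start_original = to_original_coord(gene_start_rotated, half, len(contig))

# Hypothetical hits as (source, bitscore, start on original contig).
# The scores are made up; a real run would take them from the search output.
hits = [
    ("original", 40.0, 0),                    # partial hit, gene is split
    ("rotated", 95.0, gene_start_original),   # full hit on the rotated contig
]
best = max(hits, key=lambda h: h[1])  # keep the hit with the highest bitscore
```

In this toy case the hit on the rotated contig wins, and its start maps back to the second-to-last base of the original contig, which is where the split gene actually begins.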
- Thanks to the inimitable @rrwick, v1.0.1 is a patch fixing a string-parsing bug.
- If your contig headers were integers, `dnaapler` did not rotate the found BLAST/MMseqs2 hits. This was a pre-existing issue (not introduced by v1.0.0).
- BREAKING CHANGE - `dnaapler` now uses `MMseqs2` rather than `BLAST`. You will need to install `MMseqs2` if you upgrade (if you use conda, it should be handled for you)
- There are 2 reasons for this:
- Users reported problems installing BLAST on macOS with Apple Silicon (see e.g. here). MMseqs2 works on all platforms and is diligently maintained.
- MMseqs2 is much faster than BLAST (what took BLAST a few minutes takes MMseqs2 seconds). We should have written `dnaapler` with `MMseqs2` to begin with.
- The alignment results may not be identical (i.e. the two tools might find different top hits), but the actual reorientation is likely to be identical (at least in my tests). Please reach out or make an issue if you notice any discrepancies.
For example - on my machine (Ubuntu 20.04, Intel i9 13th gen 13900 CPU with 32 threads), for a Staphylococcus aureus genome with 1 small plasmid, `dnaapler -i staph.fasta -o staph_dnaapler -t 8` took ~129 seconds wallclock with v0.8.1 using BLAST, while it took ~3 seconds wallclock with v1.0.0 using MMseqs2.
- Minor release - adds `--db dnaa,repa,cog1474` as an option for `dnaapler all` to allow for archaea orientation in hybracter
- Adds `dnaapler archaea` and adds archaeal reorientation functionality into `dnaapler all`
- Specifically, this uses 403 COG1474 genes COG1474
- Relaxes the error raised when no BLAST hits are found to a warning - the pipeline will still complete (requested in a number of issues #74 #76 #77)
- Adds `-c/--custom_db` to `dnaapler all` to allow specifying custom databases.
- Fixes bug where if the starting gene (dnaA/terL/repA) was on the reverse strand and the top BLAST hit did not find the start codon, it would reorient the replicon to begin at the end of the starting gene, not the start. Thanks @susiegriggo
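To make the strand issue concrete, here is a minimal sketch of reorienting a circular replicon so that it begins at the first base of the starting gene on either strand. It is an illustration only, not dnaapler's implementation: `reorient`, `reverse_complement`, and the half-open forward-strand coordinates `[gene_start, gene_end)` are assumptions made for the example.

```python
# Translation table for complementing upper-case DNA.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def reorient(seq: str, gene_start: int, gene_end: int, strand: str) -> str:
    """Rotate a circular sequence so it begins with the gene's first base.

    gene_start / gene_end are 0-based, half-open, forward-strand coordinates.
    """
    if strand == "+":
        # Forward strand: the gene's first base is at gene_start.
        return seq[gene_start:] + seq[:gene_start]
    # Reverse strand: the gene's first base is the LAST forward-strand base of
    # the interval, so cut after gene_end (not at gene_start), then flip.
    rotated = seq[gene_end:] + seq[:gene_end]
    return reverse_complement(rotated)


# The same toy gene (ATGCCC) placed on each strand of a 14 bp replicon.
forward = reorient("TTTTATGCCCAAAA", 4, 10, "+")
reverse = reorient("TTTTGGGCATAAAA", 4, 10, "-")
```

Both calls return a sequence beginning with `ATGCCC`. Cutting the reverse-strand case at `gene_start` instead would start the output at the end of the gene, which is the kind of error described above.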
- Bumps version to include updated citation
- With `dnaapler all`, adds the reoriented gene to the header (thanks @ammaraziz #67)
- Adds `--db` parameter to `dnaapler all` allowing specifying a subset of genes to make up the database. In particular, if you have bacteria and plasmids, `--db dnaa,repa` should speed up Dnaapler's runtime quite a bit (thanks @oschwengers #63)
- JOSS release with minor typos and bug fixes from v0.4
- Implemented a modification to the logic for all cases where the top blastx hit alignment does not begin with a start codon. In this case, dnaapler will find the CDS, as predicted by pyrodigal, that has the most overlap with the top hit alignment. Thanks @simone-pignotti for this suggestion here.
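The "most overlap" selection can be sketched as follows. This is a hedged illustration rather than the actual implementation: intervals are plain `(start, end)` tuples with half-open coordinates, and the function names are hypothetical. In practice the CDS coordinates would come from pyrodigal's gene predictions.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based, half-open


def overlap(a: Interval, b: Interval) -> int:
    """Number of positions shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def best_overlapping_cds(hit: Interval, cds_list: List[Interval]) -> Optional[Interval]:
    """Return the CDS sharing the most positions with the hit, or None if none overlap."""
    best = max(cds_list, key=lambda cds: overlap(hit, cds), default=None)
    if best is None or overlap(hit, best) == 0:
        return None
    return best


# Hypothetical top-hit alignment and predicted CDS coordinates.
hit = (120, 480)
predicted_cds = [(0, 150), (100, 520), (500, 900)]
chosen = best_overlapping_cds(hit, predicted_cds)
```

Here the second CDS shares 360 positions with the hit, far more than the others, so it is the one selected. The replicon would then be reoriented to begin at that CDS's start rather than at the start of the alignment.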
- Changes `dnaapler all` output FASTA to `_reoriented.fasta` instead of `_all_reoriented.fasta` for consistency with all other commands (except `dnaapler bulk`).
- Adds `-a` or `--autocomplete` option with `dnaapler all`.
- Adds `dnaapler largest` and `-a largest` as an option to orient your sequence beginning with the largest
- Changes `Orffinder` to `Genefinder` to support `pyrodigal` v3.
- Updates dependency to `pyrodigal >=v3`.
- Minor release to fix an error with dnaapler all #38 thanks @samnooij
- `dnaapler all` subcommand added thanks @alexweisberg
- `dnaapler all` implements `--ignore` to ignore some contigs
- `dnaapler nearest` subcommand added
- `dnaapler bulk` subcommand added
- dnaA database filtered to keep only bona fide dnaA genes (i.e. GN=dnaA)
- Adds `-e` parameter to vary BLAST evalue if desired
- Adds `-a` autocomplete parameter if user wants to reorient sequences with mystery or nearest methods in case the BLAST based method fails
- Completely overhauled
- First stable release with PyPI and conda
- `dnaapler chromosome` added
- `dnaapler custom` added
- `dnaapler mystery` added
- `dnaapler phage` added
- `dnaapler plasmid` added
- First release (conda only `conda install -c gbouras dnaapler`)