Releases: Ecogenomics/GTDBTk
2.2.0
Minor changes:
- (#433) Added checks to ensure that `--outgroup_taxon` cannot be set to a domain (`root`, `de_novo_wf`).
- (#459/#462) Fixed deprecated `np.bool` in `prodigal_biolib.py`. Special thanks to @neoformit for his contribution.
- (#466) RED values are now rounded to 5 decimal places.
- (#451) Added extra checks for when Prodigal fails.
- (#448) A warning is now displayed when all genomes are filtered out and none are classified.
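The (#433) check can be pictured as a prefix test on GTDB's rank-prefixed taxon names (`d__`, `p__`, `c__`, and so on). The sketch below is a hypothetical illustration, not GTDB-Tk's actual implementation; the function name `validate_outgroup_taxon` and the prefix list are assumptions.

```python
# Hypothetical sketch of a domain-level outgroup check; not GTDB-Tk's real code.
# Assumption: taxa use GTDB rank prefixes (d__, p__, c__, o__, f__, g__, s__).
VALID_RANK_PREFIXES = ("d__", "p__", "c__", "o__", "f__", "g__", "s__")


def validate_outgroup_taxon(taxon: str) -> str:
    """Return the cleaned taxon if usable as an outgroup, else raise ValueError."""
    taxon = taxon.strip()
    if not taxon.startswith(VALID_RANK_PREFIXES):
        raise ValueError(f"Unrecognised rank prefix in outgroup taxon: {taxon!r}")
    if taxon.startswith("d__"):
        # Domains are rejected, mirroring the (#433) change in the notes above.
        raise ValueError(f"Outgroup taxon cannot be a domain: {taxon!r}")
    return taxon


# Usage:
# validate_outgroup_taxon("p__Chloroflexota")  -> "p__Chloroflexota"
# validate_outgroup_taxon("d__Bacteria")       -> raises ValueError
```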
Bug Fixes:
- (#420) Fixed an issue where GTDB-Tk might hang when classifying TIGRFAM markers (`identify`, `classify_wf`, `de_novo_wf`). Special thanks to @lfenske-93 and @sjaenick for their contribution.
- (#428) Fixed an issue where `--gtdbtk_classification_file` would raise an error when trying to read the `classify` summary (`root`, `de_novo_wf`).
- (#439) Fixed the pipeline when protein files are used instead of nucleotide files; symlinks now use absolute paths.
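The symlink part of (#439) reflects a general pitfall: a relative target passed to `os.symlink` is resolved relative to the link's own directory, not the current working directory, so a link built from a relative input path can dangle once it sits in another directory. A minimal sketch of the absolute-path approach follows; the helper name `link_input` is hypothetical and this is not GTDB-Tk's code.

```python
import os


def link_input(src: str, dest_dir: str) -> str:
    """Symlink src into dest_dir using an absolute target (hypothetical helper)."""
    os.makedirs(dest_dir, exist_ok=True)
    # Resolve against the current working directory now; a relative target
    # would later be interpreted relative to dest_dir and could dangle.
    target = os.path.abspath(src)
    link = os.path.join(dest_dir, os.path.basename(src))
    if os.path.lexists(link):  # lexists also catches an existing broken link
        os.remove(link)
    os.symlink(target, link)
    return link


# Usage (paths are illustrative):
# link_input("genome.faa", "out/identify/proteins")
```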
2.1.1
2.1.0
Major changes:
- GTDB-Tk now uses a divide-and-conquer approach where the bacterial reference tree is split into multiple class-level subtrees. This reduces the memory requirements of GTDB-Tk from 320 GB of RAM when using the full GTDB R07-RS207 reference tree to approximately 55 GB. A manuscript describing this approach is in preparation. If you wish to continue using the full GTDB reference tree, use the `--full-tree` flag. This is the main change from v2.0.0: the split-tree approach has been modified from order-level trees to class-level trees to resolve specific classification issues (see #383).
- Genomes that cannot be assigned to a domain (e.g. genomes with no bacterial or archaeal markers, or genomes with no genes called by Prodigal) are now reported in `gtdbtk.bac120.summary.tsv` as 'Unclassified'.
- Genomes filtered out during the alignment step are now reported in `gtdbtk.bac120.summary.tsv` or `gtdbtk.ar53.summary.tsv` as 'Unclassified Bacteria/Archaea'.
- The `--write_single_copy_genes` flag is now available in the `classify_wf` and `de_novo_wf` workflows.
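Because unassigned and filtered genomes now appear in the summary files instead of being dropped, downstream scripts may want to separate them out. A minimal sketch, assuming the summary is tab-separated with a header row containing `user_genome` and `classification`, and that such genomes carry a classification starting with `Unclassified` (as the notes above describe; check the exact labels against your own output). The function name `split_unclassified` is hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO


def split_unclassified(handle: TextIO) -> Dict[str, List[str]]:
    """Split genome IDs from a summary.tsv into classified / unclassified.

    Assumes 'user_genome' and 'classification' header columns, and that
    unassigned or filtered genomes are labelled 'Unclassified...'.
    """
    result: Dict[str, List[str]] = {"classified": [], "unclassified": []}
    for row in csv.DictReader(handle, delimiter="\t"):
        label = row["classification"].strip()
        key = "unclassified" if label.startswith("Unclassified") else "classified"
        result[key].append(row["user_genome"])
    return result


if __name__ == "__main__":
    # Toy two-column input, not real GTDB-Tk output.
    demo = io.StringIO(
        "user_genome\tclassification\n"
        "genome_A\td__Bacteria;p__Proteobacteria;c__Gammaproteobacteria\n"
        "genome_B\tUnclassified\n"
        "genome_C\tUnclassified Bacteria\n"
    )
    print(split_unclassified(demo))
```

In practice you would pass an open file handle, e.g. `open("gtdbtk.bac120.summary.tsv")`.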
Features:
- (#392) The `--write_single_copy_genes` flag is now available in workflows.
- (#387) Specific memory requirements are now set in `classify_wf` depending on the classification approach.
Important
- This version is not backwards compatible with GTDB package R207 v1.
- This version requires a new reference package.
2.0.0
Major changes:
- GTDB-Tk now uses a divide-and-conquer approach where the bacterial reference tree is split into multiple order-level subtrees. This reduces the memory requirements of GTDB-Tk from 320 GB of RAM when using the full GTDB R07-RS207 reference tree to approximately 35 GB. A manuscript describing this approach is in preparation. If you wish to continue using the full GTDB reference tree, use the `--full-tree` flag.
- Archaeal classification now uses a refined set of 53 archaeal-specific marker genes based on the recent publication by Dombrowski et al., 2020. This set of archaeal marker genes is now used by GTDB for curating the archaeal taxonomy.
- All directories containing intermediate results are now removed by default at the end of the `classify_wf` and `de_novo_wf` pipelines. If you wish to retain these intermediate files, use the `--keep_intermediates` flag.
- All MSA files produced by the `align` step are now compressed with gzip.
- The classification summary and failed genomes files are now the only files linked in the root of the `classify_wf` output directory.
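Since the `align` MSAs are now gzip-compressed, scripts that opened them as plain text need to decompress on the fly. A minimal standard-library sketch; the function name `read_fasta_gz` and the example file name `gtdbtk.bac120.user_msa.fasta.gz` are assumptions, so substitute whatever your `align` directory actually contains.

```python
import gzip
from typing import Dict, List, Optional


def read_fasta_gz(path: str) -> Dict[str, str]:
    """Read a gzip-compressed FASTA/MSA into a {sequence_id: sequence} dict."""
    seqs: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    with gzip.open(path, "rt") as fh:  # "rt" = decompress and decode as text
        for line in fh:
            line = line.rstrip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    seqs[name] = "".join(chunks)
                name = line[1:].split()[0]  # ID is the header up to first whitespace
                chunks = []
            else:
                chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


# Usage (hypothetical path):
# msa = read_fasta_gz("align/gtdbtk.bac120.user_msa.fasta.gz")
```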
Features:
- Added the `convert_to_itol` command to convert trees into iTOL format (#373)
- Output FASTA files are compressed by default (#369)
- Intermediate files are removed by default when using the classify/de novo workflows unless `--keep_intermediates` is specified (#369)
- Add `--genes` flag for Error (#362)
- A warning will be displayed if pplacer fails to place a genome (#360 / #356)
Important
- This version is not backwards compatible with GTDB release 202.
- This version requires a new reference package.
1.7.0
- (#336) Warn the user if they have provided an incorrectly formatted taxonomy file.
- (#348) Gracefully exit the program if no single copy hits could be identified.
- (#351) Fixed an issue where GTDB-Tk would crash if spaces were present in the reference data path.
- (#354) Added optional `--tmpdir` argument to set the temporary directory (thanks @tr11-sanger).
1.6.0
1.5.1
1.5.0
Changes:
- Updated to use PFAM 33.1 markers.
- Updated to use GTDB R202 taxonomy (note: this requires an updated reference package, see https://ecogenomics.github.io/GTDBTk/installing/index.html#gtdb-tk-reference-data)
Fixes:
1.4.1
1.4.0
- Check if stdout is being piped to a file before adding colour.
- (#283) Significantly improved `classify` performance (noticeable for trees with more than 1,000 taxa).
- Automatically cap pplacer CPUs at 64, unless `--pplacer_cpus` is specified, to prevent pplacer from hanging.
- (#262) Added `--write_single_copy_genes` to the `identify` command. This writes unaligned single-copy AR122/BAC120 marker genes to disk.
- When running -version, warn if GTDB-Tk is not running the most up-to-date version (disable via `GTDBTK_VER_CHECK = False` in `config.py`). If the version check encounters an error, GTDB-Tk silently continues (3-second timeout).
- (#276) Renamed the column `aa_percent` to `msa_percent` in `summary.tsv` (produced by `classify`).
- (#286) Fixed a file-not-found error when the reference data is a symbolic link (thanks davidealbanese!).
- (#277) Fixed an issue where, if the user overrode the translation table using the optional third column in the batchfile, the coding density for the other translation table would appear as -100. Both translation table densities are now reported.
- The `check_install` command now also checks that all third-party binaries can be found on the system path.
- The `align` step is now approximately 10x faster.
- (#289) Added `--min_af` to `classify` and `classify_wf`, which allows the user to specify the minimum alignment fraction for FastANI.
- Added the `--mash_db` option to reuse the GTDB-Tk Mash reference database in `ani_rep`.
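After the (#276) rename, scripts that parse `summary.tsv` files from runs before and after 1.4.0 may see either column name. A minimal sketch of a tolerant reader; the helper name `get_msa_percent` is hypothetical, and it assumes a tab-separated file with a header row containing `user_genome` and a numeric percentage column.

```python
import csv
import io
from typing import Dict, Optional, TextIO


def get_msa_percent(handle: TextIO) -> Dict[str, Optional[float]]:
    """Map user_genome -> MSA percent, accepting msa_percent or legacy aa_percent."""
    reader = csv.DictReader(handle, delimiter="\t")
    fields = reader.fieldnames or []  # reading fieldnames consumes the header row
    if "msa_percent" in fields:
        column = "msa_percent"
    elif "aa_percent" in fields:
        column = "aa_percent"  # column name used before the (#276) rename
    else:
        raise KeyError("summary has neither msa_percent nor aa_percent column")
    out: Dict[str, Optional[float]] = {}
    for row in reader:
        value = (row.get(column) or "").strip()
        try:
            out[row["user_genome"]] = float(value)
        except ValueError:
            # Empty or non-numeric cells (e.g. 'N/A') are treated as missing.
            out[row["user_genome"]] = None
    return out


# Usage:
# with open("gtdbtk.bac120.summary.tsv") as fh:
#     percents = get_msa_percent(fh)
```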