Releases: Ecogenomics/GTDBTk

2.2.0

14 Feb 00:33
3d7e936

Minor changes:

  • (#433) Added additional checks to ensure that the --outgroup_taxon cannot be set to a domain (root, de_novo_wf).
  • (#459/#462) Fixed deprecated np.bool in prodigal_biolib.py. Special thanks to @neoformit for his contribution.
  • (#466) RED values are now rounded to 5 decimal places.
  • (#451) Extra checks have been added for when Prodigal fails.
  • (#448) A warning has been added for when all genomes are filtered out and none are classified.
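
The --outgroup_taxon check in #433 can be pictured with the short sketch below. It is not GTDB-Tk's actual code: the function name validate_outgroup_taxon is hypothetical, and the sketch assumes a domain is recognised by the GTDB rank prefix d__ (as in d__Bacteria).

```python
DOMAIN_PREFIX = "d__"  # GTDB rank prefix for domain-level taxa


def validate_outgroup_taxon(outgroup_taxon: str) -> str:
    """Return the taxon unchanged, or raise ValueError if it names a domain.

    Hypothetical helper for illustration; GTDB-Tk's own check may differ.
    """
    taxon = outgroup_taxon.strip()
    if not taxon:
        raise ValueError("The outgroup taxon must not be empty.")
    if taxon.startswith(DOMAIN_PREFIX):
        raise ValueError(f"The outgroup taxon cannot be a domain: {taxon}")
    return taxon
```

With this sketch, a phylum such as p__Patescibacteria passes through unchanged, while d__Bacteria is rejected before any tree is rooted.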

Bug Fixes:

  • (#420) Fixed an issue where GTDB-Tk might hang when classifying TIGRFAM markers (identify, classify_wf, de_novo_wf). Special thanks to @lfenske-93 and @sjaenick for their contribution.
  • (#428) Fixed an issue where the --gtdbtk_classification_file would raise an error trying to read the classify summary (root, de_novo_wf).
  • (#439) Fixed the pipeline when using protein files instead of nucleotide files. Symlinks to the input files now use absolute paths.
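
Why absolute paths matter for #439: a relative symlink target is resolved relative to the directory that holds the link, not the directory the program was started from, so a relative input path can produce a dangling link. The sketch below shows the general technique only; link_input_file is a hypothetical name and not part of GTDB-Tk.

```python
import os


def link_input_file(source_path: str, link_dir: str) -> str:
    """Symlink source_path into link_dir, always pointing at an absolute path.

    Hypothetical helper for illustration; not GTDB-Tk's implementation.
    """
    # Resolve the input against the current working directory first, so the
    # link stays valid no matter where link_dir lives.
    absolute_source = os.path.abspath(source_path)
    link_path = os.path.join(link_dir, os.path.basename(absolute_source))
    os.symlink(absolute_source, link_path)
    return link_path
```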

2.1.1

11 Jul 05:11
3656d71
  • (#399) Fix --genes options
  • (#400) Modify config.py file to resolve this issue
  • Updated documentation (including #410, documentation for iTOL).

2.1.0

12 May 03:23

Major changes:

  • GTDB-Tk now uses a divide-and-conquer approach where the bacterial reference tree is split into multiple class-level subtrees. This reduces the memory requirements of GTDB-Tk from 320 GB of RAM when using the full GTDB R07-RS207 reference tree to approximately 55 GB. A manuscript describing this approach is in preparation. If you wish to continue using the full GTDB reference tree, use the --full_tree flag. This is the main change from v2.0.0: the split tree approach has been modified from order-level trees to class-level trees to resolve specific classification issues (see #383).
  • Genomes that cannot be assigned to a domain (e.g. genomes with no bacterial or archaeal markers or genomes with no genes called by Prodigal) are now reported in the gtdbtk.bac120.summary.tsv as 'Unclassified'
  • Genomes filtered out during the alignment step are now reported in the gtdbtk.bac120.summary.tsv or gtdbtk.ar53.summary.tsv as 'Unclassified Bacteria/Archaea'
  • The --write_single_copy_genes flag is now available in the classify_wf and de_novo_wf workflows.
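
Because unclassified genomes now appear in the summary files, they can be picked out with a few lines of Python. The sketch below is illustrative only: it assumes the summary is tab-separated with user_genome and classification columns, and the function name unclassified_genomes is hypothetical.

```python
import csv


def unclassified_genomes(summary_path: str) -> list:
    """Return genome IDs whose classification starts with 'Unclassified'.

    Assumes a tab-separated file with 'user_genome' and 'classification'
    columns; adjust the column names if your summary file differs.
    """
    with open(summary_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [
            row["user_genome"]
            for row in reader
            # Matches 'Unclassified', 'Unclassified Bacteria' and
            # 'Unclassified Archaea'.
            if row["classification"].startswith("Unclassified")
        ]
```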

Features:

  • (#392) --write_single_copy_genes flag available in workflows.
  • (#387) specific memory requirements set in classify_wf depending on the classification approach.

Important

This version is not backwards compatible with GTDB package R207 v1.
This version requires a new reference package

2.0.0

08 Apr 01:54

Major changes:

  • GTDB-Tk now uses a divide-and-conquer approach where the bacterial reference tree is split into multiple order-level subtrees. This reduces the memory requirements of GTDB-Tk from 320 GB of RAM when using the full GTDB R07-RS207 reference tree to approximately 35 GB. A manuscript describing this approach is in preparation. If you wish to continue using the full GTDB reference tree, use the --full_tree flag.
  • Archaeal classification now uses a refined set of 53 archaeal-specific marker genes based on the recent publication by Dombrowski et al., 2020. This set of archaeal marker genes is now used by GTDB for curating the archaeal taxonomy.
  • All directories containing intermediate results are now removed by default at the end of the classify_wf and de_novo_wf pipelines. If you wish to retain these intermediate files, use the --keep_intermediates flag.
  • All MSA files produced by the align step are now compressed with gzip.
  • The classification summary and failed genomes files are now the only files linked in the root directory of classify_wf.
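
Since the MSA files are now gzip-compressed, downstream scripts that used to open them as plain text need a small change. The sketch below shows one way to read such a file with the standard library; count_fasta_records is a hypothetical helper, and the path is whatever MSA file the align step produced for your run.

```python
import gzip


def count_fasta_records(msa_path: str) -> int:
    """Count the sequences in a gzip-compressed FASTA file.

    Opens the file in text mode ("rt") so lines arrive as str, not bytes.
    """
    with gzip.open(msa_path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```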

Features:

  • convert_to_itol to convert trees into iTOL format (#373)
  • Output FASTA files are compressed by default (#369)
  • Intermediate files will be removed by default when using classify/de-novo workflows unless specified by --keep_intermediates (#369)
  • Add --genes flag for Error (#362)
  • A warning will be displayed if pplacer fails to place a genome (#360 / #356)

Important

  • This version is not backwards compatible with GTDB release 202.
  • This version requires a new reference package

1.7.0

15 Oct 03:17
eaccdd6
  • (#336) Warn the user if they have provided an incorrectly formatted taxonomy file.
  • (#348) Gracefully exit the program if no single copy hits could be identified.
  • (#351) Fixed an issue where GTDB-Tk would crash if spaces were present in the reference data path.
  • (#354) Added optional --tmpdir argument to set temporary directory (thanks @tr11-sanger).

1.6.0

20 Aug 00:20
80a4801
  • (#337) Set minimum tqdm version to 4.35.0
  • (#335) Fixed typo in output log messages (@fplaza)
  • Removed the option to re-calculate RED values (--recalculate_red)

1.5.1

24 Jun 06:24
d18e4c2

Changelog:

  • #327 Disallow spaces in genome names/file paths due to downstream application issues.
  • #326 Disallow genome names that are blank.
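
The two restrictions above amount to a pre-flight check on every genome entry. The sketch below is a minimal illustration under that reading; check_genome_entry is a hypothetical name, and GTDB-Tk's own validation may be stricter or word its errors differently.

```python
def check_genome_entry(genome_name: str, file_path: str) -> None:
    """Raise ValueError for blank names or for spaces in the name or path.

    Hypothetical helper for illustration; not GTDB-Tk's implementation.
    """
    if genome_name.strip() == "":
        raise ValueError("Genome names must not be blank.")
    if " " in genome_name:
        raise ValueError(f"Genome name contains a space: {genome_name!r}")
    if " " in file_path:
        raise ValueError(f"Genome file path contains a space: {file_path!r}")
```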

1.5.0

26 Apr 21:54
f678ca6

Changes:

Fixes:

  • Automatic drop of genome leads to error in downstream modules of classify_wf (#312)
  • --scratch_dir not working in v1.4.1 (#311)

1.4.1

03 Feb 23:56
94b855e
  • Updated GitHub CI/CD to trigger docker build / tag version on release.
  • (#255) (#297) Fixed 'Namespace' object has no attribute errors by adding default arguments to argparse.
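
The 'Namespace' object has no attribute error typically appears when shared code reads an option that only some subcommands define. The sketch below reproduces the general fix with parser-level defaults; the parser layout and option set are assumptions for illustration and do not mirror GTDB-Tk's real command-line definition.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser whose subcommands define different options."""
    parser = argparse.ArgumentParser(prog="example")
    # Parser-level default: every parsed Namespace gets this attribute,
    # even when the chosen subcommand never declares --scratch_dir.
    parser.set_defaults(scratch_dir=None)

    subparsers = parser.add_subparsers(dest="command")

    identify = subparsers.add_parser("identify")
    identify.add_argument("--cpus", type=int, default=1)

    classify = subparsers.add_parser("classify")
    classify.add_argument("--cpus", type=int, default=1)
    classify.add_argument("--scratch_dir", default=None)
    return parser
```

Without the set_defaults call, parsing `identify` and then reading args.scratch_dir would raise AttributeError; with it, the attribute is simply None.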

1.4.0

30 Nov 23:05
  • Check if stdout is being piped to a file before adding colour.
  • (#283) Significantly improved classify performance (noticeable when running trees > 1,000 taxa).
  • Automatically cap pplacer CPUs to 64 unless specifying --pplacer_cpus to prevent pplacer from hanging.
  • (#262) Added --write_single_copy_genes to the identify command. Writes unaligned single-copy AR122/BAC120 marker genes to disk.
  • When running -version, GTDB-Tk now warns if it is not the most up-to-date version (disable via GTDBTK_VER_CHECK = False in config.py). If the version check encounters an error, GTDB-Tk silently continues (3 second timeout).
  • (#276) Renamed the column aa_percent to msa_percent in summary.tsv (produced by classify).
  • (#286) Fixed a file not found error when the reference data is a symbolic link (thanks davidealbanese!).
  • (#277) Fixed an issue where if the user overrides the translation table using the optional 3rd column in the batchfile, the other coding density would appear as -100. Both translation table densities are now reported.
  • The check_install command now also checks that all third party binaries can be found on the system path.
  • The align step is now approximately 10x faster.
  • (#289) Added --min_af to classify and classify_wf which allows the user to specify the minimum alignment fraction for FastANI.
  • Added the --mash_db option to re-use the GTDB-Tk Mash reference database in ani_rep.
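
Two of the behaviours above, colouring output only when stdout is not piped and capping pplacer CPUs at 64 unless --pplacer_cpus is given, can be sketched as follows. The function names and the constant are hypothetical, and this is a reading of the release notes rather than GTDB-Tk's code.

```python
import sys

MAX_PPLACER_CPUS = 64  # cap described in the release notes


def colourise(text: str, ansi_code: str = "31", stream=None) -> str:
    """Wrap text in an ANSI colour code only if the stream is a terminal."""
    stream = sys.stdout if stream is None else stream
    if stream.isatty():
        return f"\033[{ansi_code}m{text}\033[0m"
    # Piped or redirected output stays free of escape sequences.
    return text


def resolve_pplacer_cpus(cpus: int, pplacer_cpus=None) -> int:
    """Use the explicit pplacer value if given, else cap cpus at the maximum."""
    if pplacer_cpus is not None:
        return pplacer_cpus
    return min(cpus, MAX_PPLACER_CPUS)
```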