Releases · Ecogenomics/GTDBTk

14 Feb 00:33

pchaumeil

2.2.0

3d7e936


Minor changes:

(#433) Added checks to ensure that --outgroup_taxon cannot be set to a domain (root, de_novo_wf).
(#459 / #462) Fixed the use of the deprecated np.bool in prodigal_biolib.py. Special thanks to @neoformit for his contribution.
(#466) RED values are now rounded to 5 decimal places.
(#451) Extra checks have been added for when Prodigal fails.
(#448) A warning is now displayed when all genomes are filtered out and none are classified.
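The outgroup check in #433 can be illustrated with a short Python sketch. It assumes taxa are given as GTDB-style rank-prefixed names (d__ for domain, p__ for phylum, and so on) and rejects any domain-level value. The function name validate_outgroup_taxon is hypothetical; this is not GTDB-Tk's actual implementation.

```python
# Hypothetical sketch of the #433 check: reject an outgroup taxon at domain rank.
# GTDB taxon names carry a rank prefix: d__, p__, c__, o__, f__, g__, s__.
DOMAIN_PREFIX = "d__"


def validate_outgroup_taxon(taxon: str) -> str:
    """Return the stripped taxon, or raise ValueError if it is blank or a domain."""
    taxon = taxon.strip()
    if not taxon:
        raise ValueError("outgroup taxon must not be empty")
    if taxon.startswith(DOMAIN_PREFIX):
        # Domain-level values are rejected outright.
        raise ValueError(f"outgroup taxon cannot be a domain: {taxon!r}")
    return taxon


if __name__ == "__main__":
    print(validate_outgroup_taxon("p__Chloroflexota"))
```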

Bug Fixes:

(#420) Fixed an issue where GTDB-Tk might hang when classifying TIGRFAM markers (identify, classify_wf, de_novo_wf). Special thanks to @lfenske-93 and @sjaenick for their contribution.
(#428) Fixed an issue where --gtdbtk_classification_file raised an error when trying to read the classify summary (root, de_novo_wf).
(#439) Fixed the pipeline when using protein files instead of nucleotide files; symlinks now use absolute paths.
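Why absolute paths matter for symlinks (#439): a relative symlink target is resolved against the directory that contains the link, not against the current working directory, so a link created from a relative input path can end up dangling. A minimal Python sketch of the general fix, using a hypothetical helper name (not GTDB-Tk's code):

```python
import os


def symlink_absolute(src: str, dst: str) -> str:
    """Create a symlink at dst that points at the absolute path of src.

    os.path.abspath resolves src against the current working directory,
    so the stored target stays valid wherever dst lives.
    """
    target = os.path.abspath(src)
    os.symlink(target, dst)
    return target
```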

Contributors

sjaenick, neoformit, and lfenske-93


11 Jul 05:11

pchaumeil

2.1.1

3656d71


(#399) Fixed the --genes option.
(#400) Modified the config.py file to resolve the issue.
Updated documentation (including #410, documentation for iTOL).


12 May 03:23

pchaumeil

2.1.0

185cebc


Major changes:

GTDB-Tk now uses a divide-and-conquer approach in which the bacterial reference tree is split into multiple class-level subtrees. This reduces the memory requirements of GTDB-Tk from 320 GB of RAM when using the full GTDB R07-RS207 reference tree to approximately 55 GB. A manuscript describing this approach is in preparation. To continue using the full GTDB reference tree, use the --full-tree flag. This is the main change from v2.0.0: the split-tree approach has been changed from order-level trees to class-level trees to resolve specific classification issues (see #383).
Genomes that cannot be assigned to a domain (e.g., genomes with no bacterial or archaeal markers, or genomes with no genes called by Prodigal) are now reported in gtdbtk.bac120.summary.tsv as 'Unclassified'.
Genomes filtered out during the alignment step are now reported in gtdbtk.bac120.summary.tsv or gtdbtk.ar53.summary.tsv as 'Unclassified Bacteria/Archaea'.
The --write_single_copy_genes flag is now available in the classify_wf and de_novo_wf workflows.

Features:

(#392) The --write_single_copy_genes flag is now available in workflows.
(#387) Specific memory requirements are set in classify_wf depending on the classification approach.

Important

This version is not backwards compatible with GTDB package R207 v1.
This version requires a new reference package.


08 Apr 01:54

aaronmussig

2.0.0

7863333


Major changes:

GTDB-Tk now uses a divide-and-conquer approach in which the bacterial reference tree is split into multiple order-level subtrees. This reduces the memory requirements of GTDB-Tk from 320 GB of RAM when using the full GTDB R07-RS207 reference tree to approximately 35 GB. A manuscript describing this approach is in preparation. To continue using the full GTDB reference tree, use the --full-tree flag.
Archaeal classification now uses a refined set of 53 archaeal-specific marker genes based on the recent publication by Dombrowski et al., 2020. This set of archaeal marker genes is now used by GTDB for curating the archaeal taxonomy.
All directories containing intermediate results are now removed by default at the end of the classify_wf and de_novo_wf pipelines. To retain these intermediate files, use the --keep-intermediates flag.
All MSA files produced by the align step are now compressed with gzip.
The classification summary and failed genomes files are now the only files linked in the root directory of classify_wf.

Features:

Added convert_to_itol to convert trees into iTOL format (#373)
Output FASTA files are compressed by default (#369)
Intermediate files are removed by default when using the classify/de-novo workflows unless --keep_intermediates is specified (#369)
Added --genes flag for Error (#362)
A warning is now displayed if pplacer fails to place a genome (#360 / #356)
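Since MSA and FASTA outputs are now gzip-compressed, scripts that consume them need to read through gzip. A minimal sketch using only Python's standard gzip module; the helper names and the file name msa.faa.gz are hypothetical, and this shows the general technique, not how GTDB-Tk itself writes its files.

```python
import gzip
from typing import Dict, Optional


def write_fasta_gz(path: str, seqs: Dict[str, str]) -> None:
    """Write sequences as a gzip-compressed FASTA file."""
    # "wt" opens the gzip stream in text mode so str records can be written directly.
    with gzip.open(path, "wt") as fh:
        for name, seq in seqs.items():
            fh.write(f">{name}\n{seq}\n")


def read_fasta_gz(path: str) -> Dict[str, str]:
    """Read a gzip-compressed FASTA file into a {name: sequence} dict."""
    seqs: Dict[str, str] = {}
    name: Optional[str] = None
    with gzip.open(path, "rt") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith(">"):
                name = line[1:]
                seqs[name] = ""
            elif name is not None:
                # Concatenate wrapped sequence lines onto the current record.
                seqs[name] += line
    return seqs
```

On the command line, the same files can be inspected with gzip -dc.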

Important

This version is not backwards compatible with GTDB release 202.
This version requires a new reference package.


15 Oct 03:17

aaronmussig

1.7.0

eaccdd6


(#336) Warn the user if they have provided an incorrectly formatted taxonomy file.
(#348) Gracefully exit the program if no single-copy hits could be identified.
(#351) Fixed an issue where GTDB-Tk would crash if spaces were present in the reference data path.
(#354) Added optional --tmpdir argument to set the temporary directory (thanks @tr11-sanger).
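The notes do not say what caused #351. One common way a space in a path breaks a Python pipeline is building a shell command string, which gets split at the space. As a general illustration only (not GTDB-Tk's code; run_tool and the path are hypothetical), passing an argument list to subprocess.run avoids that:

```python
import subprocess
import sys
from typing import List


def run_tool(argv: List[str]) -> str:
    """Run an external program and return its stdout.

    Each list element reaches the program as one argument (no shell is
    involved), so a path containing spaces is never split.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    ref_path = "/data/gtdb ref/markers.hmm"  # hypothetical path with a space
    # The Python interpreter stands in for a tool that echoes its first argument.
    print(run_tool([sys.executable, "-c", "import sys; print(sys.argv[1])", ref_path]))
```

If a shell string is unavoidable, quoting each path with shlex.quote gives the same protection.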

Contributors

tr11-sanger


20 Aug 00:20

aaronmussig

1.6.0

80a4801


(#337) Set minimum tqdm version to 4.35.0
(#335) Fixed a typo in output log messages (@fplaza)
Removed the option to re-calculate RED values (--recalculate_red)

Contributors

fplazaonate


24 Jun 06:24

aaronmussig

1.5.1

d18e4c2


Changelog:

#327 Disallow spaces in genome names/file paths due to downstream application issues.
#326 Disallow genome names that are blank.
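The two rules from #326 and #327 amount to simple input validation. A minimal sketch, assuming each genome is described by a name and a file path; validate_genome is a hypothetical name and this is not GTDB-Tk's actual check.

```python
def validate_genome(name: str, path: str) -> None:
    """Raise ValueError if a genome entry breaks the rules from #326/#327."""
    # #326: blank (empty or whitespace-only) names are rejected.
    if not name.strip():
        raise ValueError("genome name must not be blank")
    # #327: spaces are rejected in both the genome name and its file path.
    if " " in name:
        raise ValueError(f"genome name contains a space: {name!r}")
    if " " in path:
        raise ValueError(f"genome file path contains a space: {path!r}")
```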


26 Apr 21:54

aaronmussig

1.5.0

f678ca6


Changes:

Updated to use PFAM 33.1 markers.
Updated to use the GTDB R202 taxonomy (note: this requires an updated reference package, see https://ecogenomics.github.io/GTDBTk/installing/index.html#gtdb-tk-reference-data)

Fixes:

Fixed an issue where automatically dropping a genome led to errors in downstream modules of classify_wf (#312)
Fixed --scratch_dir not working in v1.4.1 (#311)


03 Feb 23:56

aaronmussig

1.4.1

94b855e


Updated GitHub CI/CD to trigger docker build / tag version on release.
(#255, #297) Fixed "'Namespace' object has no attribute" errors by adding default arguments to argparse.
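The "'Namespace' object has no attribute" error typically appears when code reads an option that the chosen argparse subcommand never defined. A minimal sketch of the general fix, giving shared attributes a default on the top-level parser with set_defaults; the program name, subcommands and options are hypothetical, not GTDB-Tk's CLI.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical CLI with two subcommands."""
    parser = argparse.ArgumentParser(prog="tool")
    # Defaults on the top-level parser: "check" never defines --out_dir or --cpus,
    # so without these, reading args.out_dir after "tool check" raises
    # AttributeError: 'Namespace' object has no attribute 'out_dir'.
    parser.set_defaults(out_dir=None, cpus=1)
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run")
    run.add_argument("--out_dir")
    run.add_argument("--cpus", type=int, default=1)

    sub.add_parser("check")  # defines no options of its own
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["check"])
    print(args.command, args.out_dir, args.cpus)
```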


30 Nov 23:05

aaronmussig

1.4.0

4707f2e


GTDB-Tk now checks whether stdout is being piped to a file before adding colour.
(#283) Significantly improved classify performance (noticeable for trees with more than 1,000 taxa).
pplacer CPUs are now automatically capped at 64, unless --pplacer_cpus is specified, to prevent pplacer from hanging.
(#262) Added --write_single_copy_genes to the identify command. Writes unaligned single-copy AR122/BAC120 marker genes to disk.
When running -version, GTDB-Tk warns if it is not the most up-to-date version (disable via GTDBTK_VER_CHECK = False in config.py). If the version check encounters an error, GTDB-Tk silently continues (3 second timeout).
(#276) Renamed the column aa_percent to msa_percent in summary.tsv (produced by classify).
(#286) Fixed a file not found error when the reference data is a symbolic link (thanks davidealbanese!).
(#277) Fixed an issue where if the user overrides the translation table using the optional 3rd column in the batchfile, the other coding density would appear as -100. Both translation table densities are now reported.
The check_install command now also checks that all third party binaries can be found on the system path.
The align step is now approximately 10x faster.
(#289) Added --min_af to classify and classify_wf which allows the user to specify the minimum alignment fraction for FastANI.
Added the --mash_db option to re-use the GTDB-Tk Mash reference database in ani_rep.
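Two of the 1.4.0 behaviours, colouring output only on a terminal and capping pplacer CPUs at 64, can be sketched in a few lines of Python. The helper names are hypothetical, and how an explicit --pplacer_cpus value interacts with the cap is an assumption here (it is used as-is); this is not GTDB-Tk's code.

```python
import sys
from typing import Optional, TextIO

PPLACER_CPU_CAP = 64  # cap stated in the release notes


def colourise(msg: str, stream: Optional[TextIO] = None) -> str:
    """Wrap msg in ANSI green only when the stream is an interactive terminal."""
    stream = stream if stream is not None else sys.stdout
    # isatty() is False when output is piped or redirected to a file.
    if stream.isatty():
        return f"\033[92m{msg}\033[0m"
    return msg


def pplacer_cpus(requested_cpus: int, explicit_pplacer_cpus: Optional[int] = None) -> int:
    """Return the CPU count to give pplacer (assumed: explicit value wins)."""
    if explicit_pplacer_cpus is not None:
        return explicit_pplacer_cpus
    return min(requested_cpus, PPLACER_CPU_CAP)
```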

