
4.4.0.0

@droazen released this on 16 Mar 19:08
Commit: 2dbc025

Download release: gatk-4.4.0.0.zip
Docker image: https://hub.docker.com/r/broadinstitute/gatk/

Highlights of the 4.4.0.0 release:

  • We've moved to Java 17, the latest long-term support (LTS) Java release, for building and running GATK! Previously we required Java 8, which is now end-of-life.

    • Newer non-LTS Java releases such as Java 18 or Java 19 may also work, but because we have not tested them, we officially support running GATK only with Java 17.
  • Significant enhancements to SelectVariants, including arguments that enable GVCF type filtering and make it easier to filter on genotype fields.

  • A new tool, SVConcordance, that calculates SV genotype concordance between an "evaluation" VCF and a "truth" VCF

  • Bug fixes and enhancements to the Ultima Genomics flow-based sequencing support that was introduced in GATK 4.3.0.0

Full list of changes:

  • Flow-based Variant Calling

    • FlowFeatureMapper: added surrounding-median-quality-size feature (#8222)
    • Removed hardcoded limit on max homopolymer call (#8088)
    • Fixed a bug in dynamic read disqualification (#8171)
    • Fixed a bug in the parsing of the T0 tag (#8185)
    • Updated flow-based calling Mutect2 parameters to make them consistent with the HaplotypeCaller parameters (#8186)
  • SelectVariants

    • Enabled GVCF type filtering support in SelectVariants (#7193)
      • Added an optional argument, --ignore-non-ref-in-types, to correctly handle VariantContexts that contain a NON_REF allele. This is necessary because every record in a GVCF carries the <NON_REF> symbolic allele, so every variant would otherwise be assigned the type MIXED, which makes it impossible to filter for, e.g., SNPs.
      • Note that this only enables correct handling of GVCF input: the filtered output is a VCF (not a GVCF) file, because reference blocks are not extended when a variant is filtered out.
    • SelectVariants: added new arguments for controlling genotype JEXL filtering (#8092)
      • -select-genotype: this new genotype-specific JEXL argument makes it easy to filter on genotype fields with expressions such as 'GQ > 0'. With multiple samples, a record is selected if the expression holds in at least one sample. Genotype fields can still be accessed manually with the existing -select argument and expressions such as vc.getGenotype('NA12878').getGQ() > 0.
      • --apply-jexl-filters-first: this flag applies JEXL filtering before the FORMAT fields are subset. It is intended mainly for filters on INFO fields only, and may improve speed when working with a large cohort VCF that contains genotypes for thousands of samples.
  • SV Calling

    • Added a new tool, SVConcordance, that calculates SV genotype concordance between an "evaluation" VCF and a "truth" VCF (#7977)
    • SVAnnotate now recognizes MEI DELs whose ALT allele has the format DEL:ME (#8125)
    • Don't sort rejected reads output from AnalyzeSaturationMutagenesis (#8053)
  • Notable Enhancements

    • GenotypeGVCFs: added a --keep-specific-combined-raw-annotation argument to keep the specified raw annotations (#7996)
    • VariantAnnotator now warns instead of failing when the variant contains too many alleles (#8075)
    • Read filters now output total reads processed in addition to the number of reads filtered (#7947)
    • Added GenomicsDB arguments to the CreateSomaticPanelOfNormals tool (#6746)
    • Added a DeprecatedFeature annotation and a process for officially marking GATK tools as deprecated (#8100)
    • Prevent tool close() methods from hiding underlying errors (#7764)
  • Bug Fixes

    • Fixed an issue that sometimes caused VariantRecalibrator to fail if the user provided duplicate -an options (#8227)
    • ReblockGVCF: remove A-, R-, and G-length attributes when ReblockGVCF drops an allele (#8209)
      • Previously, if an input gVCF had FORMAT annotations of length A (one value per ALT allele), R (one value per allele, including REF), or G (one value per possible genotype), ReblockGVCF would not remove all of them at sites where an allele was dropped. This made the output gVCF invalid, because at those sites the annotation length no longer matched the length declared in the header. ReblockGVCF now fixes up the F1R2, F2R1, and AF annotations and removes any other annotation that is not already handled and is defined as A-, R-, or G-length in the header.
    • Fixed a gCNV bug that broke inference when only 2 intervals were provided (#8180)
    • Fixed NPE from uninitialized logger in GenotypingEngine (#8159)
    • Fixed asynchronous Python exception propagation in StreamingPythonExecutor/CNNScoreVariants (#7402)
    • Fixed issue in ShiftFasta where the interval list output was never written (#8070)
    • Bugfix for the type of some output files in the somatic CNV WDL (#6735) (#8130)
    • MergeAnnotatedRegions now requires a reference as asserted in its documentation (#8067)
  • Miscellaneous Changes

    • Deprecated an untested VariantRecalibrator argument and an old ReblockGVCF argument that produced invalid GVCFs (#8140)
    • Removed old GnarlyGenotyper code with a diploid assumption to prepare for adding haploid support to GnarlyGenotyper (#8140)
    • ReblockGVCF: add error message for when tree-score-threshold is set but the TREE_SCORE annotation is not present (#8218)
    • TransferReadTags: allow empty unaligned BAMs as input (#8198)
    • Refactored the JointVcfFiltering WDL and expanded its tests (#8074)
    • Updated the CARROT GitHub Actions workflow to the most recent version, which supports using #carrot_pr to trigger branch vs. master comparison runs (#8084)
    • Replaced uses of File.createTempFile() with IOUtils.createTempFile() to ensure that temp files are deleted on shutdown (#6780)
    • Don't require Python just to instantiate the CNNScoreVariants tool classes (#8128)
    • Made several Funcotator methods and fields protected so it is easier to extend the tool (#8124) (#8166)
    • Test for presence of ack result message and simplify ProcessControllerAckResult API (#7816)
    • Fixed the path reported by the gatkbot when there are test failures (#8069)
    • Fixed incorrect boolean value in DirichletAlleleDepthAndFractionIntegrationTest (#7963)
    • Removed two ancient and unused HaplotypeCaller test files that are no longer needed (#7634)
    • Added scattered gCNV case WDL to dockstore file (#8217)
  • Documentation

    • Updated instructions for installing Java in the README (#8089)
    • Added documentation on OMP_NUM_THREADS and MKL_NUM_THREADS to GermlineCNVCaller and DetermineGermlineContigPloidy (#8223)
    • Improvements to PileupDetectionArgumentCollection documentation (#8050)
    • Fixed typo in documentation for VariantAnnotator (#8145)
  • Dependencies

    • Moved to Java 17, the latest LTS Java release, for building/running GATK (#8035)
    • Updated Gradle to 7.5.1 (#8098)
    • Updated the GATK base docker image to 3.0.0 (#8228)
    • Updated HTSJDK to 3.0.5 (#8035)
    • Updated Picard to 3.0.0 (#8035)
    • Updated Barclay to 5.0.0 (#8035)
    • Updated GenomicsDB to 1.4.4 (#7978)
    • Updated Spark to 3.3.1 (#8035)
    • Updated Hadoop to 3.3.1 (#8102)
    • Require commons-text 1.10.0 to fix a security vulnerability (#8071)
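
The SelectVariants GVCF type-filtering change (#7193) depends on how a record's variant type is derived from its alleles. The Python sketch below only illustrates that idea under stated assumptions: it is not GATK's implementation (GATK relies on HTSJDK's VariantContext typing), the per-allele type rules are deliberately simplified, and the function names `allele_type` and `variant_type` are hypothetical. It shows why a GVCF SNP record, which also carries the symbolic <NON_REF> allele, comes out as MIXED unless that allele is ignored.

```python
NON_REF = "<NON_REF>"

def allele_type(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair using simplified rules (assumption, not HTSJDK's exact logic)."""
    if alt.startswith("<"):
        return "SYMBOLIC"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) == len(alt):
        return "MNP"
    return "INDEL"

def variant_type(ref: str, alts: list[str], ignore_non_ref: bool = False) -> str:
    """Record type: a single shared per-allele type, or MIXED if the ALT alleles disagree."""
    if ignore_non_ref:
        # Hypothetical analogue of --ignore-non-ref-in-types: drop <NON_REF> before typing.
        alts = [a for a in alts if a != NON_REF]
    if not alts:
        return "NO_VARIATION"
    types = {allele_type(ref, a) for a in alts}
    return types.pop() if len(types) == 1 else "MIXED"

# A GVCF SNP record carries <NON_REF> next to the real ALT allele.
print(variant_type("A", ["G", NON_REF]))                       # MIXED
print(variant_type("A", ["G", NON_REF], ignore_non_ref=True))  # SNP
```

In this sketch, a reference block (ALT is only <NON_REF>) becomes NO_VARIATION once <NON_REF> is ignored, so a SNP-only selection would drop it; that mirrors the note above that the filtered output is a VCF rather than a GVCF.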
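
The multi-sample behavior described for -select-genotype (a record is selected if the expression holds in at least one sample) amounts to an "any" over the record's samples. In the Python sketch below, a plain predicate function stands in for the JEXL expression 'GQ > 0'; the record layout and the names `passes_genotype_filter` and `gq_above_zero` are hypothetical. How GATK treats a sample that lacks the field is not stated in these notes, so the sketch simply counts it as not matching (an assumption).

```python
from typing import Any, Callable, Mapping

Genotype = Mapping[str, Any]  # FORMAT field name -> value for one sample

def passes_genotype_filter(genotypes: Mapping[str, Genotype],
                           predicate: Callable[[Genotype], bool]) -> bool:
    """Keep the record if the predicate holds for at least one sample."""
    return any(predicate(gt) for gt in genotypes.values())

def gq_above_zero(gt: Genotype) -> bool:
    """Stand-in for the JEXL expression 'GQ > 0'; a missing GQ counts as no match (assumption)."""
    gq = gt.get("GQ")
    return gq is not None and gq > 0

record = {"NA12878": {"GT": "0/1", "GQ": 0}, "NA12891": {"GT": "0/0", "GQ": 35}}
print(passes_genotype_filter(record, gq_above_zero))  # True: NA12891 has GQ 35
# A per-sample check, analogous to vc.getGenotype('NA12878').getGQ() > 0:
print(gq_above_zero(record["NA12878"]))               # False
```

The contrast between the two calls is the practical difference between the new argument (any sample may satisfy the expression) and an explicit per-sample expression written with the older -select argument.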
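
The ReblockGVCF fix (#8209) rests on the VCF convention that a header Number of A, R, or G ties a field's length to the site's alleles: A is one value per ALT allele, R is one value per allele including REF, and G is one value per possible genotype, which for N alleles and ploidy P is C(N+P-1, P). The Python sketch below, which uses hypothetical function names and toy values rather than GATK code or real output, computes those expected lengths and flags FORMAT fields whose length no longer matches after an allele is dropped, which is the kind of invalid record the fix prevents.

```python
from math import comb

def expected_length(number: str, n_alleles: int, ploidy: int = 2) -> int:
    """Expected value count for a VCF header Number of A, R, or G."""
    if number == "A":
        return n_alleles - 1                          # one per ALT allele
    if number == "R":
        return n_alleles                              # one per allele, REF included
    if number == "G":
        return comb(n_alleles + ploidy - 1, ploidy)   # one per unordered genotype
    raise ValueError(f"not an allele-dependent Number: {number}")

def mismatched_fields(header_numbers: dict[str, str],
                      format_values: dict[str, list],
                      n_alleles: int, ploidy: int = 2) -> list[str]:
    """Names of A/R/G fields whose value count disagrees with the site's allele count."""
    return sorted(
        key for key, values in format_values.items()
        if header_numbers.get(key) in ("A", "R", "G")
        and len(values) != expected_length(header_numbers[key], n_alleles, ploidy)
    )

header = {"AD": "R", "PL": "G", "F1R2": "R", "GQ": "1"}
# Toy values written for REF + 2 ALTs; AD and PL were not updated after one ALT was dropped.
values = {"AD": [10, 8, 0], "PL": [0, 30, 300, 40, 310, 320], "F1R2": [5, 4], "GQ": [30]}
print(mismatched_fields(header, values, n_alleles=2))  # ['AD', 'PL']
```

Fields with a fixed Number (GQ here) are skipped, which matches the notes' focus on annotations defined as A-, R-, or G-length in the header.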