Releases: PapenfussLab/gridss

1.4.1

06 Jun 03:50
  • Fixed crash when performing split read realignment on data with very long read names, reference contig names, or CIGAR strings. #82
    • The BAM file format has a 254 character limit on read names. A hash is now taken instead of just appending an alignment-unique identifier to the read. This ensures that read names in intermediate files do not exceed this limit.
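
The idea behind the hashed read names can be sketched as follows. This is an illustrative Python sketch, not the GRIDSS (Java) implementation: the function name `realignment_read_name`, the choice of SHA-256, and the example alignment identifier are all assumptions made for the example. The point is only that a digest has a fixed length, whereas appending an identifier grows with the original name.

```python
import hashlib

# Limit stated in the release note above.
BAM_MAX_READ_NAME = 254


def realignment_read_name(read_name: str, alignment_id: str) -> str:
    """Derive a fixed-length name from a read name and an alignment-unique identifier.

    Hypothetical helper: SHA-256 is an assumption, any stable hash would do.
    """
    key = f"{read_name}#{alignment_id}".encode("utf-8")
    # A SHA-256 hex digest is always 64 characters, regardless of input length.
    return hashlib.sha256(key).hexdigest()


long_name = "r" * 300  # longer than BAM allows once an identifier is appended
name = realignment_read_name(long_name, "chr1:1000:+:50M50S")
print(len(name) <= BAM_MAX_READ_NAME)
```

Because the digest is deterministic, the same read and alignment always map to the same intermediate name, so records can still be matched up after realignment.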

1.4.0

17 May 07:49

Assembly performance, usability, and bug fix release

  • Improved assembly performance
    • Updated library fragment size distribution cache logic
      • Removed check for the existence of the assembly output file every time a read was assembled.
    • Performing full rememoization when computationally cheaper than updating the memoization data structures.
    • Trimming self-intersecting reads before adding them to the assembly graph instead of only trimming the contig
    • Reduced default assembly maximum node density from 20 to 5
    • Reduced default maxCoverage from 20,000x to 10,000x
  • Using read group sample name instead of the filename in the output VCF if the input file contains only a single read group #71
  • Allowing unit of work used for parallel computation to span reference sequences
    • Reduces the number of intermediate files on reference genomes with many contigs (e.g. fragmented assemblies)
  • Fixed concordant read pairs sometimes being unnecessarily extracted into the .sv.bam intermediate file
  • Preventing gigantic log files by suppressing warning messages after they have been output 100 times
  • ExtractSVReads: removed log spam for single-end libraries #75
  • Removed unnecessary warning message about assembly containing unpaired reads
  • Adding additional assembling debug timing information outputs (disabled by default)
  • Removing lock file even when GRIDSS is terminated #78
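
The warning suppression described above (stop emitting a message after it has been output 100 times) can be sketched with a logging filter. This is a minimal Python illustration, not GRIDSS code: the class name `RepeatLimitFilter` is hypothetical, and keying on the unformatted message template is an assumption about what counts as "the same warning".

```python
import logging
from collections import Counter

MAX_REPEATS = 100  # threshold from the release note above


class RepeatLimitFilter(logging.Filter):
    """Drop a warning once its message template has been seen max_repeats times."""

    def __init__(self, max_repeats: int = MAX_REPEATS) -> None:
        super().__init__()
        self.max_repeats = max_repeats
        self.counts = Counter()

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno != logging.WARNING:
            return True  # only warnings are rate limited
        # record.msg is the unformatted template, so "read %s ignored" is
        # counted once per call regardless of the argument values.
        self.counts[record.msg] += 1
        return self.counts[record.msg] <= self.max_repeats


log = logging.getLogger("suppression_demo")
log.addFilter(RepeatLimitFilter())
```

Counting per template rather than per formatted string keeps the counter small even when every message mentions a different read.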

1.3.4

27 Mar 01:09

Improved stability on reference genomes with thousands of contigs

  • Fixed file handle leak when calling variants
  • Attempt recovery after file handle exhaustion

1.3.3

24 Mar 04:57
  • Excluding discordant read pair extraction if no concordant read pair distribution can be calculated.
  • Fixed some contig boundary edge cases
  • Ignoring reads that align outside of contig bounds
  • Considering any kmer containing a base falling outside the contig bounds as unanchored
  • Not adjusting the bounds of assemblies overrunning contig bounds if doing so would result in a negative CIGAR element length
  • Added configuration option to more aggressively release file handles
  • Supplementary records now use the primary alignment when determining whether to extract
  • Reducing intermediate file sizes by only extracting unmapped reads of interest (instead of all unmapped reads)
  • Using MC mate CIGAR SAM tag if available when estimating read fragment size
  • Fragment size estimation now includes hard clipped bases in the base count
  • Ignoring split reads that fully align to one of the split alignment locations
    • Improves handling of split reads generated by bwa
  • SV metrics calculation no longer double-counting reads
  • ComputeSamTags: replaced CONVERT_SECONDARY_TO_SUPPLEMENTARY with RECALCULATE_SA_SUPPLEMENTARY parameter
  • Now handling multiple independent split read alignments for a single read.
  • Not setting supplementary tag on unmapped reads regardless of SA tag (prevents htsjdk validation failing)
  • Now passing all common command-line arguments to child programs
  • Deleting intermediate files immediately after the next step is complete
    • Reduces peak disk usage
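
To illustrate why hard clipped bases matter for the base count: a record with CIGAR `20H80M` stores only 80 bases, although the sequenced read was 100 bases long. The sketch below counts read bases from a CIGAR string with and without hard clips. It is an illustration only, with a hypothetical function name, and does not reproduce how GRIDSS computes fragment sizes.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operators that consume read (query) bases according to the SAM specification.
READ_BASE_OPS = frozenset("MIS=X")


def read_base_count(cigar: str, include_hard_clips: bool = True) -> int:
    """Count read bases described by a CIGAR string.

    Hard clipped bases (H) are absent from the stored sequence, so they are
    only counted when include_hard_clips is True.
    """
    ops = (READ_BASE_OPS | {"H"}) if include_hard_clips else READ_BASE_OPS
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in ops)


print(read_base_count("20H80M"))                            # 100
print(read_base_count("20H80M", include_hard_clips=False))  # 80
```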

1.3.2

02 Mar 05:06

Bug fix release

  • External aligner thread count now limited by WORKER_THREADS
  • Corrected AssemblyEvidenceSource.minConcordantFragmentSize() calculation
    • Fixes java.lang.AssertionError at au.edu.wehi.idsv.debruijn.positional.PathSimplificationIterator.advance()
  • Setting supplementary flag on split secondary alignments
    • Corrects split read scoring asymmetry for BAM files aligned with bwa mem -M
  • Corrected off-by-one error in split read bounds when split read alignments overlap
    • Affects input files aligned with bwa

1.3.1

27 Feb 23:58

Minor bug fix release

  • Updated assembly sort window size to include assembly alignment width
    • Fixes java.lang.IllegalStateException: Unable to sort output with window size of n error

1.3.0

25 Feb 15:31

Based on user feedback, the VCF output format has been updated.

  • Added link to GRIDSS preprint to README
  • VCF output now uses FORMAT fields to improve downstream usability
  • INPUT_CATEGORY parameter removed
    • VCF FORMAT field header now defaults to input file name
    • INPUT_LABEL replaces INPUT_CATEGORY. There is no longer any need to remember which file belongs to which category!
  • Updated default main class
    • java -jar gridss.jar now works again
  • Indel assemblies being treated as originating from the side matching the assembly direction
  • Separately tracking compound breakpoints
    • Added CAS INFO/FORMAT field
    • Fixes breakend asymmetry in VCF FILTER
  • Fixed edge case with split read isReference() asymmetry when one alignment is soft clipped on both sides
    • Fixes breakend asymmetry in SR field
  • If the primary alignment of a split read is below the mapq threshold, no supplementary alignments are considered
    • Fixes breakend asymmetry in SRQ field
  • Updated htsjdk/picard versions
    • BAM reading should now be faster when -Dsamjdk.use_async_io_read_samtools=true is passed to java
  • Added code to recover from homology inconsistency
  • Added scripts/girdsssanitycheck.R, a script that sanity-checks that the GRIDSS output VCF makes sense

1.2.4

20 Feb 01:36
  • Including assembly margin into coverage annotator slide window size calculation
    • Fix for java.lang.IllegalArgumentException: Unable to rewind from position
  • Now cleaning up intermediate files by default
    • Use java -Dgridss.keepTempFiles=true to keep intermediate files
  • Increased default chunk size from 1Mb to 10Mb
    • On a human-sized genome, this reduces the number of intermediate files from ~3000 to a more manageable ~300
  • No longer writing any processing tracking/visualisation files by default
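
The effect of the chunk size change can be checked with back-of-the-envelope arithmetic. The sketch below assumes a human genome of roughly 3.1 Gbp and treats the genome as one contiguous sequence. Both are simplifications for illustration: the real number of intermediate files also depends on contig boundaries and on how many files are written per chunk.

```python
def chunk_count(genome_length_bp: int, chunk_size_bp: int) -> int:
    """Number of fixed-size chunks needed to cover the genome (ceiling division)."""
    return (genome_length_bp + chunk_size_bp - 1) // chunk_size_bp


HUMAN_GENOME_BP = 3_100_000_000  # rough figure, an assumption for this example

print(chunk_count(HUMAN_GENOME_BP, 1_000_000))   # 3100 chunks at 1Mb
print(chunk_count(HUMAN_GENOME_BP, 10_000_000))  # 310 chunks at 10Mb
```

The tenfold larger chunk gives a tenfold smaller count, which matches the order of magnitude quoted above (~3000 versus ~300).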

1.2.3

13 Feb 02:45
  • Expanded assembly sorting window size by maximum imprecise assembly contig placement width
    • Fix for occasional java.lang.IllegalStateException: "Unable to sort output ..." when calling variants supported by imprecise break-end assembly contigs.

1.2.2

13 Feb 01:46
  • Minor assembly performance improvements
  • Multi-mapping read handling now must be explicitly enabled. See README for documentation
  • Added minimum supporting read configuration option to assist in calling at very low confidence levels (default: 2 reads).