Releases: PapenfussLab/gridss
1.4.1
- Fixed crash when performing split read realignment on data with very long read names, reference contig names, or CIGAR strings. #82
  - The BAM file format limits read names to 254 characters. A hash is now used instead of just appending an alignment-unique identifier to the read name, which ensures that read names in intermediate files do not exceed this limit.
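The 254-character limit comes from BAM storing the read name length (including its NUL terminator) in a single byte. The sketch below contrasts the two naming strategies. The `#` separator, the choice of SHA-256, and the function names are assumptions for illustration and are not taken from GRIDSS itself.

```python
import hashlib

# BAM stores l_read_name (name length + NUL terminator) as a uint8,
# so a read name can be at most 254 characters long.
BAM_MAX_READ_NAME_LENGTH = 254


def appended_name(read_name: str, alignment_id: str) -> str:
    """Old-style name: original read name plus an alignment-unique suffix.

    The '#' separator is a hypothetical choice for this sketch.
    """
    return f"{read_name}#{alignment_id}"


def hashed_name(read_name: str, alignment_id: str) -> str:
    """Fixed-length name derived by hashing the appended name.

    SHA-256 is an assumption; any stable hash with a short digest would do.
    """
    digest = hashlib.sha256(
        appended_name(read_name, alignment_id).encode("utf-8")
    ).hexdigest()
    # A SHA-256 hex digest is always 64 characters, well under the BAM limit.
    assert len(digest) <= BAM_MAX_READ_NAME_LENGTH
    return digest
```

With a 300-character read name, `appended_name` produces a name the BAM format cannot store, while `hashed_name` always yields 64 characters and still gives each alignment a distinct, deterministic name. The trade-off in this sketch is that the original read name can no longer be read directly from the intermediate file.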
1.4.0
Assembly performance, usability, and bug fix release
- Improved assembly performance
  - Removed check for the existence of the assembly output file every time a read was assembled.
  - Performing full rememoization when computationally cheaper than updating the memoization data structures.
  - Trimming self-intersecting reads before adding them to the assembly graph instead of just trimming the contig
  - Reduced default assembly maximum node density from 20 to 5
  - Reduced default maxCoverage from 20,000x to 10,000x
- Updated library fragment size distribution cache logic
- Using read group sample name instead of the filename in the output VCF if the input file contains only a single read group #71
- Allowing unit of work used for parallel computation to span reference sequences
  - Reduces the number of intermediate files on reference genomes with many contigs (e.g. fragmented assemblies)
- Fixed concordant read pairs sometimes being unnecessarily extracted into the .sv.bam intermediate file
- Preventing gigantic log files by suppressing warning messages after they have been output 100 times
- ExtractSVReads: removed log spam for single-end libraries #75
- Removed unnecessary warning message about assembly containing unpaired reads
- Added additional assembly debug timing output (disabled by default)
- Removing lock file even when GRIDSS is terminated #78
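Letting a unit of parallel work span reference sequences mainly helps on references with many small contigs. The sketch below compares two approaches. In the first, every contig is split into chunks independently. In the second, the concatenated genome is cut every `chunk_size` bases, so one chunk can cover several consecutive contigs. The functions and the exact packing rule are assumptions for illustration, not GRIDSS's actual partitioning code.

```python
import math
from typing import List, Tuple

# (contig index, start, end) describing a half-open [start, end) interval.
Interval = Tuple[int, int, int]


def chunks_per_contig(contig_lengths: List[int], chunk_size: int) -> int:
    """Chunk count when a unit of work may not span contigs.

    Every contig is split independently, so even a tiny contig costs a chunk.
    """
    return sum(math.ceil(length / chunk_size) for length in contig_lengths)


def pack_chunks(contig_lengths: List[int], chunk_size: int) -> List[List[Interval]]:
    """Cut the concatenated genome every chunk_size bases.

    A chunk may contain intervals from several consecutive contigs.
    """
    chunks: List[List[Interval]] = []
    current: List[Interval] = []
    remaining = chunk_size  # bases still available in the current chunk
    for contig, length in enumerate(contig_lengths):
        start = 0
        while start < length:
            take = min(remaining, length - start)
            current.append((contig, start, start + take))
            start += take
            remaining -= take
            if remaining == 0:  # chunk is full: close it and start a new one
                chunks.append(current)
                current = []
                remaining = chunk_size
    if current:  # final, partially filled chunk
        chunks.append(current)
    return chunks
```

Consider a hypothetical fragmented assembly of 10,000 contigs, each 5 kb long, with a 10 Mb chunk size. `chunks_per_contig` gives 10,000 units of work, while `pack_chunks` produces only 5. If each unit of work writes its own intermediate files, this is where the reduction comes from.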
1.3.4
1.3.3
- Skipping discordant read pair extraction if no concordant read pair distribution can be calculated.
- Fixed some contig boundary edge cases
  - Ignoring reads that align outside of contig bounds
  - Considering any kmer containing a base falling outside the contig bounds as unanchored
  - Not adjusting the bounds of assemblies overrunning contig bounds if doing so would result in a negative CIGAR element length
- Added configuration option to more aggressively release file handles
- Supplementary records now use the primary alignment when determining whether to extract
- Reducing intermediate file sizes by only extracting unmapped reads of interest (instead of all unmapped reads)
- Using the MC (mate CIGAR) SAM tag, if available, when estimating read fragment size
- Fragment size estimation now includes hard-clipped bases in the base count
- Ignoring split reads that fully align to one of the split alignment locations
  - Improves handling of split reads generated by bwa
- SV metrics calculation no longer double-counts reads
- ComputeSamTags: replaced CONVERT_SECONDARY_TO_SUPPLEMENTARY with RECALCULATE_SA_SUPPLEMENTARY parameter
- Now handling multiple independent split read alignments for a single read.
- Not setting the supplementary flag on unmapped reads regardless of SA tag (prevents htsjdk validation failures)
- Now passing all common command-line arguments to child programs
- Deleting intermediate files immediately after the next step is complete
  - Reduces peak disk usage
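The MC tag carries the mate's CIGAR string, so a read's fragment size can be estimated without looking up the mate record. The sketch below shows one way to combine this with counting clipped bases. It assumes a forward/reverse (FR) pair in which this read is the upstream forward read, with 1-based SAM `POS` values. The function names and the exact formula are illustrative assumptions, not GRIDSS's implementation, which also has to handle other orientations.

```python
import re

# One CIGAR element: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def reference_length(cigar):
    # Operations M, D, N, = and X consume reference bases.
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")


def leading_clip(cigar):
    # Soft (S) and hard (H) clipped bases at the start of the alignment.
    total = 0
    for n, op in parse_cigar(cigar):
        if op not in "SH":
            break
        total += n
    return total


def trailing_clip(cigar):
    # Soft (S) and hard (H) clipped bases at the end of the alignment.
    total = 0
    for n, op in reversed(parse_cigar(cigar)):
        if op not in "SH":
            break
        total += n
    return total


def fragment_size(pos, cigar, mate_pos, mate_cigar):
    """Estimate fragment size for an FR pair (assumed orientation).

    mate_cigar is what would be read from this record's MC tag.
    Clipped bases (including hard clips) extend the fragment at both ends.
    """
    fragment_start = pos - leading_clip(cigar)
    mate_end = mate_pos + reference_length(mate_cigar) - 1
    fragment_end = mate_end + trailing_clip(mate_cigar)
    return fragment_end - fragment_start + 1
```

For a pair at `POS` 100 (`5H95M`) and mate `POS` 300 (MC `95M5H`), this returns 305. Ignoring the hard-clipped bases would give 295 instead, which would bias the estimated fragment size distribution downwards.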
1.3.2
Bug fix release
- External aligner thread count now limited by WORKER_THREADS
- Corrected AssemblyEvidenceSource.minConcordantFragmentSize() calculation
  - Fixes java.lang.AssertionError at au.edu.wehi.idsv.debruijn.positional.PathSimplificationIterator.advance()
- Setting supplementary flag on split secondary alignments
  - Corrects split read scoring asymmetry for BAMs aligned with bwa mem -M
- Corrected off-by-one error in split read bounds when split read alignments overlap
  - Affects input files aligned with bwa
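With `bwa mem -M`, the shorter hits of a split read are flagged as secondary (0x100) rather than supplementary (0x800). The sketch below shows the flag arithmetic for adding the supplementary flag to such records. The `has_sa_tag` argument stands in for checking that the record carries an SA tag. Leaving the secondary bit set is an assumption of this sketch, since the release note only says the supplementary flag is set.

```python
FLAG_SECONDARY = 0x100      # SAM flag: secondary alignment
FLAG_SUPPLEMENTARY = 0x800  # SAM flag: supplementary alignment


def mark_split_secondary(flag: int, has_sa_tag: bool) -> int:
    """Add the supplementary flag to a secondary record that is part of a split read.

    Hypothetical helper: records without an SA tag, or that are not
    secondary, are returned unchanged.
    """
    if flag & FLAG_SECONDARY and has_sa_tag:
        return flag | FLAG_SUPPLEMENTARY
    return flag
```

For example, a reverse-strand secondary split alignment with flag `0x110` becomes `0x910`, so downstream code that only looks for the supplementary bit treats it consistently with split reads from aligners that do not use `-M`.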
1.3.1
1.3.0
Based on user feedback, the VCF output format has been updated.
- Added link to GRIDSS preprint to README
- VCF output now uses FORMAT fields to improve downstream usability
  - INPUT_CATEGORY parameter removed
  - VCF FORMAT field header now defaults to the input file name
  - INPUT_LABEL replaces INPUT_CATEGORY. No longer need to remember which file is in which category!
- Updated default main class
  - java -jar gridss.jar now works again
- Treating indel assemblies as originating from the side matching the assembly direction
- Separately tracking compound breakpoints
- Added CAS INFO/FORMAT field
- Fixes breakend asymmetry in VCF FILTER
- Fixed edge case with split read isReference() asymmetry when one alignment is soft clipped on both sides
  - Fixes breakend asymmetry in SR field
- If the primary alignment of a split read is below the mapq threshold, no supplementary alignments are considered
  - Fixes breakend asymmetry in SRQ field
- Updated htsjdk/picard versions
  - BAM reading should now be faster when -Dsamjdk.use_async_io_read_samtools=true is passed to java
- Added code to recover from homology inconsistencies
- Added scripts/girdsssanitycheck.R script that sanity checks that the GRIDSS output VCF makes sense
1.2.4
- Including the assembly margin in the coverage annotator sliding window size calculation
- Fix for java.lang.IllegalArgumentException: Unable to rewind from position
- Now cleaning up intermediate files by default
  - Use java -Dgridss.keepTempFiles=true to keep intermediate files
- Increased default chunk size from 1Mb to 10Mb
  - On a human-sized genome, this reduces the number of intermediate files from ~3000 to a more manageable ~300
- No longer writing any processing tracking/visualisation files by default