Error when running gridss #82

Closed
berniechow opened this issue May 31, 2017 · 3 comments

Comments

@berniechow

I'm attempting to run gridss on a simulated bam file generated by varsim so that I can assess gridss precision/sensitivity. I'm using the example script gridss.sh (modified with my input bam and reference genome). The same bam file runs fine with other SV software. I am getting an error when gridss runs CallVariants.

The error is below:

...
INFO	2017-05-30 13:08:01	ComputeSamTags	Processed    17,000,000 records.  Elapsed time: 00:04:24s.  Time for last 1,000,000:   15s.  Last read position: chr1:112,530,476
INFO	2017-05-30 13:08:16	ComputeSamTags	Processed    18,000,000 records.  Elapsed time: 00:04:39s.  Time for last 1,000,000:   14s.  Last read position: chr6:70,652,849
INFO	2017-05-30 13:08:30	ComputeSamTags	Processed    19,000,000 records.  Elapsed time: 00:04:54s.  Time for last 1,000,000:   14s.  Last read position: chr7:129,931,169
INFO	2017-05-30 13:08:47	ComputeSamTags	Processed    20,000,000 records.  Elapsed time: 00:05:10s.  Time for last 1,000,000:   16s.  Last read position: chrY:11,981,575
INFO	2017-05-30 13:09:02	ComputeSamTags	Processed    21,000,000 records.  Elapsed time: 00:05:25s.  Time for last 1,000,000:   15s.  Last read position: chr8:674,139
INFO	2017-05-30 13:09:18	ComputeSamTags	Processed    22,000,000 records.  Elapsed time: 00:05:42s.  Time for last 1,000,000:   16s.  Last read position: chr9:71,564,565
INFO	2017-05-30 13:09:34	ComputeSamTags	Processed    23,000,000 records.  Elapsed time: 00:05:58s.  Time for last 1,000,000:   15s.  Last read position: chr8:143,391,946
[Tue May 30 13:09:36 PDT 2017] gridss.ComputeSamTags done. Elapsed time: 6.00 minutes.
Runtime.totalMemory()=9535750144
INFO	2017-05-30 13:09:36	SAMEvidenceSource	Identifying split reads for /gne/research/workspace/bchow/software/local_builds/varsim/out_hg38/varsim_sorted_hg38.bam
[Tue May 30 13:09:36 PDT 2017] gridss.SoftClipsToSplitReads INPUT=/gnet/is7/workspace/bchow/projects/sv_test/gridss/varsim/./varsim_sorted_hg38.bam.gridss.working/gridss.tmp.tagged.varsim_sorted_hg38.bam.sv.bam OUTPUT=/gnet/is7/workspace/bchow/projects/sv_test/gridss/varsim/./varsim_sorted_hg38.bam.gridss.working/gridss.tmp.splitreads.varsim_sorted_hg38.bam.sv.bam WORKER_THREADS=32 TMP_DIR=[/gnet/is7/workspace/bchow/projects/sv_test/gridss/varsim/.] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=1 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=true CREATE_MD5_FILE=false REFERENCE_SEQUENCE=/gne/research/workspace/bchow/genomes/GRCh38/GRCh38.fa GA4GH_CLIENT_SECRETS=client_secrets.json    MIN_CLIP_LENGTH=15 MIN_CLIP_QUAL=5.0 PROCESS_SECONDARY_ALIGNMENTS=false ALIGNER_COMMAND_LINE=[bwa, mem, -t, %3$d, %2$s, %1$s] IGNORE_DUPLICATES=true
[Tue May 30 13:09:36 PDT 2017] Executing as [email protected] on Linux 2.6.32-504.3.3.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_74-b02; Picard version: 1.4.0
INFO	2017-05-30 13:10:50	ExternalProcessFastqAligner	Invoking external aligner
INFO	2017-05-30 13:10:50	ExternalProcessFastqAligner	bwa mem -t 32 /gne/research/workspace/bchow/genomes/GRCh38/GRCh38.fa /gnet/is7/workspace/bchow/projects/sv_test/gridss/varsim/./varsim_sorted_hg38.bam.gridss.working/varsim_sorted_hg38.bam.realign.0.fq
[M::bwa_idx_load_from_disk] read 261 ALT contigs
[M::process] read 4021570 sequences (244527969 bp)...
[M::mem_process_seqs] Processed 4021570 reads in 1477.155 CPU sec, 49.987 real sec
[Tue May 30 13:12:17 PDT 2017] gridss.SoftClipsToSplitReads done. Elapsed time: 2.68 minutes.
Runtime.totalMemory()=9715056640
ERROR	2017-05-30 13:12:17	CallVariants	Fatal exception thrown by worker thread.
java.lang.IllegalArgumentException: Value (285) to large to be written as ubyte.
	at htsjdk.samtools.util.BinaryCodec.writeUByte(BinaryCodec.java:317)
	at htsjdk.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:125)
	at htsjdk.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:134)
	at htsjdk.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:187)
	at htsjdk.samtools.AsyncSAMFileWriter.synchronouslyWrite(AsyncSAMFileWriter.java:36)
	at htsjdk.samtools.AsyncSAMFileWriter.synchronouslyWrite(AsyncSAMFileWriter.java:16)
	at htsjdk.samtools.util.AbstractAsyncWriter$WriterRunnable.run(AbstractAsyncWriter.java:122)
	at java.lang.Thread.run(Thread.java:745)
@d-cameron
Member

d-cameron commented Jun 5, 2017

The BAM file format limits read names to at most 254 characters: the name length, including the terminating NUL, is stored in the lowest 8 bits of the bit-packed bin_mq_nl field. When identifying split reads from soft clipped alignments, GRIDSS appends a unique alignment identifier containing the segment index, reference contig name, read start position, orientation (+ or -), CIGAR, and offset of this alignment with respect to the read template. Appending this information to your read names appears to have pushed them past this limit.
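To make the arithmetic concrete, here is a minimal sketch of the length check, assuming the 285 in the exception is the read name length plus the NUL terminator (so a 284-character name). The `with_alignment_suffix` helper and its `#`-separated layout are hypothetical and only illustrate how appended fields grow the name; the identifier format GRIDSS actually writes may differ.

```python
MAX_UBYTE = 255  # largest value an unsigned 8-bit field can hold


def stored_name_length(read_name: str) -> int:
    """Value written to the 8-bit length field: name bytes plus the NUL terminator."""
    return len(read_name.encode("ascii")) + 1


def fits_in_bam(read_name: str) -> bool:
    """True if the name's stored length fits in an unsigned byte."""
    return stored_name_length(read_name) <= MAX_UBYTE


def with_alignment_suffix(read_name, segment, contig, start, strand, cigar, offset):
    """Hypothetical layout for illustration; not the real GRIDSS identifier format."""
    return f"{read_name}#{segment}#{contig}#{start}#{strand}#{cigar}#{offset}"


if __name__ == "__main__":
    original = "r" * 240  # made-up long simulated read name
    tagged = with_alignment_suffix(original, 0, "chr1", 112530476, "+", "60M41S", 0)
    print(fits_in_bam(original), fits_in_bam(tagged))
```

Running a check like this over the names in the input BAM, with some allowance for the appended fields, would show in advance which reads are at risk.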

A short-term fix is to generate your varsim reads with shorter names. I'll work on an update to GRIDSS that hashes this information so the read name length is bounded regardless of the input data.
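As a rough illustration of the hashing idea (not the scheme GRIDSS ends up using, which may well differ), the read name and alignment identifier can be replaced by a fixed-length digest. The `hashed_read_name` function, the choice of SHA-256, and the URL-safe base64 encoding are all assumptions made for this sketch.

```python
import base64
import hashlib


def hashed_read_name(read_name: str, alignment_id: str) -> str:
    """Return a 43-character name derived from the read name and alignment identifier.

    Illustrative only: hash function and encoding are assumptions, not GRIDSS's.
    """
    payload = f"{read_name}#{alignment_id}".encode("utf-8")
    digest = hashlib.sha256(payload).digest()  # always 32 bytes
    # URL-safe base64 uses only A-Z, a-z, 0-9, '-' and '_'; drop the '=' padding.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


if __name__ == "__main__":
    short = hashed_read_name("read1", "0#chr1#100#+#50M50S#0")
    long_ = hashed_read_name("r" * 300, "0#chr1#100#+#50M50S#0")
    print(len(short), len(long_))
```

The output length no longer depends on the input name or the CIGAR, so the BAM limit cannot be exceeded. The cost is that the name can no longer be read back to see which alignment it came from.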

d-cameron added a commit that referenced this issue Jun 6, 2017
…not exceed the 254 character BAM limit

Hashing of split read realignment fastq read names is always done
Hashing of evidenceIDs is controlled by the hashEvidenceID configuration option. Default of true reduces file sizes (especially important for long reads as the unhashed value includes the full CIGAR), at the cost of losing human readability of which reads contributed to which assembly.
@d-cameron
Member

https://github.com/PapenfussLab/gridss/releases/tag/v1.4.1 should have your issue fixed.

@berniechow
Author

Thank you for fixing! I just successfully re-ran gridss.
