OOM for 1.5 Terabytes? #438
The log indicates you're using Homo_sapiens.GRCh37.dna.toplevel.fa but then
you also have contigs such as HG1462_PATCH in the reference. How big is
your reference file?
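One way to answer that question is to inspect the FASTA directly. This is a minimal sketch, assuming a plain-text FASTA file; the `ref_summary` helper name is hypothetical and not part of GRIDSS.

```shell
# Hypothetical helper (not part of GRIDSS): summarise a FASTA reference.
# Prints the file size in bytes and the number of contigs (header lines).
ref_summary() {
    ref="$1"
    # wc -c reports the size in bytes; tr strips padding some wc versions add
    size_bytes=$(wc -c < "$ref" | tr -d ' ')
    # every contig starts with a '>' header line
    contigs=$(grep -c '^>' "$ref")
    echo "size_bytes=$size_bytes contigs=$contigs"
}
```

For example, `ref_summary Homo_sapiens.GRCh37.dna.toplevel.fa` would report how large the toplevel file is and how many patch and alternate contigs it carries.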
Runtime.totalMemory()=3621257216
The step in question (PrepareReference) uses a fixed 4 GB heap size. If you
have a large reference, you'll have to edit the gridss.sh script to
increase the max memory -Xmx parameter on that step (and probably the other
fixed-memory steps as well). It doesn't use the jvmheap parameter since it
needs off-heap memory for bwa indexing.
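As a quick sanity check, the `Runtime.totalMemory()` value in the log is consistent with a heap capped at around 4 GB rather than the 2000g passed via `--jvmheap`. The sketch below converts that byte count to MiB and lists the lines of a script that carry an `-Xmx` setting. The helper names are hypothetical, and the assumption that the fixed heap appears as a literal `-Xmx` argument in gridss.sh should be checked against your copy of the script.

```shell
# Convert a byte count (e.g. the Runtime.totalMemory() value) to whole MiB.
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# List every line in a script that sets a JVM max heap, with line numbers.
# Assumption: the fixed heap is written as a literal -Xmx<size> argument.
find_xmx() {
    grep -n -e '-Xmx' "$1"
}

bytes_to_mib 3621257216   # prints 3453
```

Running `find_xmx gridss.sh` would then show which steps have a hard-coded heap that may need raising.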
Also, your assembly parameter needs to be a file, not a directory (e.g.
--assembly ./assembly.bam)
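A small pre-flight check can catch this before the job is queued. This is only a sketch: `check_assembly_path` is a hypothetical helper, not something gridss.sh provides.

```shell
# Hypothetical pre-flight check (not part of GRIDSS): reject an --assembly
# value that points at a directory instead of a BAM file path.
check_assembly_path() {
    if [ -d "$1" ]; then
        echo "error: --assembly must be a file, got directory: $1" >&2
        return 1
    fi
    return 0
}

check_assembly_path ./assembly.bam && echo "ok: ./assembly.bam"
```

In the submission script quoted below, this corresponds to replacing `--assembly ./` with `--assembly ./assembly.bam`.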
On Mon, 11 Jan 2021, 08:12 Matthew J. Oldach wrote:
Trying to run GRIDSS for the first time and getting *OOM* errors no
matter how much RAM I throw at it?
#!/bin/bash
#SBATCH --job-name=gridss_bigmem # Job name
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH ***@***.*** # Where to send mail
#SBATCH --ntasks=1 #Run on a single CPU
#SBATCH --cpus-per-task=8 # How many cores?
#SBATCH --mem-per-cpu=250G
#SBATCH --partition=bigmem
#SBATCH --output=nap_gridss_test_bigmem_%j.log # Standard output and error log
#SBATCH --error=nap_gridss_test_bigmem_%j.err # Error log
#SBATCH --time=12:00:00
pwd; hostname; date
/project/M-mtgraovac182840/tools/gridss.sh \
--reference /project/M-mtgraovac182840/indexes/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa \
--output gridss.vcf.gz \
--assembly ./ \
--threads 1 \
--jar /project/M-mtgraovac182840/tools/gridss-2.10.2-gridss-jar-with-dependencies.jar \
--workingdir ./ \
--jvmheap 2000g \
--steps All \
--maxcoverage 50000 \
--labels proband \
proband_trim_bwaMEM_sort_dedupped.bam
date
Sun Jan 10 12:08:16 MST 2021: Full log file is: ./gridss.full.20210110_120816.sm1.286547.log
Sun Jan 10 12:08:16 MST 2021: Found /usr/bin/time
Sun Jan 10 12:08:16 MST 2021: Using GRIDSS jar /project/M-mtgraovac182840/tools/gridss-2.10.2-gridss-jar-with-dependencies.jar
Sun Jan 10 12:08:16 MST 2021: Using reference genome "/project/M-mtgraovac182840/indexes/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa"
Sun Jan 10 12:08:16 MST 2021: Using assembly bam ./
Sun Jan 10 12:08:16 MST 2021: Using output VCF gridss.vcf.gz
Sun Jan 10 12:08:16 MST 2021: Using 1 worker threads.
Sun Jan 10 12:08:16 MST 2021: Using no blacklist bed. The encode DAC blacklist is recommended for hg19.
Sun Jan 10 12:08:16 MST 2021: Using JVM maximum heap size of 2000g for assembly and variant calling.
Sun Jan 10 12:08:16 MST 2021: Using input file proband_trim_bwaMEM_sort_dedupped.bam
Sun Jan 10 12:08:16 MST 2021: label is proband
Sun Jan 10 12:08:16 MST 2021: Found /project/M-mtgraovac182840/tools/R-4.0.3/bin/Rscript
Sun Jan 10 12:08:16 MST 2021: Found /project/M-mtgraovac182840/tools/samtools-1.3.1/bin/samtools
Sun Jan 10 12:08:16 MST 2021: Found /usr/bin/java
Sun Jan 10 12:08:16 MST 2021: Found /project/M-mtgraovac182840/tools/bwa
Sun Jan 10 12:08:16 MST 2021: samtools version: 1.3.1+htslib-1.3.1
Sun Jan 10 12:08:16 MST 2021: R version: R scripting front-end version 4.0.3 (2020-10-10)
Sun Jan 10 12:08:16 MST 2021: bwa Version: 0.7.15-r1140
Sun Jan 10 12:08:16 MST 2021: time version: time (GNU Time) UNKNOWN
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by David Keppel, David MacKenzie, and Assaf Gordon.
Sun Jan 10 12:08:16 MST 2021: bash version: GNU bash, version 4.4.19(1)-release (x86_64-redhat-linux-gnu)
Sun Jan 10 12:08:16 MST 2021: java version: openjdk version "1.8.0_272" OpenJDK Runtime Environment (build 1.8.0_272-b10) OpenJDK 64-Bit Server VM (build 25.272-b10, mixed mode)
Sun Jan 10 12:08:18 MST 2021: Max file handles: 131072
Sun Jan 10 12:08:18 MST 2021: Running GRIDSS steps: setupreference, preprocess, assemble, call,
Sun Jan 10 12:08:18 MST 2021: Running PrepareReference (once-off setup for reference genome)
INFO 2021-01-10 12:08:18 Defaults Found file for property samjdk.reference_fasta: /project/M-mtgraovac182840/indexes/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa
INFO 2021-01-10 12:08:18 PrepareReference
********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
********** PrepareReference -REFERENCE_SEQUENCE /project/M-mtgraovac182840/indexes/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa
**********
12:08:18.987 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/project/M-mtgraovac182840/tools/gridss-2.10.2-gridss-jar-with-dependencies.jar!/com/intel/gkl/native/libgkl_compression.so
[Sun Jan 10 12:08:19 MST 2021] PrepareReference REFERENCE_SEQUENCE=/project/M-mtgraovac182840/indexes/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa CREATE_SEQUENCE_DICTIONARY=true CREATE_GRIDSS_REFERENCE_CACHE=true CREATE_BWA_INDEX_IMAGE=true VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Sun Jan 10 12:08:19 MST 2021] Executing as ***@***.*** on Linux 4.18.0-193.28.1.el8_2.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_272-b10; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.10.2-gridss
INFO 2021-01-10 12:08:19 PrepareReference Sequence dictionary found.
INFO 2021-01-10 12:08:19 PrepareReference Creating GRIDSS reference cache file /project/M-mtgraovac182840/indexes/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gridsscache
INFO 2021-01-10 12:08:19 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1287_PATCH
INFO 2021-01-10 12:08:22 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1459_PATCH
INFO 2021-01-10 12:08:24 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HSCHR6_MHC_SSTO
INFO 2021-01-10 12:08:25 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HSCHR6_MHC_MCF
INFO 2021-01-10 12:08:28 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HSCHR6_MHC_COX
INFO 2021-01-10 12:08:30 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HSCHR6_MHC_MANN
INFO 2021-01-10 12:08:32 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HSCHR6_MHC_APD
INFO 2021-01-10 12:08:34 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HSCHR6_MHC_QBL
INFO 2021-01-10 12:08:36 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HSCHR6_MHC_DBB
INFO 2021-01-10 12:08:38 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1433_PATCH
INFO 2021-01-10 12:08:40 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1257_PATCH
INFO 2021-01-10 12:08:47 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HSCHR17_1
INFO 2021-01-10 12:08:48 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1292_PATCH
INFO 2021-01-10 12:08:51 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HSCHR5_1_CTG1
INFO 2021-01-10 12:08:53 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1592_PATCH
INFO 2021-01-10 12:08:55 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1425_PATCH
INFO 2021-01-10 12:08:56 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1462_PATCH
INFO 2021-01-10 12:08:58 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG7_PATCH
INFO 2021-01-10 12:09:01 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1458_PATCH
INFO 2021-01-10 12:09:03 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HSCHR19LRC_LRC_J_CTG1
INFO 2021-01-10 12:09:03 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HSCHR19LRC_LRC_S_CTG1
INFO 2021-01-10 12:09:04 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HSCHR19LRC_LRC_I_CTG1
INFO 2021-01-10 12:09:05 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1079_PATCH
INFO 2021-01-10 12:09:05 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1443_HG1444_PATCH
INFO 2021-01-10 12:09:07 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG979_PATCH
INFO 2021-01-10 12:09:08 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HSCHR19LRC_LRC_T_CTG1
INFO 2021-01-10 12:09:09 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HSCHR19LRC_COX1_CTG1
INFO 2021-01-10 12:09:10 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HSCHR19LRC_PGF1_CTG1
INFO 2021-01-10 12:09:11 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1453_PATCH
INFO 2021-01-10 12:09:13 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1423_PATCH
INFO 2021-01-10 12:09:15 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1437_PATCH
INFO 2021-01-10 12:09:16 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG865_PATCH
INFO 2021-01-10 12:09:18 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1434_PATCH
INFO 2021-01-10 12:09:19 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1438_PATCH
INFO 2021-01-10 12:09:21 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HSCHR19LRC_PGF2_CTG1
INFO 2021-01-10 12:09:22 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1293_PATCH
INFO 2021-01-10 12:09:25 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1426_PATCH
INFO 2021-01-10 12:09:27 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HSCHR19LRC_COX2_CTG1
INFO 2021-01-10 12:09:28 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1308_PATCH
INFO 2021-01-10 12:09:29 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1463_PATCH
INFO 2021-01-10 12:09:31 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG987_PATCH
INFO 2021-01-10 12:09:32 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG19_PATCH
INFO 2021-01-10 12:09:33 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG953_PATCH
INFO 2021-01-10 12:09:36 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HSCHR4_1
INFO 2021-01-10 12:09:38 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG730_PATCH
INFO 2021-01-10 12:09:39 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1350_HG959_PATCH
INFO 2021-01-10 12:09:40 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG306_PATCH
INFO 2021-01-10 12:09:41 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1082_HG167_PATCH
INFO 2021-01-10 12:09:43 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG75_PATCH
INFO 2021-01-10 12:09:44 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1424_PATCH
INFO 2021-01-10 12:09:46 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG174_HG254_PATCH
INFO 2021-01-10 12:09:49 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG357_PATCH
INFO 2021-01-10 12:09:51 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG29_PATCH
INFO 2021-01-10 12:09:53 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG256_PATCH
INFO 2021-01-10 12:09:54 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG183_PATCH
INFO 2021-01-10 12:09:55 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1146_PATCH
INFO 2021-01-10 12:09:56 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1441_PATCH
INFO 2021-01-10 12:09:58 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG871_PATCH
INFO 2021-01-10 12:10:00 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG104_HG975_PATCH
INFO 2021-01-10 12:10:02 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG1442_PATCH
INFO 2021-01-10 12:10:04 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG185_PATCH
INFO 2021-01-10 12:10:05 TwoBitBufferedReferenceSequenceFile Caching reference genome contig HG122_PATCH
[Sun Jan 10 12:10:05 MST 2021] gridss.PrepareReference done. Elapsed time: 1.78 minutes.
Runtime.totalMemory()=3621257216
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at htsjdk.samtools.reference.AbstractIndexedFastaSequenceFile.getSubsequenceAt(AbstractIndexedFastaSequenceFile.java:184)
at htsjdk.samtools.reference.IndexedFastaSequenceFile.getSubsequenceAt(IndexedFastaSequenceFile.java:49)
at htsjdk.samtools.reference.AbstractIndexedFastaSequenceFile.getSequence(AbstractIndexedFastaSequenceFile.java:162)
at htsjdk.samtools.reference.IndexedFastaSequenceFile.getSequence(IndexedFastaSequenceFile.java:49)
at au.edu.wehi.idsv.picard.TwoBitBufferedReferenceSequenceFile.cacheLoad(TwoBitBufferedReferenceSequenceFile.java:172)
at au.edu.wehi.idsv.picard.TwoBitBufferedReferenceSequenceFile.lambda$save$1(TwoBitBufferedReferenceSequenceFile.java:73)
at au.edu.wehi.idsv.picard.TwoBitBufferedReferenceSequenceFile$$Lambda$36/1896294051.accept(Unknown Source)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at au.edu.wehi.idsv.picard.TwoBitBufferedReferenceSequenceFile.save(TwoBitBufferedReferenceSequenceFile.java:73)
at gridss.PrepareReference.doWork(PrepareReference.java:48)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at gridss.PrepareReference.main(PrepareReference.java:89)
Would an additional option to override the fixed heap size commands be useful?
Hi @d-cameron, sorry for the slow response. The reference genome (Ensembl's TopLevel)
Thanks!
I was able to run successfully by making hard changes to the …