Somatic variation fails because of inability to find GATK #47
Comments
Genome/Sys.pm tries to find GATK version 2.4 using the environment variable $ENV{GENOME_JAR_PATH}.
This variable is set to /usr/share/java both on the standalone install and within TGI. Inside TGI, /usr/share/java contains the jar file GenomeAnalysisTK-2.4.jar and a symlink GenomeAnalysisTK.jar -> GenomeAnalysisTK-2.4.jar. It looks like this directory has to be replicated on the standalone install. Is it just a matter of pointing the environment variable to a different folder where these jars already exist, or does the whole directory need to be replicated? Quite a few things that are present in this directory on the TGI end seem to be missing on the standalone install (picardtools, weka, etc.). |
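For readers following along, here is a minimal Python sketch of the kind of lookup described above. It is not the Genome::Sys implementation (which is Perl); the function name `find_gatk_jar` is hypothetical, and the `GenomeAnalysisTK-<version>.jar` naming is assumed only from the file name mentioned in this comment.

```python
import os
from pathlib import Path


def find_gatk_jar(version, env=None):
    """Locate GenomeAnalysisTK-<version>.jar under GENOME_JAR_PATH.

    Hypothetical helper: the naming pattern is an assumption based on
    the jar named in this thread, not on the Genome::Sys source.
    """
    env = os.environ if env is None else env
    jar_dir = env.get("GENOME_JAR_PATH")
    if not jar_dir:
        raise RuntimeError("GENOME_JAR_PATH is not set")

    candidate = Path(jar_dir) / f"GenomeAnalysisTK-{version}.jar"
    if not candidate.is_file():
        # Fail with the full path so a missing jar is easy to diagnose.
        raise FileNotFoundError(f"GATK {version} jar not found at {candidate}")
    return candidate
```

With a lookup like this, pointing GENOME_JAR_PATH at any folder that holds the versioned jar would be enough; the rest of the directory would not need to be replicated for GATK itself.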
Note from Scott on this issue: Most of those envs that are set to a global network path at TGI are set to something inside the sw directory for the standalone box. There should be some tgz with some Java stuff next to the apps*.tgz. The java tgz may not have the latest stuff. If so, a new java tgz needs to be made with the added stuff and a different date, and the makefile should be updated to download it too. Some of the java stuff has been packaged as debs. I think Allison did this for gatk. If so, a fresh genome-snapshot-deps package should solve the problem. The best people to talk with about the state of that are Matt Callaway and Nathan Nutter. Matt was trying to take it past the state in which I left it. There were directories for Ubuntu Lucid and Precise, and the precise directory should have equivalents of the same packages. Where this was not possible, the files listing those deps are broken out into a *.missing file. |
Currently the only things in 'java-2013-08-27.tgz' are: rdp-classifier*, samtools*, VarScan*, and weka.jar. As far as I can tell, the only globally defined environment variable related to Java is GENOME_JAR_PATH. It appears to be specified in: This does seem to work in the standalone GMS: /usr/share/java Even if we add GATK to the Java archive, it is not obvious to me from the makefile how the contents are meant to be found. If the system looks for them in '/usr/share/java', I see no active attempt to place them there. Perhaps that is only where properly packaged Java stuff goes? |
To see what is currently in genome-snapshot-deps within TGI, go to the top level of a 'genome' checkout and run: % cd /gscuser/mgriffit/git/genome/ All I see is this: Debian packages get into the standalone GMS something like this:
|
It looks like 'libgatk-protected-java' is marked as 'missing' in the precise (Ubuntu 12.04) version of genome-snapshot-deps but as 'depends' in the lucid (Ubuntu 10.04) version. One thing we could try is to install it directly in the GMS. If that works, we could take it out of the missing list for precise, put it in the regular list, and rebuild the apt repo for the standalone box. Another option is to use one of the 8 versions of GATK that are currently in the 'apps' repo from /gsc/pkg/bio in the TGI. These seem to be very old versions of GATK, though, and the one expected in the test analysis is GATK version 2.4. It seems that Genome::Sys expects this tool to be installed as a package in a standard way so that the version can be resolved as well. |
One thing about that package is that it should not actually be distributed outside of TGI. It contains a patch that disables the "phone home" behavior. Even though the modification is in the "public" source tree, the package includes code from the "protected" source tree, which is under a license that does not allow redistribution. Our GATK wrappers depend on the phone home behavior being disabled because they always insert the "-et NO_ET" argument into the command line. If you try this against a jar file without our patch it will give an error because they require an additional argument with a key provided by the Broad to allow the phone home override. Our patch skips this check. |
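To make the key requirement concrete, here is a rough Python sketch of how such a command line could be assembled. It is not the actual wrapper code; `gatk_command` is a hypothetical name, and `-K` as the name of the key-file argument is my understanding of the GATK 2.x options and should be checked against the documentation for the version in use.

```python
def gatk_command(jar, analysis_type, extra_args=(), key_file=None):
    """Build a GATK 2.x argument list with phone-home disabled.

    Hypothetical helper. "-et NO_ET" is the flag the wrappers insert
    (per this thread); "-K" for the Broad-issued key is an assumption.
    """
    cmd = ["java", "-jar", str(jar), "-T", analysis_type, "-et", "NO_ET"]
    if key_file is not None:
        # An unpatched jar is expected to reject NO_ET without a key.
        cmd += ["-K", str(key_file)]
    cmd += list(extra_args)
    return cmd
```

With the patched jar, calling this without `key_file` corresponds to what the wrappers do today. A user with their own jar and a key from the Broad would need the wrappers to pass the key through as well.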
Since we cannot redistribute GATK, we will need to set up the standalone GMS in a way that allows the user to manually install GATK after obtaining the appropriate permissions from the Broad.
|
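One possible shape for that manual install step, sketched in Python. This only mirrors the TGI layout described earlier in the thread (a versioned jar plus a GenomeAnalysisTK.jar symlink); `install_gatk_jar` is a hypothetical helper, and nothing like it exists in the GMS makefile as far as this thread shows.

```python
import shutil
from pathlib import Path


def install_gatk_jar(source_jar, jar_dir, version):
    """Copy a user-supplied GATK jar into jar_dir and link it.

    Hypothetical helper. Creates GenomeAnalysisTK-<version>.jar and a
    relative symlink GenomeAnalysisTK.jar pointing at it.
    """
    jar_dir = Path(jar_dir)
    jar_dir.mkdir(parents=True, exist_ok=True)

    versioned = jar_dir / f"GenomeAnalysisTK-{version}.jar"
    shutil.copy2(source_jar, versioned)

    link = jar_dir / "GenomeAnalysisTK.jar"
    if link.is_symlink() or link.exists():
        link.unlink()  # replace any previous link or file
    link.symlink_to(versioned.name)
    return versioned
```

Here `jar_dir` would be whatever GENOME_JAR_PATH points to on the standalone box. Writing to /usr/share/java would need root, so a directory under the sw tree may be the easier target.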
This is how you get earlier versions of the GATK, "Get the package for the version you want from this page: https://github.com/broadgsa/gatk-protected/tags From your terminal/console, navigate to the directory containing the source code. There, you run the command:
This will do everything for you. The compiled binary will be in the newly-created dist directory." |
I'm seeing this same issue on a somatic-variation build. Looks like the gms-pub specific change was reverted in the recent merge/refactor in master. Compare genome/genome@1b8a40c and Figure out if master can be modified to use consistent paths. |
Not sure where to bring this up on master; Issues seem to be disabled for that repo. cc'ing @nnutter for ideas. |
This was added to the SGMS branch here, genome/genome@b62e2b0 |
On closer look, master has an improved way of looking for JAR paths. We are still using the old method, which we think is OK for now. |
The first test of somatic-variation hit an error after about 1 min. The top of the error log looks like this:
We need to determine how software versions are found in the part of DV2...