-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added --sites-only argument to GATK engine #4764
Conversation
@ldgauthier This is what you requested correct? |
@jamesemery Those of us that work with thousands of samples thank you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review complete, back to @jamesemery. Looks good, minor comments only.
@@ -113,6 +113,10 @@ | |||
optional = true) | |||
public boolean disableBamIndexCaching = false; | |||
|
|||
@Argument(fullName = StandardArgumentDefinitions.SITES_ONLY_LONG_NAME, | |||
doc = "If true, don't emit GT fields when writing vcf file output.", optional = true) | |||
public boolean sitesOnlyMode = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name of field should make it clear that this applies to VCFs specifically -- eg., outputSitesOnlyVCFs
instead of sitesOnlyMode
@@ -728,6 +732,10 @@ public VariantContextWriter createVCFWriter(final File outFile) { | |||
} | |||
} | |||
|
|||
if (sitesOnlyMode) { | |||
options.add(Options.DO_NOT_WRITE_GENOTYPES); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check to see whether there are any other GATKTools
that bypass GATKTool.createVCFWriter()
when writing a VCF and call GATKVariantContextUtils.createVCFWriter()
directly instead? (besides the ones you dealt with below)
Don't worry about non-GATKTools.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have already checked. It is bypassed in a number of tests, the HaplotypeCaller, and CreateSomaticPanelOfNormals
which doesn't extend GATKTool
so I elected not to include it here.
Utils.nonNull(outputVCF); | ||
Utils.nonNull(readsDictionary); | ||
|
||
List<Options> options = new ArrayList<>(2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final
@@ -83,6 +92,25 @@ public void testFeaturesAsIntervalsUnrecognizedFormatFile() throws IOException { | |||
testSpec.executeTest("testFeaturesAsIntervals", this); | |||
} | |||
|
|||
@Test | |||
public void testSitesOnlyMode() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this to a new GATKToolIntegrationTest
class instead? I regret my original suggestion of FeatureSupportIntegrationTest
File out = createTempFile("GTStrippedOutput", "vcf"); | ||
String[] args = new String[] { | ||
"-V", FEATURE_INTEGRATION_TEST_DIRECTORY + "vcf_with_genotypes.vcf", | ||
"--" + StandardArgumentDefinitions.SITES_ONLY_LONG_NAME, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you confirm that this test fails if you remove the --sites-only
arg temporarily?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just checked, it fails without --sites-only
Assert.assertFalse(v.hasGenotypes()); | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also add a similar test to HaplotypeCallerIntegrationTest
to prove that it honors --sites-only
, since that is a special case?
Codecov Report
@@ Coverage Diff @@
## master #4764 +/- ##
============================================
+ Coverage 80.07% 80.08% +0.01%
- Complexity 17420 17571 +151
============================================
Files 1080 1081 +1
Lines 63131 63815 +684
Branches 10200 10400 +200
============================================
+ Hits 50549 51103 +554
- Misses 8590 8702 +112
- Partials 3992 4010 +18
|
@droazen Responded to your comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second-pass review complete, back to @jamesemery
@@ -0,0 +1,37 @@ | |||
package org.broadinstitute.hellbender; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class should live in the same package as GATKTool
(org.broadinstitute.hellbender.engine
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -322,17 +322,20 @@ private void initializeActiveRegionEvaluationGenotyperEngine() { | |||
* @return a VCF or GVCF writer as appropriate, ready to use | |||
*/ | |||
public VariantContextWriter makeVCFWriter( final String outputVCF, final SAMSequenceDictionary readsDictionary, | |||
final boolean createOutputVariantIndex, final boolean createOutputVariantMD5 ) { | |||
final boolean createOutputVariantIndex, final boolean sitesOnlyMode, | |||
final boolean createOutputVariantMD5 ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be slightly more logical if the ordering of the boolean args were: createOutputVariantIndex, createOutputVariantMD5, sitesOnlyMode
instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
"--" + StandardArgumentDefinitions.SITES_ONLY_LONG_NAME, | ||
"-O", | ||
out.getAbsolutePath()}; | ||
new Main().instanceMain(makeCommandLineArgs(Arrays.asList(args), SelectVariants.class.getSimpleName())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a new overload of runCommandLine()
to CommandLineProgramTest
that takes 2 arguments: a List of args, and an explicit tool name, and call that new overload here? Should also add it to the CommandLineProgramTester
interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
import org.broadinstitute.hellbender.CommandLineProgramTest; | ||
import org.broadinstitute.hellbender.Main; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove stray imports from this class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doen
@@ -71,6 +73,30 @@ public void testVCFModeIsConsistentWithPastResults(final String inputFileName, f | |||
IntegrationTestSpec.assertEqualTextFiles(output, expected); | |||
} | |||
|
|||
@Test | |||
public void testSitesOnlyMode() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this test after the standard testVCFMode*
and testGVCFMode*
tests, rather than sticking it in the middle of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also call Utils.resetRandomGenerator()
at the beginning of the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
final String[] args = { | ||
"-I", NA12878_20_21_WGS_bam, | ||
"-R", b37_reference_20_21, | ||
"-L", "20:10000000-10100000", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like an unnecessarily large interval for this test. How about 10k bases instead of 100k?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -37,6 +37,7 @@ private StandardArgumentDefinitions(){} | |||
public static final String ANNOTATIONS_TO_EXCLUDE_LONG_NAME = "annotations-to-exclude"; | |||
public static final String SAMPLE_NAME_LONG_NAME = "sample-name"; | |||
public static final String PEDIGREE_FILE_LONG_NAME = "pedigree"; | |||
public static final String SITES_ONLY_LONG_NAME = "sites-only"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rename to the more explicit sites-only-vcf-output
?
@droazen Responded to comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Will merge once tests pass
Fixes #4763