Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds arg to add site level filters to genotype level in ReblockGvcfs #8484

Merged
merged 1 commit into from
Aug 25, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -115,6 +115,7 @@ public final class ReblockGVCF extends MultiVariantWalker {
public static final String ALLOW_MISSING_LONG_NAME = "allow-missing-hom-ref-data";
public static final String KEEP_SITE_FILTERS_LONG_NAME = "keep-site-filters";
public static final String KEEP_SITE_FILTERS_SHORT_NAME = "keep-filters";
public static final String ADD_FILTERS_TO_GENOTYPE = "add-site-filters-to-genotype";

@Argument(fullName = StandardArgumentDefinitions.OUTPUT_LONG_NAME, shortName = StandardArgumentDefinitions.OUTPUT_SHORT_NAME,
doc="File to which variants should be written")
Expand Down Expand Up @@ -175,6 +176,10 @@ public final class ReblockGVCF extends MultiVariantWalker {
@Argument(fullName=KEEP_SITE_FILTERS_LONG_NAME, shortName = KEEP_SITE_FILTERS_SHORT_NAME, doc="Keep site level filters for variants (not ref blocks).")
private boolean keepFilters = false;

@Advanced
@Argument(fullName= ADD_FILTERS_TO_GENOTYPE, doc="Add site level filters to genotype level. Site level filters removed by default, if they should be kept, use --" + KEEP_SITE_FILTERS_LONG_NAME)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we already have a test for KEEP_SITE_FILTERS alone?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, KEEP_SITE_FILTERS was put in earlier and has a test.

private boolean addFiltersToFormatField = false;

//TODO: this will be an argument when posteriors handling is fully implemented in AlleleSubsettingUtils
protected String posteriorsKey = null;

Expand Down Expand Up @@ -273,6 +278,10 @@ public void onTraversalStart() {
}
}

if (addFiltersToFormatField) {
headerLines.add(VCFStandardHeaderLines.getFormatLine(VCFConstants.GENOTYPE_FILTER_KEY));
}

//Allele length and Genotype length annotations need to be subset or removed if alleles are dropped so we need to parse the header for annotation count types
for(final VCFFormatHeaderLine formatHeaderLine : inputHeader.getFormatHeaderLines()) {
if(formatHeaderLine.getCountType().equals(VCFHeaderLineCount.A) ||
Expand Down Expand Up @@ -677,8 +686,13 @@ VariantContext cleanUpHighQualityVariant(final VariantContext variant) {
final Genotype updatedAllelesGenotype = updatedAllelesVC.getGenotype(0);

//remove any AD reads for the non-ref
final List<Genotype> genotypesArray = removeNonRefADs(updatedAllelesGenotype, updatedAllelesVC.getAlleleIndex(Allele.NON_REF_ALLELE));
builder.genotypes(genotypesArray);
final Genotype g = removeNonRefADs(updatedAllelesGenotype, updatedAllelesVC.getAlleleIndex(Allele.NON_REF_ALLELE));
if (addFiltersToFormatField) {
final List<String> filters = updatedAllelesVC.getFilters().stream().toList();
builder.genotypes(new GenotypeBuilder(g).filters(filters).make());
} else {
builder.genotypes(g);
}

composeUpdatedAnnotations(attrMap, doQualApprox, posteriorsKey, variant, annotationEngine, relevantIndices, updatedAllelesVC, annotationsToKeep);

Expand Down Expand Up @@ -882,9 +896,9 @@ private static void addQualAnnotations(final Map<String, Object> destination, fi
* Any AD counts for the non-ref allele get propagated to every new allele when GVCFs are merged, so zero them out
* @param g a genotype that may or may not contain AD
* @param nonRefInd allele index of the non-ref, -1 if missing
* @return an unmodifiable Genotype array that can be used by a GenotypeBuilder
* @return a Genotype with the non-ref AD zeroed out and new DP, or the original genotype if it doesn't have AD
*/
private List<Genotype> removeNonRefADs(final Genotype g, final int nonRefInd) {
private Genotype removeNonRefADs(final Genotype g, final int nonRefInd) {
if (g.hasAD() && nonRefInd != -1) {
final int[] ad = g.getAD();
if (ad.length >= nonRefInd && ad[nonRefInd] > 0) { //only initialize a builder if we have to
Expand All @@ -897,12 +911,12 @@ private List<Genotype> removeNonRefADs(final Genotype g, final int nonRefInd) {
} else {
gb.DP((int) MathUtils.sum(ad));
}
return Collections.singletonList(gb.make());
return gb.make();
} else {
return Collections.singletonList(g);
return g;
}
} else {
return Collections.singletonList(g);
return g;
}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -578,6 +578,32 @@ public void testFilters() throws IOException {
Assert.assertEquals(filteredRefBlockVC.getGenotype(0).getDP(), 12); // Ref block is combination of filtered variant with depth 22 and filtered ref block with depth 1
}

@Test
public void testMovingFilters() throws IOException {
final File input = getTestFile("dragen.g.vcf");
final File output = createTempFile("reblockedgvcf", ".vcf");

final ArgumentsBuilder args = new ArgumentsBuilder();
args.addReference(new File(hg38Reference))
.add("V", input)
.add("L", "chr1")
.add(ReblockGVCF.ADD_FILTERS_TO_GENOTYPE, true)
.addOutput(output);
runCommandLine(args);

final VariantContext filteredVC = VariantContextTestUtils.readEntireVCFIntoMemory(output.getAbsolutePath()).getRight().get(3); // last site in the file
Assert.assertFalse(filteredVC.isFiltered());
Assert.assertTrue(filteredVC.getGenotype(0).isFiltered());

final VariantContext unfilteredVC = VariantContextTestUtils.readEntireVCFIntoMemory(output.getAbsolutePath()).getRight().get(1);
Assert.assertFalse(unfilteredVC.isFiltered());
Assert.assertFalse(unfilteredVC.getGenotype(0).isFiltered());

final VariantContext filteredRefBlockVC = VariantContextTestUtils.readEntireVCFIntoMemory(output.getAbsolutePath()).getRight().get(0);
Assert.assertFalse(filteredRefBlockVC.isFiltered()); // Ref block is unfiltered even though the input RefBlock and low qual variant were both filtered
Assert.assertFalse(filteredRefBlockVC.getGenotype(0).isFiltered()); // Ref block genotype is also unfiltered
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this what you want?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's a good question. I had assumed that it didn't make sense to have a ref block with a genotype level filter. The low quality variant here was dropped (incorporated into the Ref Block), and I was thinking that the GQ was enough information here that it would only be more confusing if we included the filter in the ref block.

I suppose the outcome of this is that if we kept the filter status in the ref block it would make it into the final VCF at any sites that overlap the entire ref block. So if one variant had a dragen hard filter applied, but then was incorporated into a RefBlock, then all of the 0/0 genotypes would be filtered across the entire ref block, not just the original filtered site. I think this would be more confusing than it's worth, but happy to hear if you think we want to propagate that filter through.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To follow up here: we spoke offline and the conclusion was that GQ0 basically acts as a filter for ref blocks so it's ok to not propagate the filter from the low quality variants.

}

@Test
public void testRemovingFormatAnnotations() {
final File input = getTestFile("dragen.g.vcf");
Expand Down