-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bg pileup detection #7432
Bg pileup detection #7432
Conversation
@jamesemery as promised here is the draft PR of the pileup code that works for M2 and HC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have given this code a first pass. Overall it looks good but there are some thing that need to be cleaned up/leftover comments/little things to deal with. I think my biggest comments are that there are a few methods that really need some unit tests about their behavior and I would be happy to talk over how to go about writing those. Given that there is a lot of interest in this code i think it will end up paying off investing the time early to write tests for some of the tricky methods you added (namely the filtering of the discovered haplotypes and some of the code you are using to add haplotypes).
I also think there should be more parameters exposed for this code in case other projects in the future want to try custom configurations of the pileup detection that might be more appropriate for their use case.
src/main/java/META-INF/MANIFEST.MF
Outdated
@@ -0,0 +1,3 @@ | |||
Manifest-Version: 1.0 | |||
Main-Class: org.broadinstitute.hellbender.Main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like an accidental addition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will remove
// @Test | ||
// public void testAddGivenAlleles() { | ||
// final int assemblyRegionStart = 1; | ||
// final int maxMnpDistance = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reenable these tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -114,6 +123,13 @@ public AssemblyRegion(final SimpleInterval activeSpan, final int padding, final | |||
this(activeSpan, true, padding, header); | |||
} | |||
|
|||
public List<AlignmentData> getAlignmentData() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment these
addToReadCollection(read, reads); | ||
} | ||
|
||
private void addToReadCollection(final GATKRead read, final List<GATKRead> collection) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"addToReadCollection" -> "addToReadCollectionAndValidate"
@@ -113,12 +117,21 @@ public AssemblyRegion next() { | |||
return toReturn; | |||
} | |||
|
|||
// private void addAlignmentData(AlignmentData alignmentData){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commented code
} | ||
|
||
// now look for indels | ||
if (element.isBeforeInsertion()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be an argument in the read threading assembler to enable/disable the indel checks at this stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamesemery unclear what you mean here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some other projects that are interested in this code. They are not necessarily interested in the pileup indel calling code right now (since its complicated and they might have their own versions of this code) Ideally there should be a toggle that disables all of the indel pileup code. (or that disables putting them into the haplotypes).
// TODO: AH & BG add an option to deal with multiple alt alleles | ||
Optional<Map.Entry<Byte, Integer>> maxAlt = altCounts.entrySet().stream().max(Comparator.comparingInt(Map.Entry::getValue)); | ||
if (maxAlt.isPresent() && ((float)maxAlt.get().getValue() / (float)numOfBases) > 0.10 && numOfBases >= 5 ) { | ||
// if (maxAlt.isPresent() && ((float)maxAlt.get().getValue() / (float)numOfBases) > 0.1 && maxAlt.get().getValue() > 10 ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more commented code.
alleles.add(Allele.create(referenceContext.getBase(), true)); | ||
// TODO: AH & BG add an option to deal with multiple alt alleles | ||
Optional<Map.Entry<Byte, Integer>> maxAlt = altCounts.entrySet().stream().max(Comparator.comparingInt(Map.Entry::getValue)); | ||
if (maxAlt.isPresent() && ((float)maxAlt.get().getValue() / (float)numOfBases) > 0.10 && numOfBases >= 5 ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would pull this out as an explicit method filterAlleles() or something along those lines. I would also make static variables for these magic numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think these should ulitmately be exposed for configuration down the road... There are a few ways to think about doing it but we should probably discuss how that works here.
indelAlleles.add(Allele.create(referenceContext.getBase(), true)); | ||
Optional<Map.Entry<String, Integer>> maxIns = insertionCounts.entrySet().stream().max(Comparator.comparingInt(Map.Entry::getValue)); | ||
if (maxIns.isPresent() && ((float)maxIns.get().getValue() / (float)numOfBases) > 0.50 && numOfBases >= 5 ) { | ||
indelAlleles.add(Allele.create(String.format("%c", referenceContext.getBase()) + maxIns.get().getKey())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drop the String.Format() call here since this might get triggered a lot.
src/main/java/org/broadinstitute/hellbender/engine/AssemblyRegionIterator.java
Show resolved
Hide resolved
@@ -86,6 +88,8 @@ public AssemblyRegionIterator(final MultiIntervalShard<GATKRead> readShard, | |||
this.readCachingIterator = new ReadCachingIterator(readShard.iterator()); | |||
this.readCache = new ArrayDeque<>(); | |||
this.activityProfile = new BandPassActivityProfile(assemblyRegionArgs.maxProbPropagationDistance, assemblyRegionArgs.activeProbThreshold, BandPassActivityProfile.MAX_FILTER_SIZE, BandPassActivityProfile.DEFAULT_SIGMA, readHeader); | |||
// TODO: AH & BG handle changing contig |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahaessly can you elaborate what was missing WRT this TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a simple test case that would cover the circumstance you were thinking of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was something that james warned us about. I think the shards can be from different contigs so I think we needed to add code that checked for that case.
58e59ea
to
23f29c8
Compare
src/main/java/org/broadinstitute/hellbender/utils/haplotype/Haplotype.java
Outdated
Show resolved
Hide resolved
...va/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/AssemblyBasedCallerUtils.java
Show resolved
Hide resolved
...oadinstitute/hellbender/tools/walkers/haplotypecaller/PileupDetectionArgumentCollection.java
Outdated
Show resolved
Hide resolved
...oadinstitute/hellbender/tools/walkers/haplotypecaller/PileupDetectionArgumentCollection.java
Show resolved
Hide resolved
...oadinstitute/hellbender/tools/walkers/haplotypecaller/PileupDetectionArgumentCollection.java
Outdated
Show resolved
Hide resolved
* Percentage of reads required to supprot the alt for a variant to be considered | ||
*/ | ||
@Advanced | ||
@Argument(fullName= PILEUP_DETECTION_SNP_THRESHOLD, doc = "Percentage of alt supporting reads in order to consider alt SNP", optional = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rename argument PILEUP_SNP_DETECTION_RATIO (also below for INDEL) also, in doc
make it clear that this is regarding pileup mode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to note in the doc=
of all these arguments that it pertains to the PILEUP detection only. otherwise the only hint to that it in the name of the argument itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed all of the arguments to have a "Pileup Detection: " argument. Hopefully that will make it a little bit clearer? I've hidden these arguments anyway so maybe its all moot...
...oadinstitute/hellbender/tools/walkers/haplotypecaller/PileupDetectionArgumentCollection.java
Outdated
Show resolved
Hide resolved
...oadinstitute/hellbender/tools/walkers/haplotypecaller/PileupDetectionArgumentCollection.java
Outdated
Show resolved
Hide resolved
...oadinstitute/hellbender/tools/walkers/haplotypecaller/PileupDetectionArgumentCollection.java
Outdated
Show resolved
Hide resolved
...oadinstitute/hellbender/tools/walkers/haplotypecaller/PileupDetectionArgumentCollection.java
Outdated
Show resolved
Hide resolved
...oadinstitute/hellbender/tools/walkers/haplotypecaller/PileupDetectionArgumentCollection.java
Outdated
Show resolved
Hide resolved
...oadinstitute/hellbender/tools/walkers/haplotypecaller/PileupDetectionArgumentCollection.java
Outdated
Show resolved
Hide resolved
...oadinstitute/hellbender/tools/walkers/haplotypecaller/PileupDetectionArgumentCollection.java
Outdated
Show resolved
Hide resolved
...oadinstitute/hellbender/tools/walkers/haplotypecaller/PileupDetectionArgumentCollection.java
Outdated
Show resolved
Hide resolved
...oadinstitute/hellbender/tools/walkers/haplotypecaller/PileupDetectionArgumentCollection.java
Outdated
Show resolved
Hide resolved
* that are visible in the pileups but might be dropped from assembly for any number of reasons. The basic algorithm works | ||
* as follows: | ||
* - iterate over every pileup and count alt bases | ||
* - (unfinished) detect insertions overlapping this site (CURRENTLY ONLY WORKS FOR INSERTIONS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's unclear what is unfinished...do you want this to work for indels, but it only works for insertions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I consider the implementation of indels to be relatively untested compared to the SNP code. I'm going to change this to beta.
src/main/java/org/broadinstitute/hellbender/utils/pileup/PileupBasedAlleles.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/utils/pileup/PileupBasedAlleles.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/utils/pileup/PileupBasedAlleles.java
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/utils/pileup/PileupBasedAlleles.java
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/utils/pileup/PileupBasedAlleles.java
Show resolved
Hide resolved
if (args.badReadProperPair && !read.isProperlyPaired()) { | ||
return true; | ||
} | ||
if (args.badReadSecondary && (read.isSecondaryAlignment() || read.hasAttribute("SA"))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the doc for badReadSecondary there's no mention of secondary alignments....and yet, here it is...please make sure we are doing it right....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. The DRAGEN heuristic attempts to capture both (on the assumption that those messy reads are going to cause false positives). I'll update the documentation.
} | ||
|
||
|
||
//TODO this conversion is really unnecessary... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh? so why is it done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its done because the method SequenceUtil.calculateSamNmTag()
works on SAMRecords and its in HTSJDK so I can't easily change it... I didn't want to reimplement it but as you can see i have been forced to thread the header down to this class as a result.
src/main/java/org/broadinstitute/hellbender/utils/pileup/PileupBasedAlleles.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
//TODO add threshold descibed by illumina about insert size compared to the average | ||
if (args.templateLenghtStd > 0 && args.templateLengthMean > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
insert-size is a much better term than template length.....i think we should use that instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are already using samRecordForRead.getInferredInsertSize()
or do you mean we should change the title/arguments for this to use "insert size" as a name?
src/main/java/org/broadinstitute/hellbender/utils/pileup/PileupBasedAlleles.java
Outdated
Show resolved
Hide resolved
20 10098786 . C T 32.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.431;DP=9;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=54.65;MQRankSum=-2.200;QD=3.63;ReadPosRankSum=-1.383;SOR=0.132 GT:AD:DP:GQ:PL 0/1:7,2:9:40:40,0,217 | ||
20 10099220 . A G 598.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=-1.664;DP=47;ExcessHet=3.0103;FS=1.137;MLEAC=1;MLEAF=0.500;MQ=57.74;MQRankSum=-2.596;QD=13.30;ReadPosRankSum=0.629;SOR=0.519 GT:AD:DP:GQ:PL 0/1:25,20:45:99:606,0,873 | ||
20 10099535 . G A 1356.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=5.034;DP=68;ExcessHet=3.0103;FS=2.172;MLEAC=1;MLEAF=0.500;MQ=59.04;MQRankSum=-1.418;QD=20.87;ReadPosRankSum=-0.777;SOR=0.408 GT:AD:DP:GQ:PL 0/1:26,39:65:99:1364,0,686 | ||
20 10099565 . C T 1367.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=4.595;DP=70;ExcessHet=3.0103;FS=8.736;MLEAC=1;MLEAF=0.500;MQ=59.40;MQRankSum=-1.266;QD=19.82;ReadPosRankSum=1.225;SOR=0.191 GT:AD:DP:GQ:PL 0/1:31,38:69:99:1375,0,994 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a utuilty method that indexes vcfs for testing, perhaps better to use it than to include indexes in the repository?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at that folder it looks like (with a few exceptions that look like they aren't actually used anywhere and should probably be removed themselves) every file also has the indexes checked in. I'm going to call it out of scope for this PR to change the HC tests over to index on the fly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with this pass. let me know that you see my comments...they were only put on your last commit.
9670c5d
to
ce3a00c
Compare
@yfarjoun rebased (It was a bit messy its quite possible I broke something) and back to you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still see many unresolved reviewer comments from previous passes, but the tests that you added are good.
...oadinstitute/hellbender/tools/walkers/haplotypecaller/PileupDetectionArgumentCollection.java
Show resolved
Hide resolved
...institute/hellbender/tools/walkers/haplotypecaller/readthreading/ReadThreadingAssembler.java
Outdated
Show resolved
Hide resolved
.../broadinstitute/hellbender/tools/walkers/haplotypecaller/HaplotypeCallerIntegrationTest.java
Outdated
Show resolved
Hide resolved
601a8f3
to
5bfa0d3
Compare
@yfarjoun I think I have responded to everything in there and gotten rid of a bunch of the todos that were left (it turns out a bunch of them were vestigial from development). There is really only one of my comments I was unsure was worth fixing and I will probably ask around engine team for opinions about it. |
a319cd9
to
31e5862
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few small nits remain. thanks for moving this along!
final Haplotype hapD = new Haplotype("ACCTGAA".getBytes()); | ||
final Haplotype hapF = new Haplotype("GAAGAAG".getBytes()); // testing repeated kmers | ||
|
||
Map<Kmer, Integer> flatSupportAllKmers = new HashMap<Kmer, Integer>() { private static final long serialVersionUID = 0L; { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put the serialVersionUID
into a line of its own please (here and below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure... I'm still annoyed that i have to do this but hash-map is serializable so this anonymous subclass generates a compiler warning otherwise...
src/main/java/org/broadinstitute/hellbender/engine/AssemblyRegionIterator.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/engine/AssemblyRegionIterator.java
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/engine/AssemblyRegionIterator.java
Outdated
Show resolved
Hide resolved
...va/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/AssemblyBasedCallerUtils.java
Outdated
Show resolved
Hide resolved
...va/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/AssemblyBasedCallerUtils.java
Outdated
Show resolved
Hide resolved
@@ -771,7 +771,7 @@ private boolean containsCalls(final CalledHaplotypes calledHaplotypes) { | |||
//TODO - why the activeRegion cannot manage its own one-time finalization and filtering? | |||
//TODO - perhaps we can remove the last parameter of this method and the three lines bellow? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these todos still relevant and correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no but they are old enough now maybe they are load bearing? There is a bunch of this commentary leftover from the first port of HaplotypeCaller back in the day that can probably go...
...oadinstitute/hellbender/tools/walkers/haplotypecaller/PileupDetectionArgumentCollection.java
Outdated
Show resolved
Hide resolved
...oadinstitute/hellbender/tools/walkers/haplotypecaller/PileupDetectionArgumentCollection.java
Show resolved
Hide resolved
...oadinstitute/hellbender/tools/walkers/haplotypecaller/PileupDetectionArgumentCollection.java
Outdated
Show resolved
Hide resolved
d8aeafb
to
1c6dc21
Compare
…e place it was before despite the GGA code having been moved). And fixed a broken integration test
addef19
to
ec287d8
Compare
@yfarjoun That was a messy rebase. Assuming these tests pass everything should be in order for you to take another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a small nit in one of the comments....thanks!
Its my own review and I have made the changes necessary
…ore assembly that supplements the assembly variants with variants that show up in the pileups. (broadinstitute#7432)
This is the beta version of the pileup based haplotype generation and variant detection code.