-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed an issue of data that contain tp tag sometimes recognized as flow #8337
Fixed an issue of data that contain tp tag sometimes recognized as flow #8337
Conversation
Thanks @ilyasoifer ! Was the test bam you added aligned with minimap2? If not, we should make sure to add at least one simple HaplotypeCaller regression test that takes an actual minimap2-aligned input. |
@droazen the bam that I added was aligned with minimap2. However, it is a really small BAM, so I could not run it through the integration tests. It sounds useful to add minimap2-aligned bam to HaplotypeCallerIntegrationTest. Can you or @jamesemery point me to a suitable BAM for that? |
@ilyasoifer do you have the means to quickly run minimap2? I would recommend simply realigning src/test/resources/large/CEUTrio.HiSeq.WGS.b37.NA12878.20.21.bam with minimap2 and making a quick "are tests consistent with pervious versions" test with a checked in vcf output. I don't know how to wrangle minimap2 to handle mates correctly however so i don't know if this is easy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ilyasoifer Thank you for doing this. A few more defensive checks and future-proofing to hopefully save us from having this headache again in the future. Could you realign our HaplotypeCaller input bam snippets with minimap 2 and make a quick integration test asserting that they still funciton?
if (readGroup.getPlatform()==null){ | ||
isFlowPlatform = false; | ||
} else { | ||
isFlowPlatform = NGSPlatform.valueOf(readGroup.getPlatform()).getSequencerType() == SequencerFlowClass.FLOW; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know these SequencerFlowClass objects exist and in theory should handle other techonologies, but the codepaths here really only support Ultima. I would change this to check for the Ultima readgroup (so treat potential Iontorrent or 454 data which is also flow) seperately for now so long as they aren't tested. This seems like a recipe for those technologies falling over.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Old Ultima data used PL:LS454 because ULTIMA was not a legal platform. I think I prefer allowing LS454 and ULTIMA to work this way given low chance anyone is now running LS454 data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you special case the PL:LS454 platform and give an explanation about it here? Again, I want to be as specific as possible about the inputs here since this is a greedy non opt-in behavior we should be careful we don't start clobbering somebodies inputs in all cases. If it IS labeled as 454 data can you check for the other fields that should make it readable as ultima data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added tests
isFlowPlatform = NGSPlatform.valueOf(readGroup.getPlatform()).getSequencerType() == SequencerFlowClass.FLOW; | ||
} | ||
if (isFlowPlatform) { | ||
Utils.nonNull(readGroup); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this is probably the first line of defense for users submitting mis-formatted flow based data to the HaplotypeCaller, can you upgrade these non-null checks to be a little more stringent about expected flow-based header lines and throw exceptions with UserException enforcing the presence of those fields and printing out the readgroup header line in the error message? (and add a quick unit test for the flow platform construction)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I fixed this, can you check?
SAMReadGroupRecord rg1 = header.getReadGroup("test"); | ||
SAMRecord read = reader.iterator().next(); | ||
GATKRead gread = new SAMRecordToGATKReadAdapter(read); | ||
assert(FlowBasedReadUtils.hasFlowTags(gread)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you record the actual readgroups from that file in a comment here and add a bit of explanation to this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
final byte minTailQualityToUse = errorCorrectReads ? HaplotypeCallerEngine.MIN_TAIL_QUALITY_WITH_ERROR_CORRECTION : minTailQuality; | ||
|
||
final List<GATKRead> readsToUse = new ArrayList<>(); | ||
final List<GATKRead> hardClippedReadsToUse = new ArrayList<>(); | ||
|
||
for (final GATKRead originalRead : region.getReads()) { | ||
|
||
boolean isFlowBasedData = false; | ||
if (FlowBasedReadUtils.hasFlowTags(originalRead)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be condensed into a single method so external tools don't have to know about the readgroup info caching? Can you make a method FlowBasedReadUtils.isThisFlowBased()
that extracts this entire operation away and handles your readgroup caching internally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@droazen , @jamesemery - I think I addressed your comments, back to you now. In any case, I think that the check is more robust now, and it was a good fix. |
Added test like this |
// we require also FO field to consider ReadGroup to come from the flow | ||
boolean tmp = (NGSPlatform.fromReadGroupPL(readGroup.getPlatform()) == NGSPlatform.LS454) || | ||
(NGSPlatform.fromReadGroupPL(readGroup.getPlatform()) == NGSPlatform.ULTIMA); | ||
tmp = tmp & (readGroup!=null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
readGroup!=null here after referencing it several times above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor comments and one test update and then i think it looks good. Thank you for the quick turnaround on fixing this!
return rec.hasAttribute(FlowBasedRead.FLOW_MATRIX_TAG_NAME) | ||
|| rec.hasAttribute(FlowBasedRead.FLOW_MATRiX_OLD_TAG_KR) | ||
|| rec.hasAttribute(FlowBasedRead.FLOW_MATRiX_OLD_TAG_TI); | ||
|
||
} | ||
|
||
public static boolean isFlowPlatform(final SAMFileHeader hdr, final GATKRead read) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just add a javadoc to this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And also possibly update the class javadoc to point people to this method as the definitive "is this flow based and should i trigger that code" method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
/* | ||
* Test that minimap2 data are supported and consistent with past results | ||
*/ | ||
public void testVCFModeMinimap2IsConsistentWithPastResults() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't get run without adding the @test tag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
tests.add(new Object[]{hdr, read3, false}); | ||
|
||
GATKRead read4 = makeRead(new byte[]{'T','A','G','C','G','A'}, false); | ||
read4.setAttribute("tp",new byte[6]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test looks good, can you add in a check for the special cases where Platform=454 +FO tag present and Platform=454 + no FO tag present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
isFlowPlatform = tmp; | ||
|
||
//in addition validate that the ultima data has FO tag, otherwise break | ||
if ((NGSPlatform.fromReadGroupPL(readGroup.getPlatform()) == NGSPlatform.ULTIMA) & (!isFlowPlatform)){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the logic here is a little confusing though I believe it accomplishes what you want. You might consider refactoring this a little so its cleaner but its not critical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor comments and one test update and then i think it looks good. Thank you for the quick turnaround on fixing this!
@jamesemery I believe I addressed your comments, PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One FINAL gripe and then its good to go. Thank you for going back and forth on this.
isFlowPlatform = false; | ||
} else if (NGSPlatform.fromReadGroupPL(readGroup.getPlatform())==NGSPlatform.UNKNOWN){ | ||
isFlowPlatform = false; | ||
} else if (NGSPlatform.fromReadGroupPL(readGroup.getPlatform()) == NGSPlatform.LS454) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know 454 data is ancient at this point but i would still prefer this LS454 case to explicitly check for
if (readGroup.getFlowOrder()!=null) {
isFlowPlatform = true; else false;
I'm not sure i trust the Ultima flow ordering code is going to handle 454 data well without testing it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not this check is supposed to do. Old Ultima data were labeled LS454 and initially did not have the FO tag. Afterwards the tag was added .
The full check is done by isFlowPlatform function that also verifies that the reads have tp tag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh i see... horrifying... well i'm alright to let this through in this case then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One FINAL gripe and then its good to go. Thank you for going back and forth on this.
5395d75
to
5ce230a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright this should be good 👍
Thanks for the help! |
Is this fixed in version 4.4.0.0 of gatk? |
@AishaShah No, this fix will be part of the next GATK release, which we expect to go out shortly (within the next several weeks). |
Hi! I know this issue is closed and a solution has been found but I am facing this same issue right now and wanted to follow up and see if it's been patched in the current 4.4.0.0 version of gatk? Or is the new release expected to be published soon? |
Fixes #8335