-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Numerical consistency between python and java, transpose fix #4652
Conversation
} | ||
while (readIt.hasNext()) { | ||
sb.append(GATKReadToString(readIt.next())); | ||
GATKRead r = readIt.next(); | ||
if (r.getReadGroup().toLowerCase().contains("artificial")){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lucidtronix Whats the reason for this change ? It seems to be casting a pretty wide net. HC-generated artificial read groups ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah exactly, we've always excluded them from the read tensor. But I agree, I will add an argument to toggle this.
Codecov Report
@@ Coverage Diff @@
## master #4652 +/- ##
===============================================
+ Coverage 79.852% 79.972% +0.121%
- Complexity 17336 17710 +374
===============================================
Files 1074 1076 +2
Lines 62938 63946 +1008
Branches 10186 10496 +310
===============================================
+ Hits 50257 51139 +882
- Misses 8701 8781 +80
- Partials 3980 4026 +46
|
} | ||
while (readIt.hasNext()) { | ||
sb.append(GATKReadToString(readIt.next())); | ||
GATKRead r = readIt.next(); | ||
if (!keepArtificialReadGroup && r.getReadGroup().toLowerCase().contains("artificial")){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I was thinking it was overly broad because it matches any group name containing the string "artificial", rather than because of lack of command line control. Instead of doing this manually, I would add an override of getDefaultReadFilters that adds a ReadGroupBlackListReadFilter initialized with HaplotypeBamWriter.DEFAULT_HAPLOTYPE_READ_GROUP_ID (which is the actual string used by HC - will have to be made public), prefixed with the string "ID:". That will provide an exact match to the synthetic HC read group, the engine will filter the reads out automatically so the tool never sees them, and it can be disabled using the existing builtin argument --disable-read-filter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I also had to make the read group black list argument optional and for gatk3 compatibility also filter "ArtificialHaplotype" read groups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of minor changes requested.
public ReadGroupBlackListReadFilter(final List<String> blackLists, final SAMFileHeader header) { | ||
super.setHeader(header); | ||
this.blackList = blackLists; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would copy the contents (this.blackList.addAll(blackLists)
) rather than just the reference, in case the input changes or is immutable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -208,6 +207,16 @@ protected VariantFilter makeVariantFilter(){ | |||
} | |||
} | |||
|
|||
@Override | |||
public List<ReadFilter> getDefaultReadFilters() { | |||
List<ReadFilter> readFilters = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This list should be initialized with the values returned from super.getDefaultReadFilters
to make sure you get the engine defaults like WellFormedReadFilter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
List<ReadFilter> readFilters = new ArrayList<>(); | ||
List<String> filterList = new ArrayList<>(); | ||
filterList.add("ID:" + HaplotypeBAMWriter.DEFAULT_HAPLOTYPE_READ_GROUP_ID); | ||
filterList.add("ID:ArtificialHaplotype"); // GATK3 Compatibility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a DEFAULT_GATK3_HAPLOTYPE_READ_GROUP_ID constant to HaplotypeBAMWriter with a comment saying its for use with backward compatible read group read filters, and reference that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@lucidtronix This looks good now. Can you squash it down to one commit and rebase and then we can merge once the tests pass. |
10c0d87
to
9945a5c
Compare
This PR brings pure python inference and java-python streaming inference into much better agreement. also a transpose bugfix.