-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FuncotationMap refactoring and VCF/MAF concordance for protein changes #4838
Conversation
@jonn-smith Please review that I have tested to your satisfaction. Particularly around |
5217116
to
43c5098
Compare
Codecov Report
@@ Coverage Diff @@
## master #4838 +/- ##
===============================================
+ Coverage 80.353% 80.416% +0.062%
- Complexity 17714 17811 +97
===============================================
Files 1088 1089 +1
Lines 63985 64133 +148
Branches 10313 10340 +27
===============================================
+ Hits 51414 51573 +159
+ Misses 8558 8502 -56
- Partials 4013 4058 +45
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few questions and requests.
@@ -821,6 +821,17 @@ private String createProgramGroupID(final SAMFileHeader header) { | |||
*/ | |||
protected String getToolkitShortName() { return "GATK"; } | |||
|
|||
/** | |||
* Call {@link GATKTool#addFeatureInputsAfterInitialization(String, String, Class, int)} with no caching. | |||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add javadoc for the args here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
else { | ||
return createDefaultFuncotationsOnVariant(variant, referenceContext); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works fine, but why rearrange it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old code could have outputFuncotations of length 0 and return it, but we would have wanted a default funcotation.
This happened when all of the featureList (i.e. transcripts) were IGR, since we filter out the IGR transcripts (a modification I made later per our offline discussion)
No action.
public static FuncotationMap createFromGencodeFuncotations(final List<GencodeFuncotation> gencodeFuncotations) { | ||
|
||
Utils.nonNull(gencodeFuncotations); | ||
Utils.validateArg(!areDuplicateTranscriptIDsFound(gencodeFuncotations), "Duplicate transcript ID entries were found in input: " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is confusing - are you searching for duplicates or that there were no duplicates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ensuring that there were no duplicates.
I updated the javadoc comment.
Done.
* @param txId Never {@code null} | ||
* @param funcotations Never {@code null} | ||
*/ | ||
public void add(final String txId, final List<Funcotation> funcotations) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per discussion in person, reminder to throw an exception if you add a GencodeFuncotation
here (for now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done and issue filed.
Utils.nonNull(txId); | ||
Utils.nonNull(funcotations); | ||
|
||
final LinkedHashSet<Funcotation> existingFuncotationsToUpdate = txToFuncotations.getOrDefault(txId, new LinkedHashSet<>()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be faster to add the new funcotations directly to the mapped list ala containsKey
and ...get(txId).append(...)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it is the same.
No action.
final String[] funcotationKeys = extractFuncotatorKeysFromHeaderDescription(funcotationHeaderLine.getDescription()); | ||
|
||
for (final String annotationToCheck: annotationsToCheck) { | ||
final List<String> mafProteinChanges = maf.getRecords().stream().map(v -> v.getAnnotationValue(MafOutputRendererConstants.FieldName_Protein_Change)).collect(Collectors.toList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like you can pull this out of the loop...
This is confusing - you create a list of mafProteinChanges
but it doesn't depend on the annotation. How does the assert work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. I forgot to make a second list for MAF annotation names that corresponds to the VCF.
Done
final FuncotationMap funcotationMap = alleleToFuncotationMap.values().iterator().next(); | ||
Assert.assertEquals(funcotationMap.keyList().size(), 1); | ||
Assert.assertTrue(funcotationMap.keyList().stream().noneMatch(k -> k.equals(FuncotationMap.NO_TRANSCRIPT_AVAILABLE_KEY))); | ||
Assert.assertTrue(funcotationMap.keyList().stream().noneMatch(k -> StringUtils.isEmpty(k))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can change this to StringUtils::isEmpty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -165,19 +165,11 @@ private VariantContext createVariantContext(final String contig, | |||
Allele.create("C"), FACTORY_NAME) | |||
) | |||
), | |||
// Three overlapping VCF features: | |||
// No overlapping VCF features, since there are no indels in dbSNP (the test datasource), so the ground truth should be a default entry, which was constructed here manually: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add in a test case here that has 3 overlapping features?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I clarified the comment. Three overlap by position... none overlap if you include ref and alt in the "overlap".
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'm assuming that you do not want me to create a VCF datasource with three identical records).
|
||
private FuncotationMap() {} | ||
|
||
private static boolean isGencodeFuncotation(final Funcotation f) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it would make more sense in FuncotatorUtils
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Added tests, too.
// Create a place to keep our funcotations: | ||
final List<Funcotation> funcotations = new ArrayList<>(); | ||
// Create only the gencode funcotations. | ||
if (retriveGencodeFuncotationFactoryStream().count() > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly speaking, this isn't the case anymore - you can have as many Gencode
data sources as you want, so this may need to be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry about errors in the future... such as what do we do if two gencode datasources have the same transcript?
I changed the message to something less ominous: "Attempting to annotate with more than one GENCODE datasource. If these have overlapping transcript IDs, errors may occur."
Done
@jonn-smith Back at you. I asked a few questions related to some of my responses. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor request here and one in response to prior comments. About to check latest commit.
} | ||
|
||
|
||
//TODO: Duplicate code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LeeTL1220 is this TODO addressed?
Nevermind - disregard this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor request in the responses to your initial comments, then it looks good.
@LeeTL1220 Back to you for a minor request. |
…mparisons. This is useful for dev testing. - Added parsing of funcotation attribute using the VCF Header. Mostly used for dev testing. - VCF Datasources now have to match the alt and ref alleles to be condsidered a hit. - Fixing #4808 - Added FuncotationMap to improve encapsulation and make rendering a bit easier. - Fixed issue with FuncotatorIntegrationTest where default values were references that were changed, thereby causing cross-test errors. - A lot of refactoring to create FuncotationMap and use that to send to OutputRenderers. - VCF and MAF will honor the canonical vs. all vs best effect.
30f7ad3
to
48e6443
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good (pending test success).
Closes #3842
Closes #4808
Closes #4811
Closes #4810
Closes #4839
FuncotationMap
and use that to send to OutputRenderers.AnnotatedInterval
via tribble