Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FuncotationMap refactoring and VCF/MAF concordance for protein changes #4838

Merged
merged 4 commits into from
Jun 8, 2018

Conversation

LeeTL1220
Copy link
Contributor

@LeeTL1220 LeeTL1220 commented May 31, 2018

Closes #3842
Closes #4808
Closes #4811
Closes #4810
Closes #4839

  • A lot of refactoring to create FuncotationMap and use that to send to OutputRenderers.
  • VCF and MAF will honor the canonical vs. all vs best effect.
  • MAF entries will now be rendered as AnnotatedInterval via tribble

@LeeTL1220 LeeTL1220 requested a review from jonn-smith May 31, 2018 19:52
@LeeTL1220
Copy link
Contributor Author

@jonn-smith Please review that I have tested to your satisfaction. Particularly around FuncotationMap. Please note that several issues are closed with this PR.

@codecov-io
Copy link

codecov-io commented May 31, 2018

Codecov Report

Merging #4838 into master will increase coverage by 0.062%.
The diff coverage is 87.925%.

@@               Coverage Diff               @@
##              master     #4838       +/-   ##
===============================================
+ Coverage     80.353%   80.416%   +0.062%     
- Complexity     17714     17811       +97     
===============================================
  Files           1088      1089        +1     
  Lines          63985     64133      +148     
  Branches       10313     10340       +27     
===============================================
+ Hits           51414     51573      +159     
+ Misses          8558      8502       -56     
- Partials        4013      4058       +45
Impacted Files Coverage Δ Complexity Δ
...itute/hellbender/tools/funcotator/Funcotation.java 33.333% <ø> (ø) 3 <0> (ø) ⬇️
...ools/funcotator/FuncotatorArgumentDefinitions.java 86.364% <ø> (-0.87%) 1 <0> (ø)
...te/hellbender/tools/funcotator/OutputRenderer.java 100% <ø> (ø) 2 <0> (ø) ⬇️
...tils/annotatedinterval/AnnotatedIntervalCodec.java 79.31% <0%> (ø) 18 <2> (+1) ⬆️
...org/broadinstitute/hellbender/engine/GATKTool.java 90.667% <0%> (-0.405%) 99 <0> (ø)
.../tools/funcotator/mafOutput/MafOutputRenderer.java 95.64% <100%> (+0.349%) 51 <12> (+3) ⬆️
...cotator/dataSources/vcf/VcfFuncotationFactory.java 87.013% <100%> (-2.32%) 25 <0> (+1)
...nder/tools/funcotator/TranscriptSelectionMode.java 89.72% <100%> (-0.757%) 1 <0> (ø)
.../tools/funcotator/vcfOutput/VcfOutputRenderer.java 85.246% <100%> (+3.428%) 12 <3> (+3) ⬆️
.../tools/funcotator/dataSources/DataSourceUtils.java 63.241% <100%> (ø) 40 <0> (ø) ⬇️
... and 17 more

Copy link
Collaborator

@jonn-smith jonn-smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions and requests.

@@ -821,6 +821,17 @@ private String createProgramGroupID(final SAMFileHeader header) {
*/
protected String getToolkitShortName() { return "GATK"; }

/**
* Call {@link GATKTool#addFeatureInputsAfterInitialization(String, String, Class, int)} with no caching.
*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add javadoc for the args here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
else {
return createDefaultFuncotationsOnVariant(variant, referenceContext);
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works fine, but why rearrange it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old code could have outputFuncotations of length 0 and return it, but we would have wanted a default funcotation.

This happened when all of the featureList (i.e. transcripts) were IGR, since we filter out the IGR transcripts (a modification I made later per our offline discussion)
No action.

public static FuncotationMap createFromGencodeFuncotations(final List<GencodeFuncotation> gencodeFuncotations) {

Utils.nonNull(gencodeFuncotations);
Utils.validateArg(!areDuplicateTranscriptIDsFound(gencodeFuncotations), "Duplicate transcript ID entries were found in input: " +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is confusing - are you searching for duplicates or that there were no duplicates?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ensuring that there were no duplicates.

I updated the javadoc comment.

Done.

* @param txId Never {@code null}
* @param funcotations Never {@code null}
*/
public void add(final String txId, final List<Funcotation> funcotations) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per discussion in person, reminder to throw an exception if you add a GencodeFuncotation here (for now).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done and issue filed.

Utils.nonNull(txId);
Utils.nonNull(funcotations);

final LinkedHashSet<Funcotation> existingFuncotationsToUpdate = txToFuncotations.getOrDefault(txId, new LinkedHashSet<>());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be faster to add the new funcotations directly to the mapped list ala containsKey and ...get(txId).append(...)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it is the same.

No action.

final String[] funcotationKeys = extractFuncotatorKeysFromHeaderDescription(funcotationHeaderLine.getDescription());

for (final String annotationToCheck: annotationsToCheck) {
final List<String> mafProteinChanges = maf.getRecords().stream().map(v -> v.getAnnotationValue(MafOutputRendererConstants.FieldName_Protein_Change)).collect(Collectors.toList());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like you can pull this out of the loop...

This is confusing - you create a list of mafProteinChanges but it doesn't depend on the annotation. How does the assert work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. I forgot to make a second list for MAF annotation names that corresponds to the VCF.

Done

final FuncotationMap funcotationMap = alleleToFuncotationMap.values().iterator().next();
Assert.assertEquals(funcotationMap.keyList().size(), 1);
Assert.assertTrue(funcotationMap.keyList().stream().noneMatch(k -> k.equals(FuncotationMap.NO_TRANSCRIPT_AVAILABLE_KEY)));
Assert.assertTrue(funcotationMap.keyList().stream().noneMatch(k -> StringUtils.isEmpty(k)));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can change this to StringUtils::isEmpty

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -165,19 +165,11 @@ private VariantContext createVariantContext(final String contig,
Allele.create("C"), FACTORY_NAME)
)
),
// Three overlapping VCF features:
// No overlapping VCF features, since there are no indels in dbSNP (the test datasource), so the ground truth should be a default entry, which was constructed here manually:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add in a test case here that has 3 overlapping features?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I clarified the comment. Three overlap by position... none overlap if you include ref and alt in the "overlap".

Done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm assuming that you do not want me to create a VCF datasource with three identical records).


private FuncotationMap() {}

private static boolean isGencodeFuncotation(final Funcotation f) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it would make more sense in FuncotatorUtils

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Added tests, too.

// Create a place to keep our funcotations:
final List<Funcotation> funcotations = new ArrayList<>();
// Create only the gencode funcotations.
if (retriveGencodeFuncotationFactoryStream().count() > 1) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking, this isn't the case anymore - you can have as many Gencode data sources as you want, so this may need to be changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry about errors in the future... such as what do we do if two gencode datasources have the same transcript?

I changed the message to something less ominous: "Attempting to annotate with more than one GENCODE datasource. If these have overlapping transcript IDs, errors may occur."

Done

@jonn-smith jonn-smith assigned LeeTL1220 and unassigned jonn-smith Jun 5, 2018
@LeeTL1220
Copy link
Contributor Author

@jonn-smith Back at you. I asked a few questions related to some of my responses.

@LeeTL1220 LeeTL1220 assigned jonn-smith and unassigned LeeTL1220 Jun 6, 2018
Copy link
Collaborator

@jonn-smith jonn-smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor request here and one in response to prior comments. About to check latest commit.

}


//TODO: Duplicate code
Copy link
Collaborator

@jonn-smith jonn-smith Jun 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LeeTL1220 is this TODO addressed?

Nevermind - disregard this.

Copy link
Collaborator

@jonn-smith jonn-smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor request in the responses to your initial comments, then it looks good.

@jonn-smith jonn-smith assigned LeeTL1220 and unassigned jonn-smith Jun 7, 2018
@jonn-smith
Copy link
Collaborator

@LeeTL1220 Back to you for a minor request.

LeeTL1220 added 4 commits June 7, 2018 21:50
…mparisons. This is useful for dev testing.

- Added parsing of funcotation attribute using the VCF Header.  Mostly used for dev testing.
- VCF Datasources now have to match the alt and ref alleles to be condsidered a hit.
- Fixing #4808
- Added FuncotationMap to improve encapsulation and make rendering a bit easier.
- Fixed issue with FuncotatorIntegrationTest where default values were references that were changed, thereby causing cross-test errors.
- A lot of refactoring to create FuncotationMap and use that to send to OutputRenderers.
- VCF and MAF will honor the canonical vs. all vs best effect.
@LeeTL1220 LeeTL1220 force-pushed the ll_refactoring_for_tx_all_mode branch from 30f7ad3 to 48e6443 Compare June 8, 2018 01:51
Copy link
Collaborator

@jonn-smith jonn-smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good (pending test success).

@LeeTL1220 LeeTL1220 merged commit 7e5d4c7 into master Jun 8, 2018
@LeeTL1220 LeeTL1220 deleted the ll_refactoring_for_tx_all_mode branch June 8, 2018 14:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants