CH #101

Open
evanbiederstedt opened this issue Apr 2, 2019 · 5 comments
Assignees
Labels
backburner (probably won't address in a near future), enhancement (New feature or request)

Comments

@evanbiederstedt
Contributor

evanbiederstedt commented Apr 2, 2019

Clonal Hematopoiesis, but it's too tricky to spell. :)

Here's how we are going to do this:

Rough idea: we swap the TN labels, do unmatched variant calling on the normal, then genotype the tumor. Some CH mutations will be present in the tumor because of blood contamination, and unmatched calling ensures we don't miss those.

Once there are analysis-ready TN pairs, we execute CH as follows:

The CH step runs Mutect2 and Strelka2, but with the tumor and normal labels switched.
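The label swap can be sketched as below. This is only an illustration: the `Pair` type, the `swap_labels` helper, and the sample IDs are hypothetical and not taken from the pipeline.

```python
from typing import List, NamedTuple


class Pair(NamedTuple):
    tumor: str   # sample passed to the caller in the "tumor" role
    normal: str  # sample passed to the caller in the "normal" role


def swap_labels(pairs: List[Pair]) -> List[Pair]:
    """Return new pairs with the tumor and normal roles exchanged."""
    return [Pair(tumor=p.normal, normal=p.tumor) for p in pairs]


# Hypothetical sample IDs, for illustration only.
pairs = [Pair(tumor="P1_T", normal="P1_N"), Pair(tumor="P2_T", normal="P2_N")]
ch_pairs = swap_labels(pairs)
```

With the swapped list, the blood normal is what the callers treat as the sample of interest.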

For filtering artifacts/false positives (from Clinical Bioinformatics):

  1. gnomAD
  2. genotyping values from panel of blood samples: 300 CH-free and young patients
  3. more of a work in progress: filtering SNPs with LOH in tumor using FACETS data (this will only be important for high VAF variants in blood)
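A rough sketch of how the three filters might be chained, assuming each call has already been annotated. The field names, the thresholds, and the reading of filter 3 (drop high-VAF blood calls at loci with LOH in the tumor) are all assumptions, not values or logic supplied by Clinical Bioinformatics.

```python
from dataclasses import dataclass

# Placeholder thresholds; none of these come from the thread.
MAX_GNOMAD_AF = 0.001   # population allele frequency cutoff
MAX_PANEL_HITS = 2      # max CH-free panel samples showing the variant
HIGH_BLOOD_VAF = 0.35   # above this, the LOH check is applied


@dataclass
class Call:
    variant: str
    blood_vaf: float   # VAF in the blood normal
    gnomad_af: float   # gnomAD allele frequency (0.0 if absent)
    panel_hits: int    # number of panel blood samples with the variant
    tumor_loh: bool    # locus sits in a tumor LOH segment (e.g. from FACETS)


def passes_ch_filters(call: Call) -> bool:
    """Apply the three filters in order; False means the call is dropped."""
    if call.gnomad_af > MAX_GNOMAD_AF:
        return False  # 1. likely germline polymorphism
    if call.panel_hits > MAX_PANEL_HITS:
        return False  # 2. recurrent in CH-free young patients, likely artifact
    if call.blood_vaf >= HIGH_BLOOD_VAF and call.tumor_loh:
        return False  # 3. high blood VAF at an LOH locus (assumed interpretation)
    return True
```

Note the LOH check only fires for high-VAF calls, matching the remark that it matters only for high VAF variants in blood.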

We will ask them for input, and possibly they could share code.

@evanbiederstedt evanbiederstedt added the enhancement New feature or request label Apr 2, 2019
@kpjonsson
Member

This will add a lot of runtime due to Mutect, so the question is if we think only using Strelka2 would suffice. Might require some investigation.

@evanbiederstedt
Contributor Author

> Might require some investigation.

Precisely. Once implemented, we can investigate.

@evanbiederstedt evanbiederstedt added the backburner probably won't address in a near future label Apr 19, 2019
@evanbiederstedt
Contributor Author

I think this should be the last thing we try.

I don't think CH has really been defined yet for WES/WGS data here, and it would require a good deal of iteration to find the optimal solution.

--- There are some concerns (based on anecdotes from Ryan Ptashkin) that Mutect2 and Strelka2 filter out low VAF calls, which we would need for CH to work. "Since the CH variants tend to live at the lower end of VAF range (increasing with Tx and /or age) I’d be concerned about high FDR at lower VAFs with Mutect2, but that is just limited data that i have seen". That's possible. One option (mentioned by Ryan) was to run Vardict! Mainly for benchmarking, but I think that's a bit of a bad idea. I'd rather entertain LoFreq2: https://github.com/CSB5/lofreq

There's also the question of whether we should take the union of these calls or the intersection.
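To make the trade-off concrete, here is a minimal sketch that treats each caller's output as a set of `(chrom, pos, ref, alt)` keys. The `combine_calls` helper and the toy coordinates are made up for illustration; real VCFs would need normalization before keys are comparable.

```python
from typing import Iterable, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def combine_calls(call_sets: Iterable[Set[VariantKey]], mode: str) -> Set[VariantKey]:
    """Combine per-caller call sets by 'union' or 'intersection'."""
    sets = list(call_sets)
    if not sets:
        return set()
    if mode == "union":
        return set().union(*sets)
    if mode == "intersection":
        return set(sets[0]).intersection(*sets[1:])
    raise ValueError(f"unknown mode: {mode}")


# Toy call sets, not real variants.
mutect2_calls = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C")}
strelka2_calls = {("chr1", 200, "G", "C"), ("chr2", 300, "T", "G")}

union_calls = combine_calls([mutect2_calls, strelka2_calls], "union")
shared_calls = combine_calls([mutect2_calls, strelka2_calls], "intersection")
```

The union keeps anything either caller reports, which favors sensitivity at low VAF; the intersection keeps only calls both agree on, which favors precision.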

--- It appears no one has tried TN paired variant calling for this analysis. (I guess I misunderstood.) Calling against a matched tumor is a bad idea, because plenty of tumor samples have blood/lymphocyte infiltration, i.e. contamination.

--- The best approach would be to call blood normal vs. a curated pooled normal of young patients without any hematological malignancies. We do not yet have a curated "normal" though. In order to create this, "use data from the youngest patients that you have, but check that they didnt have an active heme malignancy at time of sequencing and could even genotype for the most common blood mutations to exclude samples with obvious somatic mutations in blood".
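The selection criteria in that quote could be sketched roughly as follows. The record fields, the `select_pool_candidates` helper, and both cutoffs are hypothetical; the actual age limit and hotspot list would have to come from whoever curates the pool.

```python
from typing import Dict, List


def select_pool_candidates(
    samples: List[Dict], max_age: int, max_hotspot_vaf: float
) -> List[Dict]:
    """Keep young samples with no active heme malignancy and no obvious
    somatic signal at the genotyped hotspots, youngest first."""
    eligible = [
        s for s in samples
        if s["age"] <= max_age
        and not s["active_heme_malignancy"]
        # hotspot_vafs: VAFs from genotyping common blood mutations
        and max(s["hotspot_vafs"], default=0.0) <= max_hotspot_vaf
    ]
    return sorted(eligible, key=lambda s: s["age"])


# Made-up records for illustration only.
samples = [
    {"id": "N1", "age": 31, "active_heme_malignancy": False, "hotspot_vafs": [0.0, 0.001]},
    {"id": "N2", "age": 24, "active_heme_malignancy": False, "hotspot_vafs": []},
    {"id": "N3", "age": 28, "active_heme_malignancy": True, "hotspot_vafs": [0.0]},
    {"id": "N4", "age": 29, "active_heme_malignancy": False, "hotspot_vafs": [0.08]},
    {"id": "N5", "age": 67, "active_heme_malignancy": False, "hotspot_vafs": [0.0]},
]
pool = select_pool_candidates(samples, max_age=35, max_hotspot_vaf=0.01)
```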

My guess is that for the first official release we'll try to generate the analysis outputs needed to converge on the best solution.

This was referenced Apr 19, 2019
@evanbiederstedt
Contributor Author

Update: @ahmetz is working on a PoN for WES data

@evanbiederstedt
Contributor Author

I would try this caller:
https://hub.docker.com/r/lethalfang/lofreq

There's even a Dockerfile here: https://hub.docker.com/r/seandavi/lofreq/dockerfile

Vardict is going to be a pain, but it's possible: https://hub.docker.com/r/marghoob/vardict


4 participants