CH #101
This will add a lot of runtime because of Mutect2, so the question is whether using only Strelka2 would suffice. It might require some investigation.
Precisely. Once it's implemented, we can investigate.
I think this should be the last thing we try. I don't think CH has really been defined for WES/WGS data here yet, and it would take a good deal of iteration to find the optimal solution.

---

There are some concerns (based on anecdotes from Ryan Ptashkin) that Mutect2 and Strelka2 filter out low-VAF calls, which are essential for CH work: "Since the CH variants tend to live at the lower end of VAF range (increasing with Tx and/or age) I’d be concerned about high FDR at lower VAFs with Mutect2, but that is just limited data that I have seen."

That's possible. One option (mentioned by Ryan) was to run VarDict, mainly for benchmarking, but I think that's a bit of a bad idea. I'd rather entertain LoFreq2: https://github.com/CSB5/lofreq

There's also the question of whether we should take the union of these calls or the intersection.

---

It appears no one has tried TN paired variant calling for this analysis. (I guess I misunderstood.) Calling against the matched tumor is problematic, because plenty of tumor samples have blood/lymphocyte infiltration, i.e., contamination.

---

The best approach would be to call the blood normal against a curated pooled normal from young patients without any hematological malignancies. We don't have a curated "normal" yet, though. To create one, "use data from the youngest patients that you have, but check that they didn't have an active heme malignancy at time of sequencing and could even genotype for the most common blood mutations to exclude samples with obvious somatic mutations in blood".

I'm guessing that for the first official release we'll end up generating the analysis outputs needed to converge on the best solution.
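A minimal sketch of the union-vs-intersection question, assuming each caller's PASS calls have already been normalized (split multiallelics, left-aligned indels) and loaded as simple tuples; the `Variant` and `combine_calls` names are hypothetical, and no VCF parsing is shown.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """A normalized small variant (assumes biallelic, left-aligned records)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def combine_calls(mutect2: set[Variant], strelka2: set[Variant], mode: str) -> set[Variant]:
    """Combine two callers' call sets.

    'union' favors sensitivity (relevant for low-VAF CH calls) at the cost of
    more false positives; 'intersection' keeps only calls both callers agree on.
    """
    if mode == "union":
        return mutect2 | strelka2
    if mode == "intersection":
        return mutect2 & strelka2
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    m2 = {Variant("chr1", 100, "C", "T"), Variant("chr1", 200, "G", "A")}
    s2 = {Variant("chr1", 200, "G", "A"), Variant("chr1", 300, "A", "G")}
    print(sorted(combine_calls(m2, s2, "union")))
    print(sorted(combine_calls(m2, s2, "intersection")))
```

Keying on the full (chrom, pos, ref, alt) tuple means two callers that represent the same indel differently will not match, which is why normalization is assumed up front.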
Update: @ahmetz is working on a panel of normals (PoN) for WES data.
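The curation criteria quoted earlier in this thread (youngest patients, no active heme malignancy at sequencing, no obvious blood mutations) could be applied as a simple filter. This is a sketch only: the `Sample` fields and `select_pon_samples` are hypothetical, and it assumes the hotspot genotyping has already been summarized as a per-sample count.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical per-sample metadata; field names are illustrative only.
    sample_id: str
    age: int
    active_heme_malignancy: bool  # active at time of sequencing
    blood_hotspot_mutations: int  # count of common blood mutations genotyped


def select_pon_samples(samples: list[Sample], n: int) -> list[Sample]:
    """Pick the n youngest blood normals that pass both exclusion criteria."""
    eligible = [
        s for s in samples
        if not s.active_heme_malignancy and s.blood_hotspot_mutations == 0
    ]
    # Youngest first; ties broken by sample_id so the selection is reproducible.
    eligible.sort(key=lambda s: (s.age, s.sample_id))
    return eligible[:n]
```

Sorting with a deterministic tie-breaker keeps the PoN membership stable across reruns, which matters if the PoN is versioned alongside a release.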
I would try this caller (LoFreq). There's even a Dockerfile here: https://hub.docker.com/r/seandavi/lofreq/dockerfile
VarDict is going to be a pain, but it's possible: https://hub.docker.com/r/marghoob/vardict
CH stands for clonal hematopoiesis, but it's too tricky to spell. :)
Here's how we are going to do this:
Rough idea: we swap the TN labels, do unmatched variant calling on the normal, and then genotype the tumor. Some CH mutations will be present in the tumor because of blood contamination, and unmatched calling ensures we don’t miss those.
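The rough idea can be sketched in Python, assuming BAM paths and sample names are already known. The helper names and the genotyping thresholds are hypothetical; the command lists only use standard GATK4 Mutect2 (`-R`, `-I`, `-normal`, `-O`) and Strelka2 (`--normalBam`, `--tumorBam`, `--referenceFasta`, `--runDir`) options and are built, not executed. With `matched=False`, Mutect2 runs in tumor-only mode, which is one way to do the unmatched calling on the blood sample; the paired, labels-switched run is the other.

```python
from dataclasses import dataclass

Site = tuple[str, int, str, str]  # (chrom, pos, ref, alt)


@dataclass(frozen=True)
class TNPair:
    tumor_bam: str
    tumor_name: str
    normal_bam: str
    normal_name: str


def swap_labels(pair: TNPair) -> TNPair:
    """Treat the blood normal as the 'tumor' and the tumor as the 'normal'."""
    return TNPair(pair.normal_bam, pair.normal_name, pair.tumor_bam, pair.tumor_name)


def mutect2_cmd(pair: TNPair, ref: str, out_vcf: str, matched: bool = True) -> list[str]:
    """GATK4 Mutect2; omitting the normal gives tumor-only (unmatched) mode."""
    cmd = ["gatk", "Mutect2", "-R", ref, "-I", pair.tumor_bam]
    if matched:
        cmd += ["-I", pair.normal_bam, "-normal", pair.normal_name]
    return cmd + ["-O", out_vcf]


def strelka2_config_cmd(pair: TNPair, ref: str, run_dir: str) -> list[str]:
    """Strelka2 somatic configuration step (the somatic workflow needs a normal)."""
    return ["configureStrelkaSomaticWorkflow.py",
            "--normalBam", pair.normal_bam, "--tumorBam", pair.tumor_bam,
            "--referenceFasta", ref, "--runDir", run_dir]


def genotype_in_tumor(ch_sites: list[Site],
                      tumor_counts: dict[Site, tuple[int, int]],
                      min_alt_reads: int = 2,
                      min_vaf: float = 0.01) -> dict[Site, bool]:
    """Flag blood-called CH sites that also have read support in the tumor.

    tumor_counts maps a site to (alt_reads, depth) observed in the tumor BAM.
    The default thresholds are placeholders, not validated values.
    """
    support = {}
    for site in ch_sites:
        alt, depth = tumor_counts.get(site, (0, 0))
        vaf = alt / depth if depth > 0 else 0.0
        support[site] = alt >= min_alt_reads and vaf >= min_vaf
    return support
```

After configuration, Strelka2 is executed through the `runWorkflow.py` script it writes into `run_dir`. Keeping command construction separate from execution makes the swapped labels easy to check before any long-running job is launched.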
Once there are analysis-ready TN pairs, we run the CH analysis as follows:
The CH step runs Mutect2 and Strelka2, but with the tumor and normal labels switched.
For filtering artifacts/false positives (from Clinical Bioinformatics):
We will ask them for input; possibly they could share code.
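Until Clinical Bioinformatics shares their approach, a placeholder artifact/false-positive filter might look like the sketch below. Every threshold and the `Call` structure are hypothetical assumptions, not their method; the upper VAF bound is a crude way to drop likely germline heterozygous calls and would also remove large CH clones.

```python
from dataclasses import dataclass

Site = tuple[str, int, str, str]  # (chrom, pos, ref, alt)


@dataclass
class Call:
    chrom: str
    pos: int
    ref: str
    alt: str
    alt_reads: int
    depth: int
    caller_pass: bool  # True if the caller's FILTER column was PASS


def passes_artifact_filters(call: Call, pon_sites: set[Site],
                            min_depth: int = 20, min_alt_reads: int = 3,
                            max_vaf: float = 0.35) -> bool:
    """Return True if a call survives the (hypothetical) artifact filters."""
    if not call.caller_pass:
        return False
    # Sites recurring in the panel of normals are treated as artifacts.
    if (call.chrom, call.pos, call.ref, call.alt) in pon_sites:
        return False
    if call.depth <= 0 or call.depth < min_depth or call.alt_reads < min_alt_reads:
        return False
    # Calls near 50% VAF are more likely germline than CH.
    return call.alt_reads / call.depth <= max_vaf
```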