RFE: Use germline resources in MuTect2 to reduce artifacts in tumor-only mode #2873
The file in question ( |
Thanks! I think that would be a good improvement. Can you point me to where you downloaded that file? |
It's available at ftp://[email protected]/bundle/Mutect2/ (that is, a specific sublocation of the GATK resource bundle). The two files are:
|
I wonder if it'd be feasible to actually process our gnomAD files to produce something that is palatable for the GATK. That would avoid maintaining yet another resource. |
Thanks-- if the Broad's files are kind of small, it might be easier to just use their preprocessed files. Our gnomAD prep takes hours-- @naumenko-sa do you see that as well? I think |
I haven't yet had the opportunity to test them, but I think they're fairly minimal. A quick
Hence I guess it uses just |
I wonder if we need to do anything at all-- we have those fields in the gnomAD files so we can probably just feed those in directly, if GATK is parsing the file at all reasonably. |
Indeed. I'll try to feed one of the gnomAD files I have and see if it's processed correctly. |
Thanks Luca! |
https://github.com/bcbio/bcbio-nextgen/blob/master/bcbio/variation/vcfanno.py#L154 has a function to find either ExAC or gnomAD, so I'd make a version of that which just checks for gnomad_exome. |
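A rough sketch of what such a gnomad_exome-only lookup might look like. This is not the bcbio function linked above: the helper name `find_gnomad_exome`, the `variation` subdirectory, and the `gnomad_exome.vcf.gz` file name are all assumptions made for illustration.

```python
import os


def find_gnomad_exome(genome_dir):
    """Return the path to a gnomAD exome VCF, or None if it is not installed.

    Hypothetical helper: the directory layout and the file name below are
    assumptions, not the layout bcbio is known to use.
    """
    candidate = os.path.join(genome_dir, "variation", "gnomad_exome.vcf.gz")
    # Only report the resource when the file is actually present on disk.
    return candidate if os.path.exists(candidate) else None
```

Returning `None` instead of raising lets the caller decide whether to skip the germline resource or stop with an error.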
@roryk |
Hello, I think we are leaning toward not implementing this. I talked with Brad about it and he had a good practical take. The reasoning is that if we filter variants while calling them, they disappear completely and cannot be recovered. If we instead annotate them as being in gnomAD, people can decide later what they want to do with them. Whenever we add filtering, eventually folks complain that they are missing variants they expected to see. If we annotate, then even if we do filter, they can at least see why a variant was filtered, since at some point the variant exists in the output. The downside is that this will slow down the Mutect2 calls, since it will be calling in places it would otherwise have skipped. |
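To make the annotate-rather-than-filter idea concrete, here is a minimal sketch in plain Python. The record layout (dicts with `chrom`, `pos`, `ref`, `alt`), the `gnomad_af` key, and the function name are assumptions for illustration only. They are not how bcbio or vcfanno represent variants.

```python
def annotate_with_gnomad(variants, gnomad_af):
    """Attach a gnomAD allele frequency to every call without dropping any.

    variants:  list of dicts with chrom, pos, ref, alt (assumed layout).
    gnomad_af: dict mapping (chrom, pos, ref, alt) to an allele frequency.
    """
    annotated = []
    for variant in variants:
        key = (variant["chrom"], variant["pos"], variant["ref"], variant["alt"])
        record = dict(variant)  # copy so the input list is left untouched
        # None means the variant is absent from the resource.
        record["gnomad_af"] = gnomad_af.get(key)
        annotated.append(record)
    return annotated
```

Every input call is still present in the result, so a user can apply whatever frequency cutoff they like afterwards and can see why a given call would be dropped.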
…t2 and purecn related to bcbio#2873
better late than never :) we need this resource for t-only mutect2 and purecn |
@naumenko-sa, just looking at this, are these commits just adding the af_gnomad file or are you considering implementing the germline resources option for Mutect? |
yes, I'm pushing germline resources in mutect as well. |
I thought this would be interesting to have, as described in:
https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_hellbender_tools_walkers_mutect_Mutect2.php
However, it needs yet another, potentially duplicate, file that is part of the GATK resource bundle (and we already have gnomAD). Still, I think the option might be very useful for tumor-only analyses, since it would keep known germline sites from being called (they are prefiltered) and would also reduce runtime.
Thoughts?
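For reference, a minimal sketch of how the option could be wired into a tumor-only command line. `--germline-resource` is the Mutect2 argument described in the documentation linked above. The function name and file paths are placeholders, and the exact set of required arguments is an assumption that depends on the GATK version (older GATK4 releases also expect the tumor sample name).

```python
def mutect2_tumor_only_cmd(ref, tumor_bam, out_vcf, germline_resource=None):
    """Build a tumor-only Mutect2 command as an argument list.

    Hypothetical wrapper: only the argument names come from GATK.
    """
    cmd = ["gatk", "Mutect2", "-R", ref, "-I", tumor_bam, "-O", out_vcf]
    if germline_resource:
        # Population allele frequencies (for example, a gnomAD-derived VCF).
        cmd += ["--germline-resource", germline_resource]
    return cmd
```

The list can be passed to `subprocess.run` directly. Leaving `germline_resource` unset keeps the current behavior.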