Validation of large deletion calling in bcbio with the HuRef benchmark, compared to ERDS and CNVnator #2313
Sergey; As a more general discussion, what issues are you finding with the current CNVkit caller that ERDS and CNVnator improve on? What type of sample inputs are you dealing with (WGS, exome, tumor...)? Do you have any test cases we could incorporate to help motivate both improvements to CNVkit as well as incorporation of new methods? We're always open to including new tools, but try to do this in a way where we cover gaps and understand the strengths and weaknesses of approaches. Thanks again for the discussion.
Hi Brad! We are mostly focused on germline CNV calling using WGS. @bjtrost will post more details soon. Sergey
Hi Brad, Although I have not tested or used it myself, my understanding from reading the CNVkit documentation and paper abstract is that it is primarily designed for performing tumour-normal comparisons using pooled references from capture data. It would also be useful to include CNV-detection algorithms designed for germline WGS data that work on single samples (the "single sample" aspect is desirable because, in our experience, the use of pooled references can result in deletion calls where there are common duplications and vice versa). In a recent paper, we describe a workflow for germline CNV detection from WGS data. Part of this paper is comprised of an evaluation of six read depth-based CNV callers, and we identified ERDS and CNVnator as the best ones. Thus, I think it would be valuable to add ERDS and CNVnator to bcbio for the purposes of performing germline CNV calling from WGS data. Our paper is here: https://www.sciencedirect.com/science/article/pii/S0002929717304962?via%3Dihub Cheers, Brett
It looks like the main improvement ERDS and CNVnator offer over CNVkit is the segmentation algorithm. CNVkit uses CBS by default, which is reliable for panel and exome data, whereas CNVnator, for instance, uses a mean-shift approach developed for WGS read depths.
Other recently developed callers like CLAMMS and GATK4 were not evaluated -- both of those use HMMs for segmentation, so I'd be interested to see how they stack up. CNVkit also recently rolled out an HMM-based segmentation method, but it hasn't been evaluated thoroughly yet. Other users report better results with CNVkit's "flasso" (fused lasso) method over the default CBS, so if there is a certain WGS benchmark dataset that is relevant to a lot of users, I'd be happy to work with someone to tune CNVkit's segmentation to work better on that dataset.
Hi, Just wanted to note that CLAMMS is only for exome data ("Please note that CLAMMS is not intended to be used with whole-genome sequencing data"), and the CNV-detection functionality in GATK4 seems to be just for somatic, although they are testing functionality for germline (https://gatkforums.broadinstitute.org/gatk/discussion/11344/current-status-of-gatk4-germlinecnvcaller-tools-and-best-practices). Cheers, Brett
Brett, Sergey and Eric; Is it possible to extract a couple of the test cases from that paper that we could use as a starting point for evaluating CNV calls on single-sample germline calling? I'd really like to have this included in the bcbio validations so we can evaluate consistently. Sergey, is that validation done against the NA12878 calls? That's not a great test case for CNVs, unfortunately, since it's all based on split/paired-style callers (hence their better performance). I suspect other read depth-based callers would also not look great on small events or on specificity. I've been hoping to develop a better evaluation metric for this and would love to have it tie in with Brett's paper so we can compare and contrast with the work that has already been done. I'd be keen to explore GATK4's in-progress HMM approaches, or adjusting CNVkit, as first passes once this validation is in place. Practically, ERDS has an academic-only license and is only for WGS, and CNVnator does not seem to be under active development. As you mention, CLAMMS is exome-only. Since it's a ton of work to integrate and support methods for the long term, we'd ideally pick ones where we can work with the authors, as we currently do with Eric and CNVkit. If adjusting CNVkit or integrating GATK4 would get us equivalent calls on some of your benchmarks, that would be a best-case outcome. What do y'all think about this as a plan? I'd be happy to hear your feedback and thoughts.
Brad, thanks for the clarification! Yes, I've validated against the GiaB NA12878 calls (the standard SV validation in bcbio), so we need a benchmark that is not biased towards split/paired methods. So the potential roadmap is to use one of the benchmarks referenced in Brett's article (HuRef, NA12878, AK1) to validate Eric's CNVkit in flasso mode and show that it is comparable to ERDS and CNVnator in Table 1 of Brett's article. I'd be happy to run this. However, maybe the strength of coverage-based methods lies in tumour/normal and multi-sample analysis rather than in single-sample WGS, and here we should just use manta/lumpy/wham? Sergey
Sergey -- thanks so much for the help with this; those plans sound perfect. My take on the calling methods is that depth-based and split/paired methods capture different events, hence are complementary. The NA12878 example above uses a lot of short-read split/paired methods as inputs, so it will miss some depth-only events and isn't meant to be fully comprehensive. In general, short reads can only capture a subset of all events. Genome in a Bottle has been working on a new truth set with NA24385 and more long-read technologies, and this is a comparison of methods like manta which tries to quantify this: https://github.com/bcbio/bcbio_validations/tree/master/NA24385_sv Having a germline CNV-specific validation based on Brett's work would be an incredible help to compare and improve methods in bcbio. I've been working on a tumor/normal CNV one based on HCC2218 (https://github.com/Illumina/canvas#demo-tumor-normal-enrichment-workflow) and generally working on making common sets of validations available (https://github.com/bcbio/bcbio_validation_workflows). Thanks again for this great discussion.
A few more comments:

1. Re: "do you think it is worth using read-depth CNV calling methods for single-sample WGS, given that the best methods (ERDS and CNVkit) discover 51% and 44% of events in the best case (filtered), with a lot of false discoveries (59% and 32%) (Table 1 in Brett's article), compared to split/paired methods with 80-90% of events discovered and 20-40% false positives (see my picture above)" - I don't think it is very meaningful to compare these numbers. The benchmarks used in the CNV paper were mostly orthogonal to the methods being tested, whereas the GIAB set (as you mention) is biased toward SR/PEM methods.
2. I agree that the read-depth callers and the SR/PEM callers are complementary, so it would be good to have both represented in bcbio. In particular, the read-depth methods are better at calling larger variants, which are more likely to be clinically relevant.
3. ERDS, like CNVnator, is no longer maintained.
4. Another good source of benchmarks may be a recent paper from Charles Lee's group, currently on bioRxiv: https://www.biorxiv.org/content/early/2017/09/23/193144 The authors kindly provided me with PacBio benchmarks for the children of the three trios described in this paper.

Cheers, Brett
Hello Brad, Brett, and Eric! I've validated deletions using the HuRef (Venter) dataset from Brett's 2018 article (the updated version including PacBio calls): 7,952 deletions after bedtools merge, with sizes ranging from 50 bp to 171,021,482 bp. The _evaluate_one function in structural/validate.py uses pybedtools' intersect to validate intervals; by default, an overlap is reported if there is at least 1 bp of overlap. Surprisingly, in my test the overlap requirement does not make a big difference, just 15 deletions out of 2,668 in GIAB, and it seems that pybedtools does not support the -f and -F parameters, so we continue with the default intersect method. I ran bcbio SV detection on the HuRef (Venter) WGS data. For each caller I selected deletions that passed filters (except for CNVkit) and merged the overlapping ones with bedtools merge.
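For reference, a minimal sketch of the default 1 bp overlap check described above, with hypothetical file names (the actual logic lives in bcbio's structural/validate.py):

```python
# Minimal sketch of the default pybedtools/bedtools intersect behavior used
# for validation: any overlap of >= 1 bp counts as a hit.
# File names are hypothetical placeholders.
import pybedtools

calls = pybedtools.BedTool("caller_deletions.bed")       # merged caller output
truth = pybedtools.BedTool("huref_truth_deletions.bed")  # benchmark deletions

# u=True (-u) reports each call at most once if it overlaps any truth interval.
validated = calls.intersect(truth, u=True)
print(f"{len(validated)} of {len(calls)} calls hit the truth set")
```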
I validated the deletions against this benchmark. We see that in the more accurate HuRef benchmark, which is not biased towards read-split SV calling methods, the best results come from MetaSV, with less than 60% of events detected in the best case. CNVkit, as a read-depth method, performs better for events longer than 10,000 bp and calls 50% of them. So we have a lot of room for improvement in the 'one genome' use case. @etal, do you think there is a way to push up the performance of CNVkit for this use case (not for tumor-normal)? If so, please provide instructions on how to run it. @bjtrost, could you please comment on how Table 1 from your 2018 article compares with these results? (Because you have the >1 kb filter, the low-complexity filter, 1+, and 2+, it is a bit tricky to compare.) Do you think that in its league (>10 kb), CNVkit performs similarly to ERDS and CNVnator (around 50%), better, or worse? @chapmanb, do you have any ideas which tools I could test with this benchmark to potentially include in bcbio for the 'one genome' use case? Sergey
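To make per-size-bin comparisons like the 50% figure above concrete, here is an illustrative sketch of computing sensitivity stratified by deletion size; the file paths and bin edges are assumptions, not bcbio's code:

```python
# Illustrative sketch: per-size-bin sensitivity of a call set against a truth
# set (fraction of truth deletions overlapped by at least one call).
import pybedtools

truth = pybedtools.BedTool("huref_truth_deletions.bed")  # hypothetical paths
calls = pybedtools.BedTool("caller_deletions.bed")
bins = [(50, 1_000), (1_000, 10_000), (10_000, 10**9)]

for lo, hi in bins:
    # Materialize the filtered subset so it can be intersected and counted.
    subset = truth.filter(lambda iv, lo=lo, hi=hi: lo <= iv.length < hi).saveas()
    hit = subset.intersect(calls, u=True)
    if len(subset):
        print(f"[{lo:>6}, {hi:>10}) bp: {len(hit)}/{len(subset)} "
              f"= {len(hit) / len(subset):.1%} detected")
```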
Sergey; The latest smoove validation (https://github.com/bcbio/bcbio_validations/tree/master/NA24385_sv#smoove-validation) swaps over to using SURVIVOR for merging and validation, to avoid needing custom bedtools intersections. I'd like to add back in stratifications as well, since that is really useful. Are you able to share the HuRef benchmark and truth set? I'd love to incorporate this as an additional automated validation so we can continue to iterate and improve on this. I'd definitely welcome ideas from Eric on improving CNVkit for this case, and we can continue to improve sensitivity and specificity for the cases it should be able to tackle. Thank you again for this great discussion and work.
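For anyone following along, a hedged sketch of the SURVIVOR merge step mentioned above, driven from Python; the numeric parameters follow SURVIVOR's documented positional merge interface, and the file names are placeholders:

```python
# Sketch of merging per-caller VCFs with SURVIVOR before validation.
# SURVIVOR merge takes positional arguments; the values below are illustrative.
import subprocess

with open("vcf_list.txt", "w") as fh:
    fh.write("manta.vcf\nlumpy.vcf\ncnvkit.vcf\n")  # one VCF path per line

subprocess.run(
    ["SURVIVOR", "merge", "vcf_list.txt",
     "1000",         # max distance (bp) between breakpoints to merge
     "1",            # minimum number of supporting callers
     "1",            # require agreement on SV type
     "1",            # require agreement on strand
     "0",            # do not estimate distance from SV size
     "30",           # minimum SV size (bp) to consider
     "merged.vcf"],  # merged output VCF
    check=True,
)
```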
Thanks for sharing, @naumenko-sa, I'm happy to help tune CNVkit to perform better on single-sample germline WGS cases like this. It's especially helpful if you can point to other read-depth-based tools that perform well on this data -- CNVkit breaks down copy number analysis into discrete steps that can each be swapped with better methods as they're discovered. In the case of CNVnator, the main difference is the segmentation algorithm: mean-shift versus CNVkit's default circular binary segmentation (CBS). The normalization step is not much different, since there's no matched control or pool of control samples. ERDS uses SNP b-allele frequencies to improve its calls, whereas CNVkit's support for that is incomplete so far (etal/cnvkit#34). In the latest version of CNVkit there are a couple of segmentation methods worth trying, in particular the HMM-based 'hmm-germline' mode and the fused-lasso 'flasso' mode.
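A minimal sketch of trying these alternatives on the same coverage ratios file; the sample.cnr path is a placeholder, and -m/-o are cnvkit.py segment's method and output options:

```python
# Sketch: run CNVkit segmentation with alternative methods on one .cnr file
# so the resulting .cns segmentations can be compared against the benchmark.
import subprocess

for method in ("cbs", "flasso", "hmm-germline"):
    subprocess.run(
        ["cnvkit.py", "segment", "sample.cnr",  # bin-level log2 ratios
         "-m", method,                          # segmentation algorithm
         "-o", f"sample.{method}.cns"],         # segmented output
        check=True,
    )
```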
In a couple months I might be able to point to some new functionality in CNVkit to perform specific tests for hemizygous and homozygous losses in germline samples. I'll be sure to let this group know when that lands. |
Brad, In the meantime, I have added the ERDS/CNVnator results from @bjtrost's article to the validation picture. Read-depth methods (ERDS, CNVnator) start performing well at 1,000 bp; CNVkit does at 10,000 bp. Overall (regardless of deletion size), I'm satisfied with MetaSV, and I think that using ERDS and CNVnator would not boost the sensitivity to 100%. However, you can see that the overall precision/sensitivity on the HuRef benchmark is much lower than on the GIAB benchmark, so there is room for improvement. @etal, I think it would be crucial to improve CNVkit's performance in the [1,000, 10,000] bp bin, where ERDS and CNVnator perform much better. I'm using the CNVkit from bcbio/conda. Sergey
@etal Sorry, in my hands -f hmm-germline shows the same results as the default segmentation method. Flasso exceeded a running time of 2 days with 7 threads; unfortunately, it is too slow for practical use on WGS.
@naumenko-sa Thanks for the details. It's suspicious that the 'hmm-germline' results were identical to the default 'cbs' method; do you think CNVkit may have run 'cbs' in both cases due to some bug? For the 'flasso' and 'segmetrics' runtimes, it sounds like there's a bottleneck in the code that is worst on WGS data, specifically when a lot of segments are produced. WGS and exomes seem to require tighter significance thresholds than the defaults. I'll profile the code to see if there's a hotspot shared by 'flasso' and 'segmetrics'.
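As an illustration of that kind of profiling, a generic standard-library sketch; run_segmentation is a hypothetical stand-in for the slow flasso/segmetrics code path:

```python
# Generic profiling sketch: run the slow path under cProfile and print the
# ten most expensive functions by cumulative time.
import cProfile
import pstats


def run_segmentation():
    # Hypothetical placeholder for the slow segmentation entry point.
    return sum(i * i for i in range(10**6))


cProfile.run("run_segmentation()", "segment.prof")
pstats.Stats("segment.prof").sort_stats("cumulative").print_stats(10)
```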
Quarterly update: I'll see if I can run these benchmarks myself and troubleshoot the segmentation issues described above.
Hi @etal! Thanks for the update! I will try to run the test with the latest version. Unfortunately, you will not be able to run this yourself, as you don't have the input WGS data. The owners have not released it yet; I hope that will happen at some point, but for now I have to run it myself. I do have permission to share the results, though :)
Hi everyone! Sorry for the long silence. Good news: the HuRef WGS fastq files have just been published. Some thoughts:
Sergey |
Sergey; |
Hi Brad! I've traced (using bcbio-nextgen-commands.log) what happened when running the below config file (a fragment):
Roughly: smoove helps lumpy make less noisy calls, and duphold adds coverage information to the VCFs called by lumpy and manta (breakpoint algorithms). It looks like regions.bed affects only CNVkit, i.e. the other three tools are calling genome-wide. Would it make sense to restrict calling in manta/lumpy/wham by the sv_variants regions too? If so, what would be the best place to add this? (One possible post-hoc workaround is sketched below.)
Sergey |
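A hedged sketch of that post-hoc workaround, which is not bcbio's actual behavior: intersect each caller's output VCF with the regions BED after calling (file names are placeholders):

```python
# Sketch: restrict already-made SV calls to a regions BED by intersecting
# each caller's VCF with it (pybedtools can read VCF records directly).
import pybedtools

regions = pybedtools.BedTool("regions.bed")
for caller in ("manta", "lumpy", "wham"):
    vcf = pybedtools.BedTool(f"{caller}.vcf")
    # header=True keeps the VCF header; u=True reports each record at most once.
    vcf.intersect(regions, u=True, header=True).saveas(f"{caller}.regions.vcf")
```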
Sergey; |
Hello! A year later, I revisited the SV validations, adding Delly. See the results here: I also created a PR: Sergey
Sergey; |
Hello, Brad!
Thanks for the great pipeline!
Currently, among read-depth CNV callers, only CNVkit is supported in bcbio. In our experience, ERDS and CNVnator perform better than CNVkit. What would be your thoughts on including them? We could put in some development effort from our side.
Thanks!
Sergey
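For context on what integrating CNVnator would involve, here is a hedged sketch of its canonical multi-step CLI workflow driven from Python; the bin size, file paths, and reference FASTA are illustrative, and the flags follow CNVnator's documented usage:

```python
# Sketch of the standard CNVnator pipeline: extract reads, build a read-depth
# histogram, compute statistics, partition via mean-shift, and call CNVs.
import subprocess

root, bam, bin_size = "sample.root", "sample.bam", "100"
steps = [
    ["-tree", bam],                             # read mappings from the BAM
    ["-his", bin_size, "-fasta", "genome.fa"],  # depth histogram vs. reference
    ["-stat", bin_size],                        # per-bin statistics
    ["-partition", bin_size],                   # mean-shift partitioning
    ["-call", bin_size],                        # emit CNV calls (to stdout)
]
for extra in steps:
    subprocess.run(["cnvnator", "-root", root] + extra, check=True)
```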