FeatureRequest: Support for Smoove/Lumpy population SV calling #2652

WimSpee · 2019-01-28T10:54:44Z

Hi,

Would it be possible to add Smoove/Lumpy population SV calling to Bcbio?
Bcbio currently supports Smoove/Lumpy SV calling for small (n < ~40) sets of samples.
https://github.com/bcbio/bcbio-nextgen/blob/2e4c888b4c092572961d30d5f2f5068f7387e043/bcbio/structural/lumpy.py#27

For more than 40 samples, the Smoove GitHub documentation recommends running Smoove as a two-level map-reduce:
https://github.com/brentp/smoove (section population calling )

1. Single-sample SV calling
2. Concatenate, sort and merge the single-sample SV calling results
3. Single-sample genotyping of all the merged SVs
4. Paste the single-sample results into a square multi-sample table
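
The four steps above can be sketched as a small Python helper that only builds the command lines, without running anything. The subcommands and flags follow the population-calling section of the Smoove README as I remember it, so please check them against the installed Smoove version. The function name, the directory layout and the `<name>-smoove.genotyped.vcf.gz` output naming are assumptions made for illustration.

```python
from pathlib import Path


def smoove_population_commands(bams, fasta, outdir="smoove", cohort="cohort"):
    """Return the smoove command lines for each of the four stages.

    bams maps sample name -> BAM path. Nothing is executed here.
    """
    out = Path(outdir)
    called, genotyped = out / "called", out / "genotyped"
    samples = sorted(bams)

    # Stage 1 (map): call SVs in each sample independently.
    call = [
        ["smoove", "call", "--outdir", str(called), "--name", s,
         "--fasta", fasta, "-p", "1", "--genotype", bams[s]]
        for s in samples
    ]

    # Stage 2 (reduce): merge all per-sample calls into one set of sites.
    # Assumed output naming of stage 1: <name>-smoove.genotyped.vcf.gz
    merge = ["smoove", "merge", "--name", "merged", "-f", fasta,
             "--outdir", str(out)]
    merge += [str(called / f"{s}-smoove.genotyped.vcf.gz") for s in samples]

    # Stage 3 (map): genotype every merged site in every sample.
    sites = str(out / "merged.sites.vcf.gz")
    genotype = [
        ["smoove", "genotype", "-d", "-x", "-p", "1", "--name", f"{s}-joint",
         "--outdir", str(genotyped), "--fasta", fasta, "--vcf", sites, bams[s]]
        for s in samples
    ]

    # Stage 4 (reduce): paste per-sample genotypes into one square VCF.
    paste = ["smoove", "paste", "--name", cohort]
    paste += [str(genotyped / f"{s}-joint-smoove.genotyped.vcf.gz")
              for s in samples]

    return {"call": call, "merge": merge, "genotype": genotype, "paste": paste}
```

Each list could be handed to `subprocess.run(cmd, check=True)`. The `call` and `genotype` stages are independent per sample, so they can be spread over a cluster, while `merge` and `paste` each run once.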

From an issue on the Smoove GitHub page, I gather that this should scale up to thousands of WGrS samples. I hope the sensitivity and specificity are also still good compared to joint SV calling and genotyping.

Thank you.

chapmanb · 2019-01-30T15:49:02Z

Wim;
Thanks much for the suggestion. This is definitely something we'd like to work on for bigger sample runs but haven't had a chance to implement as it will take some restructuring. One issue is that I'm not sure how best to validate and determine the utility of joint versus single sample (or small related batch) calling. Thinking practically, an alternative we can do right now is only to group related samples during SV calling. Do you have any datasets where we could determine how considering a larger population helps with sensitivity? Thanks again for the discussion.

WimSpee · 2019-01-30T16:50:30Z

One major upside of regularly redoing the Smoove&Lumpy population SV calling (from single-sample Lumpy VCFs) is that it results in an up-to-date square SV table for the expanding sets of samples that we work with.

Just merging the Smoove&Lumpy batch SV VCF files would result in a non-square (i.e. Swiss cheese) SV table, as described here for small variants under batch analysis:
https://gatkforums.broadinstitute.org/gatk/discussion/4150/should-i-analyze-my-samples-alone-or-together
As far as I know there is no way to get a square population SV table from multiple batch SV tables.
So one major way in which population SV calling helps is that it makes it possible to get a square SV table for several hundred to several thousand samples. We just tried this for a first set of a few hundred samples. Within a few hours we had a square table of SVs for all of them, and some SVs of interest are present and genotyped across all samples. This used a few hundred CPUs, 1 CPU per sample, for both the SV calling and the SV genotyping step.
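
To make the Swiss cheese problem concrete, here is a toy sketch in plain Python. It does no VCF parsing; the site keys, sample names and genotypes are made up for illustration. It merges two batch tables by taking the union of their sites, and every sample/site pair that was never assessed in that sample's batch stays empty.

```python
def merge_batches(batches):
    """Union per-batch tables of {site: {sample: genotype}}.

    A sample that was never genotyped at a site gets None there.
    """
    sites = sorted({site for table in batches for site in table})
    samples = sorted({s for table in batches
                      for gts in table.values() for s in gts})
    return {
        site: {
            s: next((t[site][s] for t in batches
                     if site in t and s in t[site]), None)
            for s in samples
        }
        for site in sites
    }


def missing_fraction(table):
    """Fraction of sample/site cells without a genotype."""
    cells = [gt for row in table.values() for gt in row.values()]
    return sum(gt is None for gt in cells) / len(cells)


# Two hypothetical batches that each found a different SV.
batch_a = {"chr1:1000:DEL": {"s1": "0/1", "s2": "0/0"}}
batch_b = {"chr2:5000:DUP": {"s3": "1/1", "s4": "0/1"}}
merged = merge_batches([batch_a, batch_b])
```

Here half of the cells in `merged` are `None`: a missing value cannot be read as homozygous reference, because the site was simply never genotyped in that sample. Re-genotyping every sample at the merged sites is what fills those holes.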

We don't have much public 'truth' data that we can use for testing. I am curious about the sensitivity and specificity of population calling versus the all-together-at-once mode. It also makes more sense to me to do this validation in human, since there is probably more 'truth' data and the validation has more value for a bigger set of bcbio users.

We managed for now to run Smoove&Lumpy outside of bcbio.
From an efficiency point of view, it makes sense to re-analyze all samples at the same time via bcbio for both small variants and SVs (small variants = GATK4 starting from GVCF, SVs = Smoove&Lumpy starting from existing single-sample Lumpy VCFs).

WimSpee mentioned this issue Feb 5, 2019

bcbio 1.1.3: Update Smoove from 0.1.9 to 0.2.3 #2666

Closed

roryk added the enhancement label Aug 10, 2019

naumenko-sa mentioned this issue May 29, 2020

bcbio priorities #3242

Open

90 tasks

naumenko-sa closed this as completed May 29, 2020
