[BENCHMARK] dataset comparison igenvar only #201

Irallia · 2022-03-31T11:52:35Z

UPDATE: look at the new plots below.

In this plot, you can see the results of iGenVar on 2 short-read and 3 long-read sets and their combinations.
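Here is a minimal sketch of how the run configurations could be enumerated. It assumes each iGenVar run takes at most one short-read and one long-read file. The dataset names come from the reference list in this PR. Grouping 10X_Genomics with the long-read sets is an assumption inferred from the "2 short, 3 long" count, and `input_combinations` is a hypothetical helper that is not taken from eval.smk.

```python
from itertools import product

# Dataset names from this PR; treating 10X_Genomics as a long-read set is an
# assumption inferred from the "2 short read and 3 long read sets" count.
SHORT_READ_SETS = ["Illumina", "Illumina_Mate_Pair"]
LONG_READ_SETS = ["MtSinai_PacBio", "PacBio_CCS_10kb", "10X_Genomics"]


def input_combinations(short_sets, long_sets):
    """Hypothetical helper: list every run configuration as (short, long),
    assuming at most one short-read and one long-read file per run."""
    runs = [(s, None) for s in short_sets]          # short reads only
    runs += [(None, lr) for lr in long_sets]        # long reads only
    runs += list(product(short_sets, long_sets))    # one short + one long set
    return runs


if __name__ == "__main__":
    for short, long_reads in input_combinations(SHORT_READ_SETS, LONG_READ_SETS):
        print(short or "-", long_reads or "-")
```

Under this assumption, the five datasets yield 11 runs: 2 short-only, 3 long-only and 6 mixed.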

Some of the example BAM files (all HG002) are aligned to different references:

    MtSinai_PacBio:     GCA_000001405.15_GRCh38_no_alt_analysis_set.fa
    PacBio_CCS_10kb:    hs37d5.fa
    10X_Genomics:       hg19.reordered.fa
    Illumina:           GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fa
    Illumina_Mate_Pair: GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fa
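The sets mix GRCh37-based references (hs37d5, hg19) and GRCh38-based references, so it can help to check the build directly from a BAM header. The sketch below parses `@SQ` lines (for example from `samtools view -H`). It compares the length of chromosome 1 with the known GRCh37 (249,250,621 bp) and GRCh38 (248,956,422 bp) values. `guess_build` is a hypothetical helper. It only distinguishes the two builds, not individual reference variants such as hs37d5 vs. hg19.

```python
# chr1 lengths of the two human assemblies (GRCh37/hg19/hs37d5 vs. GRCh38).
CHR1_LENGTH_TO_BUILD = {
    249_250_621: "GRCh37",
    248_956_422: "GRCh38",
}


def guess_build(header_text):
    """Hypothetical helper: guess the assembly from SAM header text.

    Finds the @SQ line for chromosome 1 (named "1" as in hs37d5, or "chr1"
    as in hg19/GRCh38 analysis sets) and looks up its LN value.
    """
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are TAB-separated TAG:VALUE pairs after the record type.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if fields.get("SN") in ("1", "chr1"):
            return CHR1_LENGTH_TO_BUILD.get(int(fields.get("LN", "0")), "unknown")
    return "unknown"
```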

Since the truth set does not contain DUPs, I created another plot where all DUPs are interpreted as INS.
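Here is a minimal sketch of the DUP-to-INS reinterpretation on plain VCF text. It assumes symbolic `<DUP...>` ALT alleles and an `SVTYPE` INFO key. `dup_as_ins` is hypothetical and is not the workflow's actual implementation. It only relabels the type and leaves POS, END and SVLEN untouched. This is a simplification, because a matching tool may interpret insertion coordinates differently.

```python
import re


def dup_as_ins(vcf_line):
    """Hypothetical helper: report a duplication record as an insertion.

    Header lines and non-DUP records pass through unchanged. Only the ALT
    allele (<DUP...> becomes <INS>) and SVTYPE=DUP in INFO are rewritten.
    """
    if vcf_line.startswith("#"):
        return vcf_line
    cols = vcf_line.rstrip("\n").split("\t")
    if len(cols) < 8:
        return vcf_line
    # ALT is column 5 (index 4), INFO is column 8 (index 7).
    if cols[4].startswith("<DUP"):
        cols[4] = "<INS>"
    cols[7] = re.sub(r"(^|;)SVTYPE=DUP(?=;|$)", r"\1SVTYPE=INS", cols[7])
    return "\t".join(cols) + ("\n" if vcf_line.endswith("\n") else "")
```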

The important question now is: are most sets really that bad, or does iGenVar simply fail to find the SVs? I want to answer this by comparing against other callers.

codecov · 2022-03-31T11:56:34Z

Codecov Report

Merging #201 (41fbde1) into master (e02d86f) will not change coverage.
The diff coverage is n/a.

❗ Current head 41fbde1 differs from pull request most recent head 9de92e5. Consider uploading reports for the commit 9de92e5 to get more accurate results

    @@           Coverage Diff           @@
    ##           master     #201   +/-   ##
    =======================================
      Coverage   98.35%   98.35%           
    =======================================
      Files          18       18           
      Lines         850      850           
    =======================================
      Hits          836      836           
      Misses         14       14


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e02d86f...9de92e5.

test/benchmark/caller_comparison_iGenVar_only/workflow/rules/eval.smk

test/benchmark/caller_comparison_iGenVar_only/workflow/scripts/plot_all_results.R

Irallia · 2022-04-21T10:09:07Z

Updated plots (GRCh37)

DUPs as INS

joshuak94 · 2022-04-21T12:26:07Z

This is interesting! It makes sense that PacBio CCS reads are much easier to call SVs from: they have long read lengths and relatively high accuracy. However, it is a bit concerning that Illumina mate-pair reads alone result in such low accuracy. I'd be curious to see how this looks for deletions specifically, since deletions can be detected fairly robustly from read depth alone.
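To illustrate the read-depth signal, here is a toy sketch (not taken from iGenVar) that flags runs of windows whose depth falls below a fraction of the median depth. The per-window depths, the 0.6 cutoff and the minimum run length are illustrative assumptions. Heterozygous deletions show roughly half the median depth and homozygous ones close to zero, so both fall below the cutoff.

```python
from statistics import median


def low_depth_runs(depths, window_size, min_fraction=0.6, min_windows=2):
    """Hypothetical read-depth scan over per-window depths.

    Returns (start, end) base ranges where depth stays below
    min_fraction * median depth for at least min_windows consecutive windows.
    """
    threshold = min_fraction * median(depths)
    runs = []
    start = None
    # An infinite sentinel at the end closes a run that reaches the last window.
    for i, depth in enumerate(list(depths) + [float("inf")]):
        if depth < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_windows:
                runs.append((start * window_size, i * window_size))
            start = None
    return runs
```

For example, `low_depth_runs([30, 30, 30, 14, 15, 0, 30, 30, 30, 30], window_size=100)` returns `[(300, 600)]`.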

Maybe if we can find out whether a specific variant is bringing the whole curve down, we can better understand what the issue is.
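One way to find the variant class that drags the curve down is to split recall by SV type. The sketch assumes the benchmark output has already been reduced to one `(svtype, matched)` pair per truth-set variant. Both that input format and the `recall_by_type` helper are hypothetical.

```python
from collections import Counter


def recall_by_type(truth_calls):
    """Hypothetical helper: recall per SV type.

    truth_calls: iterable of (svtype, matched) pairs, one per truth-set
    variant, where matched is True if the caller produced a matching call.
    """
    total, found = Counter(), Counter()
    for svtype, matched in truth_calls:
        total[svtype] += 1
        if matched:
            found[svtype] += 1
    return {svtype: found[svtype] / total[svtype] for svtype in total}
```

For example, three DEL records with two matched and two unmatched INS records give a recall of about 0.67 for DEL and 0.0 for INS.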


Irallia self-assigned this Mar 31, 2022

Irallia mentioned this pull request Mar 31, 2022

iGenVar - [BENCHMARK] Create short & long read benchmarks #197

Open


Irallia force-pushed the TEST/benchmarks/dataset_comparison_igenvar_only branch 5 times, most recently from b77c70a to 4e6971d (April 1, 2022 09:21)

Irallia force-pushed the TEST/benchmarks/dataset_comparison_igenvar_only branch 2 times, most recently from 825406f to 4faafee (April 12, 2022 15:45)

Irallia force-pushed the TEST/benchmarks/dataset_comparison_igenvar_only branch 3 times, most recently from 41fbde1 to 24890f5 (April 21, 2022 10:07)

Irallia requested review from joergi-w and joshuak94 April 21, 2022 10:09

joergi-w approved these changes Apr 21, 2022

joshuak94 approved these changes Apr 21, 2022

Irallia added 5 commits April 22, 2022 10:51

[BENCHMARK] Compare all iGenVar input combinations.

a609bf1

Signed-off-by: Lydia Buntrock <[email protected]>

[DOC] Plot for Caller Comparison: iGenVar with different input files

6813ab3

Signed-off-by: Lydia Buntrock <[email protected]>

[BENCHMARK] Count DUP_as_INS and add new plot

125bf81

Signed-off-by: Lydia Buntrock <[email protected]>

[DOC] Plot for Caller Comparison: iGenVar with different input files,…

c078197

… but with DUP as INS Signed-off-by: Lydia Buntrock <[email protected]>

[DOC] Add results

9de92e5

Signed-off-by: Lydia Buntrock <[email protected]>

Irallia force-pushed the TEST/benchmarks/dataset_comparison_igenvar_only branch from 24890f5 to 9de92e5 (April 22, 2022 08:51)

Irallia merged commit fe3f74c into seqan:master Apr 22, 2022

Irallia mentioned this pull request Apr 25, 2022

[BENCHMARKS] Examine the calling results more closely. #205

Open
