[BENCHMARK] dataset comparison igenvar only #201

Merged

Irallia
Collaborator

@Irallia Irallia commented Mar 31, 2022

UPDATE: look at the new plots below.

This plot shows the results of iGenVar on two short-read and three long-read data sets, as well as on their combinations.

iGenVar_only-results all
Some of the example BAM files (all HG002) are aligned to different references:

    MtSinai_PacBio:     GCA_000001405.15_GRCh38_no_alt_analysis_set.fa
    PacBio_CCS_10kb:    hs37d5.fa
    10X_Genomics:       hg19.reordered.fa
    Illumina:           GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fa
    Illumina_Mate_Pair: GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fa
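
Since calls made against different references cannot be compared directly, it can help to confirm which assembly a BAM file was aligned to. The sketch below is only an illustration and not part of this PR: it guesses the assembly from the chromosome 1 length in the `@SQ` header lines. The function name `guess_assembly` is hypothetical, and the header text is assumed to have been extracted beforehand (for example with `samtools view -H`).

```python
# Minimal sketch: guess the assembly from the chr1 length in a SAM/BAM header.
# `guess_assembly` is a hypothetical helper, not part of iGenVar.

# Known chromosome 1 lengths of the two assemblies.
CHR1_LENGTHS = {
    249250621: "GRCh37/hg19",  # also used by hs37d5
    248956422: "GRCh38",
}


def guess_assembly(sam_header: str) -> str:
    """Return an assembly label based on the @SQ line of chromosome 1."""
    for line in sam_header.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields look like "SN:chr1" and "LN:248956422".
        fields = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        if fields.get("SN") in ("1", "chr1"):
            length = int(fields.get("LN", "0"))
            return CHR1_LENGTHS.get(length, "unknown")
    return "unknown"


if __name__ == "__main__":
    header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n"
    print(guess_assembly(header))
```

Checking a single chromosome is a shortcut; it does not distinguish between variants of the same assembly, such as the analysis sets with and without decoy sequences.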

Since the truth set does not contain DUPs, I created another plot where all DUPs are interpreted as INS.

iGenVar_only-results DUP_as_INS all
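
The DUP-to-INS reinterpretation could be sketched roughly as follows. This is not the script used for the plot; it assumes that duplications are written with a symbolic `<DUP...>` ALT allele and/or an `SVTYPE=DUP` INFO entry, and the name `dup_as_ins` is hypothetical. The position and length fields are left untouched, which is itself a simplifying assumption.

```python
import re

# Minimal sketch: relabel duplication records in a VCF as insertions.
# `dup_as_ins` is a hypothetical helper, not the script used in this PR.

# Matches SVTYPE=DUP (optionally with a subtype such as DUP:TANDEM) in INFO.
_SVTYPE_DUP = re.compile(r"(^|;)SVTYPE=DUP(:[^;]*)?(?=;|$)")


def dup_as_ins(vcf_line: str) -> str:
    """Return the VCF line with DUP alleles and SVTYPE rewritten to INS."""
    if vcf_line.startswith("#"):
        return vcf_line  # header lines stay unchanged
    newline = "\n" if vcf_line.endswith("\n") else ""
    cols = vcf_line.rstrip("\n").split("\t")
    if len(cols) < 8:
        return vcf_line  # not a complete record, leave as is
    # Column 5 (ALT): replace symbolic duplication alleles.
    cols[4] = ",".join(
        "<INS>" if alt.startswith("<DUP") else alt for alt in cols[4].split(",")
    )
    # Column 8 (INFO): replace the SVTYPE entry.
    cols[7] = _SVTYPE_DUP.sub(r"\1SVTYPE=INS", cols[7])
    return "\t".join(cols) + newline


if __name__ == "__main__":
    record = "1\t100\t.\tN\t<DUP:TANDEM>\t30\tPASS\tEND=200;SVTYPE=DUP;SVLEN=100\n"
    print(dup_as_ins(record), end="")
```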

The important question now is whether most of these data sets are really that bad, or whether iGenVar simply fails to find the SVs. I want to find this out by comparing against other callers.

@codecov

codecov bot commented Mar 31, 2022

Codecov Report

Merging #201 (41fbde1) into master (e02d86f) will not change coverage.
The diff coverage is n/a.

❗ Current head 41fbde1 differs from pull request most recent head 9de92e5. Consider uploading reports for the commit 9de92e5 to get more accurate results

@@           Coverage Diff           @@
##           master     #201   +/-   ##
=======================================
  Coverage   98.35%   98.35%           
=======================================
  Files          18       18           
  Lines         850      850           
=======================================
  Hits          836      836           
  Misses         14       14           

Last update e02d86f...9de92e5.

@Irallia Irallia force-pushed the TEST/benchmarks/dataset_comparison_igenvar_only branch 5 times, most recently from b77c70a to 4e6971d Compare April 1, 2022 09:21
@Irallia Irallia force-pushed the TEST/benchmarks/dataset_comparison_igenvar_only branch 2 times, most recently from 825406f to 4faafee Compare April 12, 2022 15:45
@Irallia Irallia force-pushed the TEST/benchmarks/dataset_comparison_igenvar_only branch 3 times, most recently from 41fbde1 to 24890f5 Compare April 21, 2022 10:07
@Irallia
Collaborator Author

Irallia commented Apr 21, 2022

Updated plots (GRCh37)
iGenVar_only-results all
DUPs as INS
iGenVar_only-results DUP_as_INS all

@Irallia Irallia requested review from joergi-w and joshuak94 April 21, 2022 10:09
@joshuak94
Collaborator

This is interesting! It makes sense that PacBio CCS reads are much easier to call SVs from: they have long read lengths and relatively high accuracy. However, it is a bit concerning that Illumina mate-pair reads alone result in such low accuracy. I'd be curious to see how this looks for a single SV type such as deletions, since deletions can be detected fairly robustly from read depth alone.

If we can find out whether a specific variant type is pulling the whole curve down, we can better understand what the issue is.
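
To illustrate the read-depth idea mentioned above, here is a minimal sketch under simple assumptions. It is not how iGenVar detects deletions: it takes a list of per-window depths (toy values, invented purely for illustration) and reports runs of windows whose depth falls below a fraction of the median. The name `low_depth_runs` and its parameters are hypothetical.

```python
from statistics import median

# Minimal sketch: flag candidate deletions as runs of low-depth windows.
# `low_depth_runs` is a hypothetical helper, not iGenVar code.


def low_depth_runs(depths, frac=0.5, min_windows=2):
    """Return (start, end) index ranges (end exclusive) of consecutive
    windows whose depth is below `frac` times the median depth."""
    if not depths:
        return []
    cutoff = frac * median(depths)
    runs = []
    start = None
    for i, depth in enumerate(depths):
        if depth < cutoff:
            if start is None:
                start = i  # a low-depth run begins here
        else:
            if start is not None and i - start >= min_windows:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the input.
    if start is not None and len(depths) - start >= min_windows:
        runs.append((start, len(depths)))
    return runs


if __name__ == "__main__":
    toy_depths = [30, 31, 29, 30, 2, 1, 3, 30, 28, 31]  # invented values
    print(low_depth_runs(toy_depths))
```

A real caller would also have to account for GC bias, mappability, and heterozygous deletions (roughly half the depth instead of near zero), which this sketch ignores.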

@Irallia Irallia force-pushed the TEST/benchmarks/dataset_comparison_igenvar_only branch from 24890f5 to 9de92e5 Compare April 22, 2022 08:51
@Irallia Irallia merged commit fe3f74c into seqan:master Apr 22, 2022
3 participants