[BENCHMARK] Run parameter benchmarks with different input data and update them. #207

Irallia · 2022-05-22T00:33:39Z

min_var_length = 30 (remains)

max_var_length = 10,000 (changed from 100,000)

max_tol_inserted_length = 50 (remains)

max_tol_deleted_length = 50 (remains)

max_overlap = 50 (changed from 10)

partition_max_distance = 50 (changed from 1,000)

hierarchical_clustering_cutoff = 0.3 (changed from 0.5)

[Plots for each parameter: old value vs. new value]

Signed-off-by: Lydia Buntrock <[email protected]>

[TEST] Update tests Signed-off-by: Lydia Buntrock <[email protected]>


codecov · 2022-05-22T00:38:50Z

Codecov Report

Merging #207 (c86937f) into master (fe3f74c) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #207   +/-   ##
=======================================
  Coverage   98.35%   98.35%           
=======================================
  Files          18       18           
  Lines         850      850           
=======================================
  Hits          836      836           
  Misses         14       14


Irallia · 2022-05-22T00:39:55Z

iGenVar only - new plots:

[Plots: old values vs. new values]

joergi-w

Looks good! 😄
Two ideas for the code:

joergi-w · 2022-05-22T08:19:49Z

test/benchmark/parameter_benchmarks/Snakefile

+                min_qual=list(range(config["quality_ranges"]["iGenVar"]["from"],
+                                    config["quality_ranges"]["iGenVar"]["to"],
+                                    config["quality_ranges"]["iGenVar"]["step"])))


This is copied into each input. Is there a way to predefine it?
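One possible way to do this, as a minimal sketch: since a Snakefile is Python, the list could be built once by a small helper and reused in every input. The helper name `quality_range`, the constant `MIN_QUAL`, and the example config values are assumptions for illustration, not part of this PR.

```python
# Hypothetical helper: build the min_qual list once from the workflow config.
def quality_range(config, tool="iGenVar"):
    bounds = config["quality_ranges"][tool]
    return list(range(bounds["from"], bounds["to"], bounds["step"]))


# Example config with made-up values; the real ones come from the workflow's config file.
config = {"quality_ranges": {"iGenVar": {"from": 1, "to": 11, "step": 2}}}

# Defined once near the top of the Snakefile, then passed as min_qual=MIN_QUAL.
MIN_QUAL = quality_range(config)
print(MIN_QUAL)
```

Each input could then pass `min_qual=MIN_QUAL` instead of repeating the three-line `range(...)` expression.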

joergi-w · 2022-05-22T08:21:48Z

test/benchmark/parameter_benchmarks/Snakefile

+                dataset=["Illumina_Paired_End", "Illumina_Mate_Pair", "MtSinai_PacBio", "PacBio_CCS", "10X_Genomics"],
+                parameter_name="partition_max_distance"),
+        expand("results/parameter_benchmarks/{dataset}/plots/{parameter_name}.results.all.png",
+                dataset=["Illumina_Paired_End", "Illumina_Mate_Pair", "MtSinai_PacBio", "PacBio_CCS", "10X_Genomics"],


The dataset list is also always the same.

Irallia · 2022-05-23

Good point, this appears in every benchmark workflow. I will try to shorten it in another PR.
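A rough sketch of how the shortening could look, assuming the dataset names and the path pattern from the snippet above are shared across rules. `DATASETS`, `PLOT_PATTERN`, and `plot_targets` are hypothetical names; the list comprehension only imitates what Snakemake's `expand` does so that the example runs as plain Python.

```python
# Dataset names taken from the diff above, defined once.
DATASETS = [
    "Illumina_Paired_End",
    "Illumina_Mate_Pair",
    "MtSinai_PacBio",
    "PacBio_CCS",
    "10X_Genomics",
]

# Path pattern taken from the diff above.
PLOT_PATTERN = "results/parameter_benchmarks/{dataset}/plots/{parameter_name}.results.all.png"


def plot_targets(parameter_name, datasets=DATASETS):
    """Return one plot path per dataset for the given parameter (hypothetical helper)."""
    return [
        PLOT_PATTERN.format(dataset=dataset, parameter_name=parameter_name)
        for dataset in datasets
    ]


targets = plot_targets("partition_max_distance")
print(len(targets))
print(targets[0])
```

Inside the Snakefile itself, `expand(PLOT_PATTERN, dataset=DATASETS, parameter_name="partition_max_distance")` would presumably give the same list without the helper.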

joshuak94

Looks good! How did you come up with the new values?

Irallia · 2022-05-24T10:58:43Z

Looks good! How did you come up with the new values?

With exactly this PR. Until now I had only tested iGenVar's parameters on one dataset; now I have added others. I then changed one parameter at a time and observed how the results behave. After some back and forth I arrived at the default values and plots shown here. The plots show that larger or smaller values would not give any improvement.
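The one-parameter-at-a-time procedure described above can be sketched roughly as follows. The defaults are the new values from this PR (reading 10.000 as ten thousand); the function name `one_at_a_time` and the candidate values are assumptions for illustration only, not the values actually benchmarked.

```python
# New default values from this PR description.
DEFAULTS = {
    "min_var_length": 30,
    "max_var_length": 10000,
    "max_tol_inserted_length": 50,
    "max_tol_deleted_length": 50,
    "max_overlap": 50,
    "partition_max_distance": 50,
    "hierarchical_clustering_cutoff": 0.3,
}


def one_at_a_time(defaults, parameter_name, candidates):
    """Return one settings dict per candidate, varying only parameter_name."""
    runs = []
    for value in candidates:
        settings = dict(defaults)  # copy, so the defaults stay untouched
        settings[parameter_name] = value
        runs.append(settings)
    return runs


# Hypothetical candidate values for a single parameter.
runs = one_at_a_time(DEFAULTS, "max_overlap", [10, 50, 100])
for run in runs:
    print(run["max_overlap"], run["partition_max_distance"])
```

Each resulting settings dict would correspond to one benchmark run, so any change in the results can be attributed to the single parameter that was varied.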

Irallia added 6 commits May 18, 2022 15:07

[BENCHMARK] Run parameter benchmarks on different Input files

cfeec24

Signed-off-by: Lydia Buntrock <[email protected]>

[DOC] Add plots

091b0e3

Signed-off-by: Lydia Buntrock <[email protected]>

[MISC] Update default parameter

4a27999

[TEST] Update tests Signed-off-by: Lydia Buntrock <[email protected]>

[MISC] Use better parameter values for plots

138bb21

Signed-off-by: Lydia Buntrock <[email protected]>

[DOC] Updated Plots

9de8a86

Signed-off-by: Lydia Buntrock <[email protected]>

[DOC] Updated iGenVar only plots

c86937f

Signed-off-by: Lydia Buntrock <[email protected]>

Irallia requested review from joergi-w and joshuak94 May 22, 2022 00:40

Irallia self-assigned this May 22, 2022

joergi-w approved these changes May 22, 2022


joshuak94 approved these changes May 24, 2022


Irallia merged commit 71ad200 into seqan:master May 24, 2022
