-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BENCHMARK] Run parameter benchmarks with different input data and update them. #207
[BENCHMARK] Run parameter benchmarks with different input data and update them. #207
Conversation
Signed-off-by: Lydia Buntrock <[email protected]>
Signed-off-by: Lydia Buntrock <[email protected]>
[TEST] Update tests Signed-off-by: Lydia Buntrock <[email protected]>
Signed-off-by: Lydia Buntrock <[email protected]>
Signed-off-by: Lydia Buntrock <[email protected]>
Signed-off-by: Lydia Buntrock <[email protected]>
Codecov Report
@@ Coverage Diff @@
## master #207 +/- ##
=======================================
Coverage 98.35% 98.35%
=======================================
Files 18 18
Lines 850 850
=======================================
Hits 836 836
Misses 14 14 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! 😄
Two ideas for the code:
min_qual=list(range(config["quality_ranges"]["iGenVar"]["from"], | ||
config["quality_ranges"]["iGenVar"]["to"], | ||
config["quality_ranges"]["iGenVar"]["step"]))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copied for in each input – is there a way to pre-define this?
dataset=["Illumina_Paired_End", "Illumina_Mate_Pair", "MtSinai_PacBio", "PacBio_CCS", "10X_Genomics"], | ||
parameter_name="partition_max_distance"), | ||
expand("results/parameter_benchmarks/{dataset}/plots/{parameter_name}.results.all.png", | ||
dataset=["Illumina_Paired_End", "Illumina_Mate_Pair", "MtSinai_PacBio", "PacBio_CCS", "10X_Genomics"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the dataset is also always the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, this appears in every benchmark workflow. So I think I will try to make it shorter in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! How did you come up with the new values?
With exactly this PR. So far I have tested iGenvar only on one dataset regarding the parameters, now I have added others. And then I changed only one parameter at a time and saw how our result behaves. With some back and forth I have now ended up with the dafault values and plots created here. On the plots you can see that larger or smaller values would not give any improvement. |
min_var_length = 30 (remains)
max_var_length = 10.000 (changed from 100.000)
max_tol_inserted_length = 50 (remains)
max_tol_deleted_length = 50 (remains)
max_overlap = 50 (changed from 10)
partition_max_distance = 50 (changed from 1.000)
hierarchical_clustering_cutoff = 0.3 (changed from 0.5)