The harmonizer module runs after the improvement step and generates recommendations that bring the performance of all input pipelines closer together while keeping that performance as high as possible. To achieve this, the degree of heterogeneity between pipelines is evaluated using two values:
To quantify heterogeneity in performance metrics, we created the heterogeneity score (
To measure heterogeneity in the functional impact of variant discovery, we created the Gene Discordance Ratio (GDR). The GDR is the complement of the ratio between the number of variant-affected genes found by every pipeline (intersection) and the total number of variant-affected genes detected by at least one pipeline (union). Hence, a GDR value closer to 1 implies a high level of heterogeneity in functional impact between the pipelines. This metric follows \Cref{eq:gdr} where
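The GDR definition above can be illustrated with a minimal Python sketch. It is not taken from the ONCOLINER code base: the function name `gene_discordance_ratio`, the input layout (one set of gene symbols per pipeline), and the handling of an empty union are assumptions made for illustration only.

```python
from typing import Dict, Set


def gene_discordance_ratio(genes_by_pipeline: Dict[str, Set[str]]) -> float:
    """Return 1 - |intersection| / |union| over the per-pipeline gene sets."""
    gene_sets = list(genes_by_pipeline.values())
    if not gene_sets:
        raise ValueError("at least one pipeline is required")
    # Genes affected by a variant in at least one pipeline.
    union = set().union(*gene_sets)
    if not union:
        # Assumption: with no affected genes at all, report no discordance.
        return 0.0
    # Genes affected by a variant in every pipeline.
    intersection = set.intersection(*gene_sets)
    return 1.0 - len(intersection) / len(union)


# Hypothetical example data: three pipelines and the genes they report.
example = {
    "pipeline_A": {"TP53", "KRAS", "BRCA1"},
    "pipeline_B": {"TP53", "KRAS", "EGFR"},
    "pipeline_C": {"TP53", "KRAS"},
}
gdr = gene_discordance_ratio(example)
print(gdr)  # 2 shared genes out of 4 in total gives 0.5
```

Identical gene sets give a GDR of 0, and pipelines with no gene in common give a GDR of 1.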
We recommend using the ONCOLINER container or the provided Dockerfile/Singularity recipe for building the whole ONCOLINER suite to avoid dependency issues.
The main executable code is in the src/ folder. The executable file is harmonization_main.py.
usage: harmonization_main.py [-h] -i INPUT_PIPELINES_IMPROVEMENTS
                             [INPUT_PIPELINES_IMPROVEMENTS ...] -o OUTPUT
                             [-lm LOSS_MARGIN] [-mr MAX_RECOMMENDATIONS]
                             [-t THREADS]

ONCOLINER Harmonization

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_PIPELINES_IMPROVEMENTS [INPUT_PIPELINES_IMPROVEMENTS ...], --input-pipelines-improvements INPUT_PIPELINES_IMPROVEMENTS [INPUT_PIPELINES_IMPROVEMENTS ...]
                        Paths to each pipeline improvement folder
  -o OUTPUT, --output OUTPUT
                        Output folder
  -lm LOSS_MARGIN, --loss-margin LOSS_MARGIN
                        Maximum performance loss from the maximum in a metric
                        to consider a recommendation (default: 0.05). A value
                        of 0.05 means that a recommendation will be provided
                        if the performance loss (in any metric) is less than
                        5% over the maximum of all recommendations. Decreasing
                        this value will decrease the number of recommendations
                        after --max-recommendations is applied
  -mr MAX_RECOMMENDATIONS, --max-recommendations MAX_RECOMMENDATIONS
                        Maximum number of recommendations to provide for each
                        performance metric per variant type and size and
                        number of variant callers added (default: 1). Set to
                        -1 to provide all recommendations
  -t THREADS, --threads THREADS
                        Number of CPU threads
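As a hedged illustration of the options listed in the usage text, the following Python sketch assembles a command line for two pipelines. The helper `build_harmonization_command`, the folder names, and the use of `python3` as the launcher are assumptions for illustration; only the flags themselves come from the usage text.

```python
import shlex
from typing import List, Sequence


def build_harmonization_command(
    improvement_dirs: Sequence[str],
    output_dir: str,
    loss_margin: float = 0.05,
    max_recommendations: int = 1,
    threads: int = 1,
) -> List[str]:
    """Assemble the argument list for harmonization_main.py (hypothetical helper)."""
    if not improvement_dirs:
        raise ValueError("at least one pipeline improvement folder is required")
    return [
        "python3", "src/harmonization_main.py",  # assumed launcher and relative path
        "-i", *improvement_dirs,                 # one improvement folder per pipeline
        "-o", output_dir,
        "-lm", str(loss_margin),
        "-mr", str(max_recommendations),
        "-t", str(threads),
    ]


# Hypothetical folder names; replace them with real improvement outputs.
command = build_harmonization_command(
    ["improvement/pipeline_1", "improvement/pipeline_2"],
    "harmonization_output",
    threads=4,
)
print(shlex.join(command))
# To actually run it: subprocess.run(command, check=True)
```

Building the arguments as a list rather than a single string avoids shell quoting problems if a folder name contains spaces.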
The output folder will contain all harmonization possibilities for the input pipelines, grouped into separate files by variant type and size.
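One possible reading of how `--loss-margin` and `--max-recommendations` interact is sketched below for a single performance metric. This is an interpretation of the help text, not the ONCOLINER implementation: the function `select_recommendations`, the treatment of the margin as an absolute difference on a 0 to 1 metric, and the order of operations (margin filter first, cap second) are all assumptions.

```python
from typing import Dict, List


def select_recommendations(
    scores: Dict[str, float],
    loss_margin: float = 0.05,
    max_recommendations: int = 1,
) -> List[str]:
    """Keep candidates close to the best score, then cap how many are returned."""
    if not scores:
        return []
    best = max(scores.values())
    # Assumption: the loss is the absolute gap to the best score in this metric.
    kept = [name for name, value in scores.items() if best - value < loss_margin]
    # Best candidates first.
    kept.sort(key=lambda name: scores[name], reverse=True)
    if max_recommendations == -1:
        # -1 means "provide all recommendations", as in the usage text.
        return kept
    return kept[:max_recommendations]


# Hypothetical candidate combinations with their score in one metric.
candidates = {"combo_1": 0.90, "combo_2": 0.88, "combo_3": 0.80}

top_default = select_recommendations(candidates)
all_within_margin = select_recommendations(candidates, max_recommendations=-1)
wider_margin = select_recommendations(candidates, loss_margin=0.15, max_recommendations=-1)
print(top_default, all_within_margin, wider_margin)
```

Under these assumptions, the defaults return only the best candidate, removing the cap also returns `combo_2`, and widening the margin to 0.15 admits all three.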