Removed pairwise batch effect checks from WDL workflow #617

shadizaheri · 2023-11-09T20:13:22Z

Description:

This PR changes the WDL workflow that analyzes batch effects in genomic data. The primary objective is to simplify and speed up the pipeline by removing the pairwise batch effect checks.

Changes Made:

Removed Pairwise Batch Effect Checks: Previously, the workflow considered two types of batch effects: one that compared a single batch against all other batches ("1 vs. all") and another that performed pairwise comparisons of each batch against every other batch. We decided to focus solely on the "1 vs. all" checks and remove the pairwise comparisons to streamline the analysis.
Cleaned Up Redundant Tasks: As part of this update, we also removed tasks and scatter operations related specifically to the pairwise checks, further simplifying the code.

Here is a high-level overview of the changes, highlighting the differences between this version and the eph_turn_off_unstable_af_filter branch:

I removed the MakeBatchPairsList task: This task generated a list of batch pairs for pairwise comparison. Since we no longer run pairwise comparisons, it is no longer needed.
I removed the scatter block for pairwise comparisons: The scatter block that called helper.check_batch_effects for each pair in batch_pairs was removed.
I removed the MergeVariantFailureLists call for pairwise checks: The call to MergeVariantFailureLists as merge_pairwise_checks, which collected results from pairwise batch effect detection, was removed.
I adjusted the MakeReclassificationTable task: This task took inputs from both pairwise and one-vs-all checks. Since I removed the pairwise checks, I also removed the pairwise_fails input and any related logic inside the task that dealt with pairwise comparison results.
I removed any references to pairwise outputs: Any output or logic that depended on the results of the pairwise comparisons was removed.
I modified the make_batch_effect_reclassification_table.PCRMinus_only.R script to reflect these changes.

Rationale:

Our rationale for these changes was multifaceted:

Efficiency: Pairwise checks can be computationally intensive, especially when dealing with a large number of batches. By focusing on the "1 vs. all" approach, we can get a broader view of batch effects without the overhead of numerous pairwise comparisons.
Simplicity: Reducing the complexity of the workflow makes it easier to understand, maintain, and troubleshoot.
Focus on Broad Effects: The "1 vs. all" checks provide a holistic view of how a particular batch compares to the general trend across all batches. This approach can highlight more substantial, systemic batch effects rather than the nuanced differences between individual batches.
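
To make the efficiency point concrete, here is a rough sketch of how the number of comparisons grows under each strategy. It assumes one comparison per unordered batch pair for the pairwise checks and one comparison per batch for the "1 vs. all" checks. The function names are hypothetical and are not part of the workflow.

```shell
#!/usr/bin/env bash
# Sketch only: counts comparisons, it does not run any batch effect checks.
# Assumption: pairwise checks compare each unordered pair of batches once.
pairwise_comparisons() {
  local n=$1
  echo $(( n * (n - 1) / 2 ))
}

# Assumption: "1 vs. all" checks run once per batch.
one_vs_all_comparisons() {
  local n=$1
  echo "$n"
}

for n in 5 20 100; do
  printf '%d batches: %d pairwise, %d one-vs-all\n' \
    "$n" "$(pairwise_comparisons "$n")" "$(one_vs_all_comparisons "$n")"
done
```

Under these assumptions, 100 batches would need 4950 pairwise comparisons but only 100 one-vs-all comparisons.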

By implementing these changes, we aim to provide a more streamlined, efficient, and intuitive workflow for analyzing batch effects.

Tests:

I tested the updated workflow using the batch list and datasets located in the Phase 1 workspace. For those who have Phase 1 AoU permissions, the test results can be viewed in the Job Manager Results.

epiercehoffman

Thanks for working on this, Shadi!

My big question is: Is there ever a situation where we would still want to perform pairwise comparisons? I know we aren't able to at the scale of AoU, but are there smaller projects where we would want to retain this capability? I don't know the answer, but if so, then it would be best to be able to toggle those comparisons on/off instead of removing them entirely. We could still use this version for AoU but I want to get this question answered before merging into main.

I've made some style comments. Also make sure to delete src/sv-pipeline/scripts/downstream_analysis_and_filtering/.Rhistory - looks like that is an extra file that snuck in unintentionally.

epiercehoffman · 2023-12-08T16:47:31Z

wdl/Module07XfBatchEffect.wdl

-      --info ALL \
-      --no-samples \
-      ~{vcf} "~{prefix}.vcf2bed.bed"
+    --info ALL \


I appreciate you taking this opportunity to clean up some of the WDL indentation. However, the indentation in these bash sections is there for readability, i.e., these lines are indented because they are a continuation of the svtk command, and lines 252-257 are indented further because they are a continuation of the command substitution. Let's keep the bash indentation as it was to help future readers.
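
As a generic illustration of this convention (not the actual svtk command from the WDL), a continued command with a nested command substitution might be indented as in the sketch below. The `count_lines_demo` function and everything inside it are hypothetical stand-ins.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a long command inside a WDL command block.
count_lines_demo() {
  # Arguments of the outer printf are indented one level because they
  # continue that command; the pipeline inside $( ... ) is indented
  # further because it continues the command substitution.
  printf '%s\t%s\n' \
    "n_lines" \
    "$( printf 'a\nb\nc\n' \
          | wc -l \
          | tr -d ' ' )"
}

count_lines_demo
```

Flattening every line to the same column would still run, but a reader could no longer tell at a glance which lines belong to the outer command and which to the substitution.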

epiercehoffman · 2023-12-08T16:50:25Z

...s/downstream_analysis_and_filtering/make_batch_effect_reclassification_table.PCRMinus_only.R

@@ -118,13 +91,13 @@ categorize.failures <- function(dat,pairwise.cutoff,onevsall.cutoff){
 ###Read command-line arguments
 args <- commandArgs(trailingOnly=T)
 freq.table.in <- as.character(args[1])
-pairwise.in <- as.character(args[2])
+#pairwise.in <- as.character(args[2])


If we want to fully remove pairwise checks, it would be cleaner to delete these lines rather than commenting them out.

epiercehoffman · 2023-12-08T16:51:34Z

...s/downstream_analysis_and_filtering/make_batch_effect_reclassification_table.PCRMinus_only.R

-merged <- merge(pairwise.fails,onevsall.fails,all=T,sort=F,by="VID")
-if(nrow(merged) > 0){
-  merged[,-1] <- apply(merged[,-1],2,function(vals){
+#merged <- merge(onevsall.fails,all=T,sort=F,by="VID")


Suggested change

#merged <- merge(onevsall.fails,all=T,sort=F,by="VID")

epiercehoffman and others added 12 commits January 6, 2023 15:58

add toggle for unstable af filter during label step in batch effect

43a93f1

skip AF comparison and filtering if no input provided

38e11d6

make af input actually optional

682ca8f

Removed pairwise batch effect checks from WDL workflow

37662c8

no_pairwise_no_unstable_af_filter

b691bd6

Merge branch 'eph_turn_off_unstable_af_filter' into sz_no_pairwise_batch_effect

a97bc39

updated the related wdl and R script

592d93a

updated the related wdl and deleted the copy of the wdl

3533181

Update make_batch_effect_reclassification_table.PCRMinus_only.R

baba8aa

# Just use onevsall.fails directly. merged <- onevsall.fails

Update Module07XfBatchEffect.wdl

b253052

adjust tab spaces

Update Module07XfBatchEffect.wdl

54c421e

Update make_batch_effect_reclassification_table.PCRMinus_only.R

30647d3

analyze.failures ----> categorize.failures

shadizaheri marked this pull request as ready for review December 5, 2023 00:03

shadizaheri requested a review from mwalker174 December 5, 2023 00:05

epiercehoffman reviewed Dec 8, 2023
