-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Removed pairwise batch effect checks from WDL workflow #617
base: main
Are you sure you want to change the base?
Conversation
# Just use onevsall.fails directly. merged <- onevsall.fails
adjust tab spaces
analyze.failures ----> categorize.failures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this, Shadi!
My big question is: Is there ever a situation where we would still want to perform pairwise comparisons? I know we aren't able to at the scale of AoU, but are there smaller projects where we would want to retain this capability? I don't know the answer, but if so, then it would be best to be able to toggle those comparisons on/off instead of removing them entirely. We could still use this version for AoU but I want to get this question answered before merging into main.
I've made some style comments. Also make sure to delete src/sv-pipeline/scripts/downstream_analysis_and_filtering/.Rhistory
- looks like that is an extra file that snuck in unintentionally.
--info ALL \ | ||
--no-samples \ | ||
~{vcf} "~{prefix}.vcf2bed.bed" | ||
--info ALL \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate you taking this opportunity to clean up some of the WDL indentation. However, the indentation in these bash sections is for readability - ie. these lines are indented because they are a continuation of the svtk command, and lines 252-257 are indented further because they are a continuation of the command substitution. Let's keep the bash indentation as it was to help future readers
@@ -118,13 +91,13 @@ categorize.failures <- function(dat,pairwise.cutoff,onevsall.cutoff){ | |||
###Read command-line arguments | |||
args <- commandArgs(trailingOnly=T) | |||
freq.table.in <- as.character(args[1]) | |||
pairwise.in <- as.character(args[2]) | |||
#pairwise.in <- as.character(args[2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to fully remove pairwise checks it would be cleaner to delete these lines rather than commenting them out
merged <- merge(pairwise.fails,onevsall.fails,all=T,sort=F,by="VID") | ||
if(nrow(merged) > 0){ | ||
merged[,-1] <- apply(merged[,-1],2,function(vals){ | ||
#merged <- merge(onevsall.fails,all=T,sort=F,by="VID") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#merged <- merge(onevsall.fails,all=T,sort=F,by="VID") |
Description:
In this PR, I made significant changes to the WDL workflow responsible for analyzing batch effects in genomic data. Our primary objective was to simplify and optimize the pipeline by removing the pairwise batch effect checks.
Changes Made:
A high-level overview of the changes I implemented, highlighting the differences between this version and eph_turn_off_unstable_af_filter branch:
I removed
MakeBatchPairsList
Task: This task generates a list of batch pairs for pairwise comparison. Since I don't want pairwise comparisons, I removed this task.I removed the
Scatter
Blocks for Pairwise Comparisons: The scatter block that calls helper.check_batch_effects for each pair in batch_pairs were removed.I removed
MergeVariantFailureLists
Call for Pairwise Checks: The call to MergeVariantFailureLists as merge_pairwise_checks that collects results from pairwise batch effect detection was removed.I adjusted the
MakeReclassificationTable
Task: This task takes inputs from both pairwise and one-vs-all checks. Since I removed the pairwise checks, I also removed thepairwise_fails
input and any related logic inside the task that deals with pairwise comparison results.I removed any references to Pairwise outputs: Any output or logic that depends on the results of the pairwise comparisons was removed.
I modified the make_batch_effect_reclassification_table.PCRMinus_only.R script to reflect these changes.
Rationale:
Our rationale for these changes was multifaceted:
By implementing these changes, we aim to provide a more streamlined, efficient, and intuitive workflow for analyzing batch effects.
Tests
I have tested on the recent updates using the batch list and datasets located in the Phase 1 workspace. For those who have the phase 1 AoU permissions, the test results can be viewed at the following Job Manager Results.