Flexible assignments of cells to treatment and control groups #163
Overview
The main goal of this pull request is to let users avoid filtering out cells with zero gRNAs or with two or more gRNAs, which is helpful in settings where the MOI is low but some cells still receive more than one gRNA. For example, cells with multiple NT gRNAs can still safely be considered control cells, and cells with one targeting gRNA and one or more NT gRNAs can still safely be considered treatment cells.
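To make the motivation concrete, here is a minimal toy sketch in Python (not code from the package, which is written in R and C++). The cell names, gRNA names, and the convention that NT gRNAs start with "NT" are all assumptions made for illustration. It lists the cells that a strict "exactly one gRNA" filter would drop even though their role is unambiguous.

```python
# Toy data (hypothetical): cell id -> gRNAs detected in that cell.
# By assumption, names starting with "NT" are non-targeting gRNAs.
cells = {
    "cell_a": ["g_target1"],
    "cell_b": ["NT1", "NT2"],
    "cell_c": ["g_target1", "NT1"],
    "cell_d": [],
    "cell_e": ["g_target1", "g_target2"],
}


def is_nt(grna):
    return grna.startswith("NT")


def kept_by_strict_filter(grnas):
    """Keep only cells with exactly one gRNA (drop zero or two-plus)."""
    return len(grnas) == 1


def usable_role(grnas):
    """Role a cell could still play if it were not filtered out."""
    targeting = [g for g in grnas if not is_nt(g)]
    if grnas and not targeting:
        return "control"    # only NT gRNAs present
    if len(targeting) == 1:
        return "treatment"  # one targeting gRNA, any number of NT gRNAs
    return "other"          # no gRNAs, or several targeting gRNAs


# Cells the strict filter discards although their role is clear.
dropped_but_usable = sorted(
    cell
    for cell, grnas in cells.items()
    if not kept_by_strict_filter(grnas) and usable_role(grnas) != "other"
)
print(dropped_but_usable)
```

In this toy data, `cell_b` (two NT gRNAs) and `cell_c` (one targeting gRNA plus one NT gRNA) are the cells that would be discarded unnecessarily.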
Updates to API
There is now an optional argument to `set_analysis_parameters()` called `treatment_group`. The two options are:
`inclusive`: any cell containing a gRNA with a given target is a treatment cell (default in high MOI)
`exclusive`: only cells containing a gRNA with a given target but no other targeting gRNAs are treatment cells (default in low MOI)
There is now an optional logical argument to `run_qc()` called `remove_cells_w_zero_or_twoplus_grnas` (defaulting to `TRUE` for low MOI and `FALSE` for high MOI). I have set the low-MOI default to `TRUE` for backward compatibility, but I recommend setting it to `FALSE` in order to avoid throwing out cells unnecessarily.
The data frames output by the calibration check, power check, and discovery analysis have two additional columns called `n_trt` and `n_cntrl`, giving the number of cells used in the treatment and control groups, respectively.
Limitation: The calibration check is not supported for the combination of `treatment_group = "inclusive"`, `control_group = "nt_cells"`, and `remove_cells_w_zero_or_twoplus_grnas = FALSE`. In this case, a proper calibration check could have cells labeled as undercover that include both the undercover NT gRNAs and targeting gRNAs. These cells are not a subset of the NT cells, and therefore break the software's assumption that the undercover cells used in the calibration check are a subset of the NT cells (this assumption underlies the reindexing of `indiv_nt_grna_idxs` with respect to `all_nt_idxs`, for example).
Updates to back end
The helper function `process_initial_assignment_list()` now implements the treatment group logic by updating `grna_group_idxs` accordingly. This function also calculates `all_nt_idxs` and adds it to `grna_assignments_raw`. Before, only `grna_assignments` had the field `all_nt_idxs`. It is no longer the case that `all_nt_idxs` can be constructed as the union of `indiv_nt_grna_idxs`: for `treatment_group = "inclusive"`, `indiv_nt_grna_idxs` can contain cells with targeting gRNAs, but `all_nt_idxs` by definition includes only cells with no targeting gRNAs.
Input checks were updated to reflect the changes to the API.
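The treatment group logic, and the reason `all_nt_idxs` is no longer the union of `indiv_nt_grna_idxs`, can be illustrated with a small set-based sketch in Python. This is not the package's R implementation: the gRNA names, the cell indices, and the helper functions below are hypothetical, and the variables only borrow the field names to show the idea.

```python
# Hypothetical toy assignment: gRNA id -> set of cell indices containing it.
grna_to_cells = {
    "t1_g1": {0, 1, 2},  # targets "t1"
    "t2_g1": {2, 3},     # targets "t2"
    "NT1": {1, 4, 5},    # non-targeting
    "NT2": {5, 6},       # non-targeting
}
grna_target = {"t1_g1": "t1", "t2_g1": "t2", "NT1": "NT", "NT2": "NT"}


def union_of(grnas):
    """Union of the cell sets of the given gRNAs."""
    return set().union(*(grna_to_cells[g] for g in grnas))


def treatment_cells(target, treatment_group):
    """Cells in the treatment group for `target` under the given rule."""
    own = [g for g, t in grna_target.items() if t == target]
    other_targeting = [g for g, t in grna_target.items() if t not in (target, "NT")]
    if treatment_group == "inclusive":
        return union_of(own)
    if treatment_group == "exclusive":
        return union_of(own) - union_of(other_targeting)
    raise ValueError(f"unknown treatment_group: {treatment_group}")


# Per-gRNA NT cell sets (toy stand-in for indiv_nt_grna_idxs).
indiv_nt_grna_idxs = {g: grna_to_cells[g] for g, t in grna_target.items() if t == "NT"}

# Cells with at least one NT gRNA and no targeting gRNA (toy all_nt_idxs).
any_targeting = union_of([g for g, t in grna_target.items() if t != "NT"])
union_of_indiv = union_of(indiv_nt_grna_idxs)
all_nt_idxs = union_of_indiv - any_targeting

print(sorted(treatment_cells("t1", "inclusive")))  # cells 0, 1, 2
print(sorted(treatment_cells("t1", "exclusive")))  # cells 0, 1
print(sorted(union_of_indiv), sorted(all_nt_idxs))
```

Cell 1 carries both `NT1` and a targeting gRNA, so it appears in the per-gRNA NT set but not in the toy `all_nt_idxs`; cell 5 carries two NT gRNAs, so the per-gRNA sets overlap.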
The helper function `update_indiv_grna_assignments_for_nt_cells()` was updated to remove the assumption that the entries of `indiv_nt_grna_idxs` are nonoverlapping. Indeed, a cell containing two NT gRNAs would appear in the lists for both of these gRNAs.
The helper function `add_num_cells_to_result()` was added in order to compute `n_trt` and `n_cntrl`; it is now called in `run_calibration_check()`, `run_power_check()`, and `run_discovery_analysis()`.
The helper C++ function `compute_nt_nonzero_matrix_and_n_ok_pairs_v3()` was modified to correctly compute the QC metrics when the entries of `indiv_nt_grna_idxs` overlap.
Some `testthat` tests were added to test the new functionality. Another set of tests, which must be run manually, was added to `tests/manual/test-flexible-cell-assignments.R`; these ensure that running the new version of `sceptre` with defaults does not change the outputs (except for the addition of `n_trt` and `n_cntrl`) on the low- and high-MOI example data, compared to the existing version. These tests needed to be outside the `testthat` framework because they involve running two different versions of `sceptre`.
Limitations: The Nextflow pipeline, and any functions inside the R package pertaining to the Nextflow pipeline, were not updated.
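The idea behind the manual regression tests (outputs are unchanged apart from the two new columns) can be sketched as follows. This is a Python illustration only, not the contents of the manual test file; the rows, the column names other than `n_trt` and `n_cntrl`, and all values are made up for the example.

```python
# Columns added by the new version; everything else should be unchanged.
NEW_COLUMNS = {"n_trt", "n_cntrl"}


def strip_new_columns(rows):
    """Drop the newly added columns from each result row."""
    return [{k: v for k, v in row.items() if k not in NEW_COLUMNS} for row in rows]


def outputs_match(old_rows, new_rows):
    """True if the new output equals the old one once new columns are removed."""
    return old_rows == strip_new_columns(new_rows)


# Made-up result tables standing in for the outputs of the two versions.
old_result = [
    {"target": "t1", "response": "gene_a", "p_value": 0.03},
    {"target": "t2", "response": "gene_b", "p_value": 0.41},
]
new_result = [
    {"target": "t1", "response": "gene_a", "p_value": 0.03, "n_trt": 120, "n_cntrl": 900},
    {"target": "t2", "response": "gene_b", "p_value": 0.41, "n_trt": 95, "n_cntrl": 900},
]

print(outputs_match(old_result, new_result))
```

This sketch compares values exactly; a comparison of real floating-point outputs may need a numerical tolerance instead.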