Deltadeltacq 53 (#62)

* fix value_name in calculate_deltacq.R
* 96-well delta cq vignette draft
* simplify calculate_deltacq.R
* clarifications to ddcq vignette
* calculate_deltadeltacq_bytargetid runs
* Deltadeltacq 53 (#58)
* Fixes: #56 all variables are now explicitly defined
* Fixed bug in deltadeltacq function so that sample_id rather than target_id is passed onto calculate_normvalue function
* calculate_deltacq documentation clarifications
* explanations in deltacq_96well_vignette.Rmd
* delta cq and vignette updates on README.md
* multiple ref genes for deltacq, addresses #52
* Comments on delta Cq vignette

  Hi there, I had a few minutes so I went through your new vignette. It's really good, clear and easy to understand! I like the simpler (but still real) data set. I made a bunch of changes throughout, I hope you don't mind. My philosophy is to make as many comments and changes as possible and then let the author decide which ones are valuable. Please don't take the volume of comments/changes as a criticism!

  A few notes below about my comments:

  - Summary section: tried to simplify the technical details of the experiment a bit to make it more approachable to non-microbiologists. Sorry if I got anything wrong!
  - Throughout: shortened here and there by removing information I didn't think was critical
  - I added comments in some places inside `[]`
  - I mostly used "gene" instead of "target" because I think it's easier to understand, but qPCR users will probably be familiar with the word "target". Maybe just define it at the top.
* responded to @seaaan's vignette edits
* Comments on delta Cq vignette (#59)
* fixed col_types for doubles in read_lightcycler_1colour_cq
* fixed decimal places of deltacq_96well select data

Co-authored-by: Samuel Joseph Haynes <[email protected]>
Co-authored-by: Sean Hughes <[email protected]>
ropensci · Sep 4, 2020 · cafce12
1 parent 350c5e2
Showing 15 changed files with 778 additions and 129 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -23,5 +23,5 @@ Suggests: knitr,
 VignetteBuilder: knitr
 License: Apache License 2.0
 LazyData: true
-RoxygenNote: 7.1.0
+RoxygenNote: 7.1.1
 Encoding: UTF-8
diff --git a/NAMESPACE b/NAMESPACE
@@ -1,11 +1,12 @@
 # Generated by roxygen2: do not edit by hand
 
 export(calculate_deltacq_bysampleid)
+export(calculate_deltadeltacq_bytargetid)
 export(calculate_drdt_plate)
 export(calculate_dydx_1)
 export(calculate_efficiency)
 export(calculate_efficiency_bytargetid)
-export(calculate_normcq)
+export(calculate_normvalue)
 export(create_blank_plate)
 export(create_blank_plate_1536well)
 export(create_blank_plate_96well)
@@ -20,7 +21,6 @@ export(getdRdTall)
 export(label_plate_rowcol)
 export(make_row_names_echo1536)
 export(make_row_names_lc1536)
-export(normalizeqPCR)
 export(read_lightcycler_1colour_cq)
 export(read_lightcycler_1colour_raw)
 export(scale_loglog10)

diff --git a/R/calculate_deltacq.R b/R/calculate_deltacq.R
@@ -1,85 +1,144 @@
 
-#' @describeIn calculate_deltacq_bysampleid get the median value of a set of
-#'   normalization (reference) probes, for a single sample.
+#' Calculate a normalized value for a subset of reference ids
 #'
-#' @param norm_function Function to use to calculate the value to
-#' normalise by on log2/cq scale.
-#' Default function is median, alternatively could use mean.
+#' This is used to calculate the normalized `cq` values for reference
+#' `target_ids` (e.g. genes), to use in `delta_cq` calculation for each
+#' `sample_id`.
+#'
+#' Also used to calculate the normalized `delta_cq` values for reference
+#' `sample_ids`, to use in `deltadelta_cq` calculation for each `target_id`.
+#'
+#' @param value_df data frame containing relevant columns, those named in
+#'   `value_name` and `id_name` parameters.
+#' @param ref_ids values of reference ids, that are used to calculate
+#'   normalized reference value.
+#' @param value_name name of column containing values. This column should be
+#'   numeric.
+#' @param id_name name of column containing ids.
+#' @param norm_function Function to use to calculate the value to normalize by.
+#'   Default function is median, alternatively could use mean, geometric mean,
+#'   etc.
 #'
 #' @export
 #' @importFrom tidyr %>%
 #' @importFrom stats median
-#'
-calculate_normcq <- function(cq_df,
-                      value_name = "cq",
-                      norm_target_ids = "ALG9",
-                      tid_name = "target_id",
+#'   
+calculate_normvalue <- function(value_df,
+                      ref_ids,
+                      value_name = "value",
+                      id_name = "id",
                       norm_function = median) {
-    # make subset of cq_df where gene is one or more norm_target_ids
-    value_to_norm_by <- dplyr::filter(cq_df,
-                             !!dplyr::sym(tid_name) %in% norm_target_ids) %>%
-        .[[value_name]] %>%
+    # make subset of value_df where gene is one or more ref_ids
+    value_to_norm_by <- dplyr::filter(value_df,
+                             !!dplyr::sym(id_name) %in% ref_ids) %>%
+        dplyr::pull(!!dplyr::sym(value_name)) %>%
         norm_function(na.rm = TRUE)
     #
-    # assign summary (median) value to cq_df$value_to_norm_by
+    # assign summary (median) value to value_df$value_to_norm_by
     # note this is the same value for every row, a waste of space technically
-    dplyr::mutate(cq_df, value_to_norm_by = value_to_norm_by)
+    dplyr::mutate(value_df, value_to_norm_by = value_to_norm_by)
 }
 
-#' Calculate delta cq to normalize quantification cycle (log2-fold) data within
-#' sample_id.
+#' Calculate delta cq (\eqn{\Delta Cq}) to normalize quantification cycle
+#' (log2-fold) data within sample_id.
+#'
+#' This function implements relative quantification by the delta Cq method. For
+#' each sample, the Cq values of all targets (e.g. genes, probes, primer sets)
+#' are compared to one or more reference target ids specified in
+#' `ref_target_ids`.
 #'
 #' @param cq_df a data frame containing columns `sample_id`, value_name (default
 #'   `cq`) and tid_name (default `target_id`). Crucially, sample_id should be
 #'   the same for different technical replicates measuring identical reactions
 #'   in different wells of the plate, but differ for different biological and
-#'   experimental replicates.
-#' @param value_name the column name of the value that will be normalized
-#' @param norm_target_ids names of PCR probes (or primer sets) to normalize by,
-#'   i.e. reference genes
-#' @param tid_name the column name for probe sets
+#'   experimental replicates. See tidyqpcr vignettes for examples.
+#' @param ref_target_ids names of targets to normalize by, i.e. reference
+#'   genes, hydrolysis probes, or primer sets. This can be one reference target
+#'   id, a selection of multiple target ids, or even all measured target ids. In
+#'   the case of all of them, the delta Cq value would be calculated relative to
+#'   the median (or other `norm_function`) of all measured targets.
+#' @param norm_function Function to use to calculate the value to normalize by
+#'   on given scale. Default is median, alternatively could use mean.
 #'
 #' @return data frame like cq_df with three additional columns:
 #'
-#'   \tabular{ll}{ value_to_norm_by       \tab the median value of the reference
-#'   probes \cr value_norm    \tab the normalized value, \eqn{\Delta Cq} \cr
-#'   value_normexp \tab the normalized ratio, \eqn{2^(-\Delta Cq)} }
+#'   \tabular{ll}{ ref_cq    \tab summary (median/mean) cq value for reference
+#'   target ids \cr delta_cq  \tab normalized value, \eqn{\Delta Cq} \cr
+#'   rel_abund \tab normalized ratio, \eqn{2^(-\Delta Cq)} }
 #'
 #' @export
 #' @importFrom tidyr %>%
-#'
+#' @importFrom stats median
+#' @importFrom rlang .data
+#'   
 calculate_deltacq_bysampleid <- function(cq_df,
-                                         norm_target_ids,
-                                         value_name = "cq",
-                                         tid_name = "target_id") {
+                                         ref_target_ids,
+                                         norm_function = median) {
     cq_df %>%
-        dplyr::group_by(sample_id) %>%
-        dplyr::do(calculate_normcq(.,
-                                   value_name,
-                                   norm_target_ids,
-                                   tid_name)) %>%
+        dplyr::group_by(.data$sample_id) %>%
+        dplyr::do(calculate_normvalue(.data,
+                                   ref_ids = ref_target_ids,
+                                   value_name = "cq",
+                                   id_name = "target_id",
+                                   norm_function = norm_function)) %>%
+        dplyr::rename(ref_cq = .data$value_to_norm_by) %>%
         dplyr::ungroup() %>%
-        dplyr::mutate(.value = !!dplyr::sym(value_name), # a tidyeval trick
-               value_norm    = .value - value_to_norm_by,
-               value_normexp = 2^-value_norm) %>%
-        dplyr::select(-.value) %>%
+        dplyr::mutate(
+               delta_cq    = .data$cq - .data$ref_cq,
+               rel_abund   = 2^-.data$delta_cq) %>%
         return()
 }
 
-#' @describeIn calculate_deltacq_bysampleid Synonym for
-#'   calculate_deltacq_plates.
+
+#' Calculate delta delta cq (\eqn{\Delta \Delta Cq}) to globally normalize
+#' quantification cycle (log2-fold) data across sample_id.
 #'
-#' @export
+#' This function does a global normalization, where all samples are compared to
+#' one or more reference samples specified in `ref_sample_ids`. There are other
+#' experimental designs that require comparing samples in pairs or small groups,
+#' e.g. a time course comparing `delta_cq` values against a reference strain at
+#' each time point. For those situations, instead we recommend adapting code
+#' from this function, changing the grouping variables used in to
+#' `dplyr::group_by` to draw the contrasts appropriate for the experiment.
 #'
-normalizeqPCR <- function(cq_df,
-                          value_name = "cq",
-                          norm_target_ids = "ALG9",
-                          tid_name = "target_id") {
-    lifecycle::deprecate_warn("0.2", "normalizeqPCR()",
-                              "calculate_deltacq_bysampleid()",
-        details = "Replaced with more descriptive name")
-    calculate_deltacq_bysampleid(cq_df = cq_df,
-                  norm_target_ids = norm_target_ids,
-                  value_name = value_name,
-                  tid_name = tid_name)
-}
+#' @param deltacq_df a data frame containing columns `sample_id`,
+#'   `target_id` and `delta_cq`. Crucially,
+#'   sample_id should be the same for different technical replicates measuring
+#'   identical reactions in different wells of the plate, but differ for
+#'   different biological and experimental replicates.
+#'
+#'   Usually this will be a data frame that was output by
+#'   `calculate_deltacq_bysampleid`.
+#'
+#' @param ref_sample_ids reference sample_ids to normalize by
+#' @param norm_function Function to use to calculate the value to normalize by
+#'   on given scale. Default is median, alternatively could use mean.
+#'
+#' @return data frame like deltacq_df with three additional columns:
+#'
+#'   \tabular{ll}{ ref_delta_cq  \tab summary (median/mean) \eqn{\Delta Cq}
+#'   value for target_id in reference sample ids \cr deltadelta_cq \tab the
+#'   normalized value, \eqn{\Delta \Delta Cq} \cr fold_change   \tab the
+#'   normalized fold-change ratio, \eqn{2^(-\Delta \Delta Cq)} }
+#'
+#' @export
+#' @importFrom tidyr %>%
+#' @importFrom stats median
+#'   
+calculate_deltadeltacq_bytargetid <- function(deltacq_df,
+                                         ref_sample_ids,
+                                         norm_function = median) {
+    deltacq_df %>%
+        dplyr::group_by(.data$target_id) %>%
+        dplyr::do(calculate_normvalue(.data,
+                                   ref_ids = ref_sample_ids,
+                                   value_name = "delta_cq",
+                                   id_name = "sample_id",
+                                   norm_function = norm_function)) %>%
+        dplyr::rename(ref_delta_cq = .data$value_to_norm_by) %>%
+        dplyr::ungroup() %>%
+        dplyr::mutate(
+               deltadelta_cq = .data$delta_cq - .data$ref_delta_cq,
+               fold_change   = 2^-.data$deltadelta_cq) %>%
+        return()
+}
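
The arithmetic behind the two new functions can be sketched in base R without tidyqpcr. This is a minimal illustration under stated assumptions, not the package's implementation: the column names (`sample_id`, `target_id`, `cq`, `ref_cq`, `delta_cq`, `rel_abund`, `ref_delta_cq`, `deltadelta_cq`, `fold_change`) follow the diff above, while the helper `ref_value`, the ids `REF1`, `REF2`, `GENE_A`, `ctrl`, `treated`, and all Cq numbers are invented for the example.

```r
# Toy data: column names follow the diff; ids and Cq values are invented.
cq_df <- data.frame(
  sample_id = rep(c("ctrl", "treated"), each = 3),
  target_id = rep(c("REF1", "REF2", "GENE_A"), times = 2),
  cq        = c(20, 22, 25, 20, 22, 23)
)

# Hypothetical helper in the spirit of calculate_normvalue: summarise the
# values whose ids are among the reference ids (median by default).
ref_value <- function(values, ids, ref_ids, norm_function = median) {
  norm_function(values[ids %in% ref_ids], na.rm = TRUE)
}

# Delta Cq: within each sample, subtract the summary Cq of the reference targets.
ref_targets <- c("REF1", "REF2")
cq_df$ref_cq <- NA_real_
for (sid in unique(cq_df$sample_id)) {
  rows <- cq_df$sample_id == sid
  cq_df$ref_cq[rows] <- ref_value(cq_df$cq[rows],
                                  cq_df$target_id[rows],
                                  ref_targets)
}
cq_df$delta_cq  <- cq_df$cq - cq_df$ref_cq
cq_df$rel_abund <- 2^-cq_df$delta_cq

# Delta delta Cq: within each target, subtract the summary delta Cq of the
# reference samples.
ref_samples <- "ctrl"
cq_df$ref_delta_cq <- NA_real_
for (tid in unique(cq_df$target_id)) {
  rows <- cq_df$target_id == tid
  cq_df$ref_delta_cq[rows] <- ref_value(cq_df$delta_cq[rows],
                                        cq_df$sample_id[rows],
                                        ref_samples)
}
cq_df$deltadelta_cq <- cq_df$delta_cq - cq_df$ref_delta_cq
cq_df$fold_change   <- 2^-cq_df$deltadelta_cq

print(cq_df)
```

Grouping by `sample_id` for the first step and by `target_id` for the second mirrors the `dplyr::group_by` calls in the diff; with several reference targets, the median plays the role of a single reference Cq.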
diff --git a/R/calculate_efficiency.R b/R/calculate_efficiency.R
@@ -84,5 +84,5 @@ calculate_efficiency_bytargetid <- function(cq_df,
     }
     cq_df %>%
         dplyr::group_by(.data$target_id) %>%
-        dplyr::do(calculate_efficiency(., formula = formula))
+        dplyr::do(calculate_efficiency(.data, formula = formula))
 }
diff --git a/R/plate_functions.R b/R/plate_functions.R
@@ -31,7 +31,7 @@ create_blank_plate <- function(well_row = LETTERS[1:16], well_col = 1:24) {
     tidyr::crossing(well_row = as_factor(well_row),
                     well_col = as_factor(well_col)) %>%
         as_tibble() %>%
-        tidyr::unite(well, .data$well_row, .data$well_col, 
+        tidyr::unite("well", .data$well_row, .data$well_col, 
                      sep = "", remove = FALSE)
 }
 
@@ -347,11 +347,11 @@ display_plate <- function(plate) {
         levels()
     #
     ggplot2::ggplot(data = plate,
-                    aes(x = as_factor(.data$well_col),
+                    ggplot2::aes(x = as_factor(.data$well_col),
                         y = as_factor(.data$well_row))) +
-        ggplot2::geom_tile(aes(fill = .data$target_id), 
+        ggplot2::geom_tile(ggplot2::aes(fill = .data$target_id), 
                            alpha = 0.3) +
-        ggplot2::geom_text(aes(label = 
+        ggplot2::geom_text(ggplot2::aes(label = 
                                    paste(.data$target_id,
                                          .data$sample_id,
                                          .data$prep_type,

diff --git a/R/read_qpcr_data.R b/R/read_qpcr_data.R
@@ -79,7 +79,7 @@ read_lightcycler_1colour_cq <- function(
         "include", "color", "well", "sample_info",
         "cq", "concentration", "standard", "status"
     ), 
-    col_types = "liccnnil",
+    col_types = "liccddil",
     ...) {
     readr::read_tsv(file = filename,
                     skip = 2,
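
The `col_types` change swaps readr's compact code `n` (number) for `d` (double) in the `cq` and `concentration` columns. The commit message does not say why, so the sketch below only illustrates how the two parsers differ; the input strings are invented and are not taken from LightCycler output.

```r
library(readr)

# Invented example strings, not real instrument output.
x <- c("24.31", "1,234.5", "cq=30.2")

# "n" corresponds to col_number(), which parses like parse_number():
# it extracts the first number and ignores grouping marks and other text.
as_number <- parse_number(x)

# "d" corresponds to col_double(), which parses like parse_double():
# a strict parse that gives NA (with a warning) for non-double input.
as_double <- suppressWarnings(parse_double(x))

print(data.frame(input = x,
                 number = as.numeric(as_number),
                 double = as.numeric(as_double)))
```

A strict double parse surfaces malformed entries as `NA` instead of silently extracting a number from them, which may be preferable for columns that should hold plain decimals.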

diff --git a/README.md b/README.md
@@ -48,31 +48,35 @@ We want to make it easier for scientists to produce reliable and interpretable r
 
 # Status
 
-As of April 2020, this software is in development. [Edward Wallace](https://github.com/ewallace) wrote basic functions and documentation needed to do qPCR analysis in [the Wallace lab](https://ewallace.github.io/), and is making them freely available. [Sam Haynes](https://github.com/dimmestp) is helping develop as part of the [eLife Open Innovation Leaders programme](https://elifesciences.org/labs/fdcb6588/innovation-leaders-2020-introducing-the-cohort). 
+As of August 2020, this software is in development. [Edward Wallace](https://github.com/ewallace) wrote basic functions and documentation needed to do qPCR analysis in [the Wallace lab](https://ewallace.github.io/), and is making them freely available. [Sam Haynes](https://github.com/dimmestp) is helping develop as part of the [eLife Open Innovation Leaders programme](https://elifesciences.org/labs/fdcb6588/innovation-leaders-2020-introducing-the-cohort). 
 
 ## News
 
+* August 2020, relative quantification (delta delta Cq) added with function `calculate_deltadeltacq_bytargetid`, and a vignette illustrating this with small data from a 96-well plate.
 * June 2020, upgrades that break previous code. All function and variable names have been changed to snake case, i.e. lower case with underscore. Commits up to #ee6d192 change variable and function names. tidyqpcr now uses `sample_id` for nucleic acid sample (replaces Sample or SampleID), `target_id` for primer set/ probe (replaces TargetID or Probe), `prep_type` for nucleic acid preparation type (replaces Type), and `cq` for quantification cycle (replaces Cq or Ct). 
 It should be possible to upgrade old analysis code by (case-sensitive) search and replace. 
 Alternatively, pre-April 2020 analysis code should run from release v0.1-alpha, see [releases](https://github.com/ewallace/tidyqpcr/releases).
 
 
 # Features 
 
+Currently tidyqpcr has functions that support relative quantification, but not yet absolute quantification.
+
 ## Current features
 
 * every object is a tibble / data frame, no special data classes to learn
 * lay out and display 96/384-well plates for easy experimental setup (`label_plate_rowcol`, `create_blank_plate`, ...)
+* flexible assignment of metadata to samples for visualisation with [ggplot2](https://ggplot2.tidyverse.org/) (see vignettes)
 * read-in Cq and raw data from Roche LightCycler machines with single-channel fluorescence (`read_lightcycler_1colour_cq`, `read_lightcycler_1colour_raw`)
-* calibration of primer sets including estimating efficiencies and visualization of curves (`calculate_efficiency`, )
+* calibration of primer sets including estimating efficiencies and visualization of curves (`calculate_efficiency`, and see vignettes)
 * visualization of amplification and melt curves (`calculate_drdt_plate`, and see vignettes)
-* normalization of Cq data to one or more reference probe sets by delta count method (`calculate_normcq`, `calculate_deltacq_bysampleid`)
-* flexible assignment of metadata to samples for visualisation with [ggplot2](https://ggplot2.tidyverse.org/) (see vignettes)
+* delta Cq: normalization / relative quantification of Cq data to one or more reference targets by delta count method (`calculate_normvalue`, `calculate_deltacq_bysampleid`)
+* delta delta Cq: normalization of delta Cq data across multiple samples (`calculate_deltadeltacq_bytargetid`)
 
 ## Future priorities
 
 * including primer efficiencies in quantification
-* an open-source and tested Cq calculation algorithm
+* an open-source and tested Cq calculation function, from amplification curves
 * multi-colour (hydrolysis probe) detection
 * extend to 1536-well plates 
 * metadata handling compatible with RDML format
@@ -111,7 +115,8 @@ library(tidyqpcr)
 The best place to start is the vignettes, which offer tutorials and example data analyses including figures. Currently there are 4 vignettes:
 
 * [IntroDesignPlatesetup](vignettes/platesetup_vignette.Rmd) - Introduction to designing an experiment and setting up a plate plan in tidyqpcr.
-* [MultifactorialExample](vignettes/multifactor_vignette.Rmd) - Example design and analysis of a (real) multifactorial qPCR experiment.
+* [DeltaCq96wellExample](vignettes/deltacq_96well_vignette.Rmd) - Example analysis of 96-well RT-qPCR data including relative quantification with delta Cq, from a real experiment.
+* [MultifactorialExample](vignettes/multifactor_vignette.Rmd) - Example design and analysis of a (real) multifactorial RT-qPCR experiment.
 * [PrimerCalibration](vignettes/calibration_vignette.Rmd) - Example design and analysis of calibrating qPCR primer sets from a (real) experimental test
 
 To find these from your R session, enter `browseVignettes(package="tidyqpcr")`. 
@@ -136,5 +141,5 @@ If you want to fix bugs or add features yourself, that's great. tidyqpcr develop
 * follow the [tidyverse style guide](https://style.tidyverse.org/).
 * document functions with [roxygen2](https://roxygen2.r-lib.org/), as described in [the R packages book](http://r-pkgs.had.co.nz/man.html).
 * check the package with `R CMD check` / `devtools::check()`, as explained in [the R packages book](http://r-pkgs.had.co.nz/check.html).
-* including, check that all the vignettes run
-* put in a pull request to the main repository, we will review
+* including, check that all the vignettes run.
+* put in a pull request to the main repository, we will review, then we will accept or suggest changes.