Skip to content

sanssouci-org/sanssouci

Repository files navigation

sanssouci

R build status Coverage Status R-CMD-check

The goal of sanssouci [sãsusi] is to perform post hoc inference: in a multiple testing context, sanssouci provides statistical guarantees on possibly user-defined and/or data-driven sets of hypotheses.

Typical use cases include:

In both cases, the permutation-based post hoc inference methods implemented in the sanssouci package outperform classical post hoc bounds based on probabilistic inequalities.

Example: differential analysis

We briefly illustrate the case of differential gene expression study (see dedicated vignette for more information). We start by creating an object of class sanssouci from gene expression data available from the sanssouci.data package,

library("sanssouci")
#> remotes::install_github("sanssouci-org/sanssouci.data")
data(expr_ALL, package = "sanssouci.data")
groups <- ifelse(colnames(expr_ALL) == "BCR/ABL", 1, 0) # map to 0/1
obj <- SansSouci(Y = expr_ALL, groups = groups)

Then we fit the method by with B=1000 permutations, and with a target risk alpha = 0.1.

res <- fit(obj, alpha = 0.1, B = 1000)

Output 1 - Post hoc bound for a subset of genes

Let us assume that we are interested in genes with a $p$-value below $10^{-3}$.

S <- which(pValues(res) < 1e-3)
predict(res, S)
#>          TP         FDP 
#> 115.0000000   0.2857143

The method ensures with $1-\alpha = 90%$ confidence that there are at least 115 truly differentially expressed genes (true positives, TP) among these 161 genes, corresponding to a false discovery proportion (FDP) less than 0.29.

Output 2 - Confidence curves for “top-k” feature lists

Another output of the method is the following $1-\alpha$-level confidence curve on FDP (left panel) and corresponding TP (right panel) in “top-k” gene lists.

library("ggplot2")
plot(res, xmax = 500) +
  geom_vline(xintercept = length(S), 
             color = "gray", linetype = "dotted", size = 1.5) +
  geom_line(size = 1.5)
#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
#> ℹ Please use `linewidth` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.

Output 3 - Volcano plots

A more refined user selection is obtained by selecting genes based on both significance (here, a FDR-adjusted $p$-value below 0.05) and fold change (here, fold change above 0.3). Such selections are generally represented by volcano plots:

volcanoPlot(res, q = 0.05, r = 0.3, ylim = c(0, 6))

Importantly, multiple such selections can be made without compromising the validity of the bounds. The IIDEA shiny app makes it possible to perform such selections interactively and without programming.

Installation

You can install the development version from GitHub with:

# install.packages("pak")
pak::pkg_install("sanssouci-org/sanssouci")