In statistics, the multiple testing problem occurs when testing multiple hypotheses
simultaneously.
As the number of tests increases, so does the probability of encountering a type I error
(false positive).
With as few as 20 tests at a significance level of 0.05, the chance of at least one
significant result is ~64%, even if none of the null hypotheses is actually false [Goldman2008].
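The ~64% figure follows from a short calculation. It assumes the $m = 20$ tests are independent and each is run at $\alpha = 0.05$, so each true null has a 0.95 chance of not being (falsely) rejected:

$$
P(\text{at least one false positive}) = 1 - (1 - \alpha)^{m} = 1 - 0.95^{20} \approx 1 - 0.358 = 0.642
$$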
adjust can be used to correct for multiple testing by controlling either the family-wise
error rate (FWER) or the false discovery rate (FDR).
It is designed to be simple to use and relatively fast. For example, controlling the FDR
for a set of GWAS results (13,549,588 tests, 1.4 GB) takes about a minute and a half
on a single thread of a Xeon E5-2640 @ 2.50GHz.
$ adjust [options] <input> <output>
adjust operates on delimiter-separated value (DSV) files.
If the file has a header, adjust will attempt to infer which column contains the p-value.
For example, the following command controls the FDR at alpha = 0.05 using the
Benjamini-Hochberg step-up procedure.
All rows that don't meet this criterion are removed:
$ adjust input-stats.tsv output-stats.tsv
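For readers unfamiliar with the Benjamini-Hochberg step-up procedure, here is a minimal sketch in Haskell (the project builds with GHC and Stack). It is not taken from adjust's source; the names `bhAdjust` and `bhKeep` are hypothetical. It sorts the p-values, scales the i-th smallest by m/i, takes a running minimum from the largest rank down so the adjusted values stay monotone, and keeps a row when its adjusted p-value is at most alpha. This is equivalent to rejecting every hypothesis up to the largest rank k with p_(k) <= (k/m) * alpha.

```haskell
import Data.List (sortOn)

-- | Benjamini-Hochberg adjusted p-values, returned in input order.
-- Hypothetical helper for illustration; not adjust's actual code.
bhAdjust :: [Double] -> [Double]
bhAdjust ps = map snd (sortOn fst (zip originalIdx adjusted))
  where
    m = fromIntegral (length ps) :: Double
    -- Pair each p-value with its original position, then sort by p ascending.
    ranked = sortOn snd (zip [0 :: Int ..] ps)
    originalIdx = map fst ranked
    -- Step-up values (m / i) * p_(i) for ranks i = 1..m.
    raw = zipWith (\i (_, p) -> m / fromIntegral i * p) [1 :: Int ..] ranked
    -- Running minimum from the largest rank downwards keeps the adjusted
    -- values monotone; cap at 1 because a probability cannot exceed 1.
    adjusted = map (min 1) (scanr1 min raw)

-- | True for tests rejected at FDR level alpha, i.e. rows that would be kept.
bhKeep :: Double -> [Double] -> [Bool]
bhKeep alpha = map (<= alpha) . bhAdjust
```

Loaded in GHCi, `bhKeep 0.05 [0.01, 0.2, 0.03]` gives `[True,False,True]`, and `bhAdjust [0.01, 0.2, 0.03]` gives approximately `[0.03, 0.2, 0.045]`.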
Or, if you'd rather use the Bonferroni correction to control the FWER:
$ adjust -b input-stats.tsv output-stats.tsv
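The Bonferroni correction is simpler: with m tests, a row survives when its p-value is at most alpha / m, or equivalently when m * p <= alpha. The sketch below illustrates the standard correction and is not adjust's implementation; `bonferroniAdjust` and `bonferroniKeep` are hypothetical names.

```haskell
-- | Bonferroni-adjusted p-values: each p-value times the number of tests m,
-- capped at 1. Hypothetical helper for illustration.
bonferroniAdjust :: [Double] -> [Double]
bonferroniAdjust ps = map (\p -> min 1 (p * m)) ps
  where
    m = fromIntegral (length ps)

-- | True for tests that survive the correction (p <= alpha / m),
-- which controls the FWER at level alpha.
bonferroniKeep :: Double -> [Double] -> [Bool]
bonferroniKeep alpha ps = map (<= alpha / m) ps
  where
    m = fromIntegral (length ps)
```

Because the threshold shrinks as 1/m, Bonferroni becomes much stricter than Benjamini-Hochberg for large numbers of tests such as GWAS results.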
It can also read from stdin and write to stdout, which is useful when adjust is one step in a larger pipeline:
$ cat input-stats.tsv | adjust -i -o > output-stats.tsv
--adjust
: Convert and replace p-values with adjusted p-values

-a, --alpha=NUM
: Set the alpha (default = 0.05)

--fdr
: Control the FDR using the Benjamini-Hochberg step-up procedure

--fwer
: Control the FWER using the Bonferroni correction

-d, --delim=CHAR
: Specify the delimiter to use when parsing the input and writing the output

-c, --column=INT
: Zero-indexed column containing the p-value (currently disabled)

-n, --no-header
: Specify that the input does not contain a header row (currently disabled)

-r, --remove
: Remove rows above the given alpha threshold. This is only relevant when producing adjusted p-values with the --adjust option.

-i, --stdin
: Read from stdin instead of a file

-o, --stdout
: Write to stdout instead of a file
Compilation and installation are done with Stack. Set up GHC:
$ stack setup
Build the application:
$ stack build
If you wish to install it to your $PATH:
$ stack build --copy-bins
- GHC >= 8.2.2
- Stack
[Goldman2008] https://www.stat.berkeley.edu/~mgoldman/Section0402.pdf