The goal of jamba is to provide useful custom functions for R data analysis and visualization.
A full online function reference is available via the pkgdown documentation.
Functions are grouped by category; some examples are listed below.
The production version will soon be available from CRAN:
install.packages("jamba")
The development version can be installed:
remotes::install_github("jmw86069/jamba")
- crayon - install with install.packages("crayon") for glorious colored console output. Color makes it better.
- farver - install with install.packages("farver") for more efficient color manipulations, and HSL color conversions.
Bioconductor packages are invaluable for bioinformatics work, but can be a bit “heavy” to install if not absolutely necessary. Therefore, Bioconductor packages are in “Enhances” so they require someone to make the choice to install them.
- S4Vectors - install with BiocManager::install("S4Vectors") to improve speed of cPaste() functions.
- openxlsx - install with install.packages("openxlsx") to support Excel xlsx file import, and stylized export.
- kableExtra - install with install.packages("kableExtra") to enable colorized kable HTML tables in RMarkdown documents.
- ComplexHeatmap - install with BiocManager::install("ComplexHeatmap") to use with heatmap_row_order(), and cell_fun_label() for custom labels.
- matrixStats - install with install.packages("matrixStats") for efficient numeric stats calculations, or sparseMatrixStats for use with Matrix sparse matrices as used with Seurat and SingleCellExperiment data.
- ggridges - install with install.packages("ggridges") for convenient ridge density plots using plotRidges().
The R functions in jamba have been built up, used, tested, and revised
over several years. They are immediately useful for day-to-day work, and
efficient and robust enough for production pipelines.
Many were inspired by discussion from Stack Overflow, R-help, or Bioconductor, with citations thanking principal author(s). Many thanks to the original authors! The R community is built upon the collective greatness of its contributors!
Most of the functions are designed around workflows for Bioinformatics analyses, where functions need to be efficient when operating over 10,000 to 100,000 elements. (They work quite well with millions as well.) Usually the speed gains are obvious with about 100 elements, then scale linearly (or worse) as the number increases. I and others use these functions all the time.
One example function, writeOpenxlsx(), is a simple wrapper around the very
useful openxlsx::write.xlsx() that also applies column formatting
for column types: P-values, fold changes, log2 fold changes, numeric,
and integer values. Columns use conditional Excel formatting to apply
color-shading to cells for each type.
Similarly, readOpenxlsx() is a wrapper function to
openxlsx::read.xlsx() which reads each worksheet and returns a list
of data.frame objects. It can detect multi-row column headers, for
which it returns combined column names. It also applies the equivalent of
check.names=FALSE so column names are returned without change.
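The round trip described above might look like the following minimal sketch. It assumes both jamba and openxlsx are installed, and the data, column names, and sheet name are invented for illustration. The column-type arguments are given as column numbers here; check the function reference for the full set of options.

```r
# Hypothetical example data: gene names, log2 fold changes, and P-values.
df <- data.frame(
  gene = c("ABCA2", "ABCA12", "MIR21"),
  log2fc = c(1.5, -2.1, 0.3),
  pvalue = c(0.001, 0.04, 0.7),
  stringsAsFactors = FALSE)

# The guard lets this snippet run even where the packages are not installed.
if (requireNamespace("jamba", quietly = TRUE) &&
    requireNamespace("openxlsx", quietly = TRUE)) {
  xlsx_file <- tempfile(fileext = ".xlsx")

  # Declare which columns hold log2 fold changes and P-values so that
  # each receives its own number format and conditional color-shading.
  jamba::writeOpenxlsx(x = df,
    file = xlsx_file,
    sheetName = "results",
    lfcColumns = match("log2fc", colnames(df)),
    pvalueColumns = match("pvalue", colnames(df)))

  # Read every worksheet back as a list of data.frame objects.
  sheets <- jamba::readOpenxlsx(xlsx_file)
  str(sheets)
}
```

Calling writeOpenxlsx() again on the same file with a different sheetName and append=TRUE is the usual way to add further worksheets.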
Small and large efficiencies are used wherever possible. The
mixedSort() functions are based upon gtools::mixedsort(), with
additional optimizations for speed and custom needs. They sort chromosome
names, gene names, micro-RNA names, etc.
- mixedSort() - highly efficient alphanumeric sort, for example chr1, chr2, chr3, chr10, etc.
- mixedSortDF() - as above, applied to columns in a data.frame (or matrix, tibble, DataFrame, etc.)
- mixedSorts() - as above, applied to a list of vectors with no speed loss.
Example:
| | miRNA | sort_rank | mixedSort_rank |
|---|---|---|---|
| 2 | ABCA2 | 2 | 1 |
| 1 | ABCA12 | 1 | 2 |
| 3 | miR-1 | 3 | 3 |
| 6 | miR-1a | 6 | 4 |
| 7 | miR-1b | 7 | 5 |
| 8 | miR-2 | 8 | 6 |
| 4 | miR-12 | 4 | 7 |
| 9 | miR-22 | 9 | 8 |
| 5 | miR-122 | 5 | 9 |
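A short sketch of the comparison in the table, using a few of the same names. It assumes jamba is installed; the result of base sort() depends on the locale, so only the mixedSort() order is described in the comments.

```r
# A handful of the micro-RNA names from the table, in scrambled order.
x <- c("miR-12", "miR-1", "miR-122", "miR-1a", "miR-2")

# Base R compares strings character by character (locale-dependent).
print(sort(x))

# The guard lets this snippet run even where jamba is not installed.
if (requireNamespace("jamba", quietly = TRUE)) {
  # mixedSort() compares runs of digits as numbers, so miR-2 sorts
  # before miR-12, and miR-12 before miR-122.
  mixed <- jamba::mixedSort(x)
  print(mixed)
}
```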
These functions help with base R plots, in all those little cases when
the amazing ggplot2 package is not a smooth fit.
- nullPlot() - convenient “blank” base R plot, optionally displays margins
- plotSmoothScatter() - smooth scatter plot() for point density, enhanced over smoothScatter()
- plotPolygonDensity() - fast density/histogram plot for vector or matrix
- imageDefault() - enhanced image() that enables raster output with consistent pixel aspect ratio.
- imageByColors() - wrapper to image() for a matrix or data.frame of colors, with optional labels
- minorLogTicksAxis() - log-transformed axis labels, flexible log base, and option for properly adjusted log2(1 + x) format.
- sqrtAxis() - draw a square-root transformed axis, with proper labels.
- drawLabels() - draw square colorized text labels
- shadowText() - replacement for text() that draws shadows or outlines.
- groupedAxis() - grouped axis labels to show regions/ranges
- decideMfrow() - determine appropriate value for par("mfrow") for multipanel output in base R plotting.
- getPlotAspect() - determine visible plot aspect ratio.
Every bioinformatician/statistician needs to write data to Excel; the
writeOpenxlsx() function is consistent and makes it look pretty. You
can save numerous worksheets in a single Excel file, without having to
go back and custom-format everything.
- writeOpenxlsx() - flexible Excel exporter, with categorical and conditional colors.
- applyXlsxCategoricalFormat() - apply categorical colors to Excel
- applyXlsxConditionalFormat() - apply conditional colors to Excel
Almost everything uses color somewhere, especially on the R console, and in every R plot.
- getColorRamp() - flexible to create or retrieve color gradients
- warpRamp() - “bend” a color gradient to enhance the visual range
- color2gradient() - convert a color to gradient of n colors; or do the same for a vector
- makeColorDarker() - adjust darkness and saturation
- showColors() - display a vector or list of colors
- fixYellow() - adjust the weird green-yellow, by personal preference
- printDebug() - pretty colorized text output using crayon package.
- rainbow2() - rainbow categorical colors with enhanced visual contrast.
Cool methods to operate on super-long lists in one call, to avoid
looping through the list with for() loops, lapply(), or map() functions.
- cPaste() - highly efficient paste() over a large list of vectors
- cPasteS() - as above but using mixedSort() before paste().
- cPasteU() - as above but using uniques() before paste().
- cPasteSU() - as above but using mixedSort() and uniques() before paste().
- uniques() - efficient unique() over a list of vectors
- sclass() - runs class() on a list
- sdim(), ssdim() - dimensions of list objects, or nested list of lists
- rbindList() - efficient do.call(rbind, ...) to bind rows into a matrix or data.frame, useful when following strsplit().
- mergeAllXY() - merge a list of data.frame objects
- rmNULL() - remove NULL or empty elements from a list, with optional replacement
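As a rough illustration of the list functions, the sketch below collapses each element of a small list into one delimited string and compares it with the looping base R approach. It assumes jamba is installed; the list contents are invented.

```r
# Hypothetical list: one character vector per gene.
x <- list(
  geneA = c("chr10", "chr2", "chr2"),
  geneB = c("chr1"),
  geneC = c("chr3", "chr1"))

# Base R equivalent, which loops over every list element.
looped <- sapply(x, paste, collapse = ",")

# The guard lets this snippet run even where jamba is not installed.
if (requireNamespace("jamba", quietly = TRUE)) {
  # One comma-delimited string per list element, in the original order.
  pasted <- jamba::cPaste(x)

  # Same, but each vector is alphanumerically sorted and made unique first.
  pasted_su <- jamba::cPasteSU(x)

  # unique() applied to every element of the list.
  uniq <- jamba::uniques(x)

  print(pasted)
  print(pasted_su)
}
```

The difference from the sapply() version only becomes noticeable once the list has many thousands of elements.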
We use R names as an additional method to make sure everything is kept in the proper order. Many R functions return results using input names, so it helps to have a really solid naming strategy. For the R functions that remove names – I highly recommend adding them back yourself!
- makeNames() - make unique names, using flexible logic
- nameVector() - add names to a vector, using its own value, or supplied names
- nameVectorN() - make named vector using the names of a vector (useful inside lapply()) or any function that returns data using names of the input vector.
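A minimal sketch of the naming strategy, assuming jamba is installed. The input vector is made up, and the exact suffix added to duplicated names is left to the makeNames() defaults.

```r
# Hypothetical input with a duplicated value.
x <- c("A", "B", "A")

# The guard lets this snippet run even where jamba is not installed.
if (requireNamespace("jamba", quietly = TRUE)) {
  # Unique names derived from the values; duplicates get a suffix.
  nm <- jamba::makeNames(x)

  # The same vector, named by its own (uniquified) values.
  nv <- jamba::nameVector(x)

  # nameVectorN() returns the names, named by themselves, so that
  # lapply() output carries those names through to the result.
  res <- lapply(jamba::nameVectorN(nv), function(i) {
    nchar(nv[[i]])
  })
  print(names(res))
}
```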
- pasteByRow() - fast, flexible row-paste with delimiters, optionally remove blanks
- pasteByRowOrdered() - as above but returns ordered factor, using existing factor orders from each column when present
- rowGroupMeans(), rowRmMadOutliers() - efficient grouped row functions
- mergeAllXY() - merge a list of data.frame into one
- renameColumn() - rename columns from and to.
- kable_coloring() - flexible colorized data.frame output in Rmarkdown.
- gsubOrdered() - gsub that returns ordered factor, maintains the previous factor order
- grepls() - grep the environment (including attached packages) for object names
- vgrep(), vigrep() - value-grep shortcut
- unvgrep(), unvigrep() - un-grep – remove matched results from the output.
- provigrep() - progressive grep, searches each pattern in order, returning results in that order
- igrepHas() - rapid case-insensitive grep presence/absence test
- ucfirst() - upper-case the first letter of each word.
- padString(), padInteger() - produce strings from numeric values with consistent leading zeros.
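The grep shortcuts can be sketched as follows, assuming jamba is installed. The input strings are invented for illustration.

```r
# Hypothetical mix of micro-RNA and gene names.
x <- c("miR-21", "MIR-155", "ABCA2", "let-7a")

# The guard lets this snippet run even where jamba is not installed.
if (requireNamespace("jamba", quietly = TRUE)) {
  # Case-insensitive value grep: returns the matching values themselves.
  mir <- jamba::vigrep("mir", x)

  # The inverse: returns values that do not match.
  not_mir <- jamba::unvigrep("mir", x)

  # Progressive grep: matches to the first pattern come first,
  # followed by matches to the second pattern.
  in_order <- jamba::provigrep(c("^let", "^mir"), x)

  # Single TRUE/FALSE answer: does anything match?
  has_mir <- jamba::igrepHas("mir", x)

  print(mir)
  print(not_mir)
  print(in_order)
}
```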
- noiseFloor() - apply noise floor (and ceiling) with flexible replacement values
- warpAroundZero() - warp a numeric vector symmetrically around zero
- rowGroupMeans(), rowRmMadOutliers() - efficient grouped row functions
- deg2rad(), rad2deg() - convert between degrees and radians
- rmNA() - remove NA values, with optional replacement
- rmInfinite() - remove infinite values, with optional replacement.
- formatInt() - convenient format() for integer output, with comma-delimiter by default
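A few of the numeric helpers in use, as a sketch that assumes jamba is installed. The input values are arbitrary.

```r
# Arbitrary numeric inputs for illustration.
signal <- c(-2, 0.5, 8)
with_na <- c(1, NA, 3)

# The guard lets this snippet run even where jamba is not installed.
if (requireNamespace("jamba", quietly = TRUE)) {
  # Values below the minimum are raised to the minimum.
  floored <- jamba::noiseFloor(signal, minimum = 0)

  # Drop NA values, or substitute a replacement value instead.
  no_na <- jamba::rmNA(with_na)
  zero_na <- jamba::rmNA(with_na, naValue = 0)

  # Degrees to radians and back again.
  rad <- jamba::deg2rad(180)
  deg <- jamba::rad2deg(rad)

  # Integer formatting with a comma as the thousands delimiter.
  big <- jamba::formatInt(1234567)

  print(floored)
  print(big)
}
```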
- jargs() - pretty-print function arguments, with optional pattern search of argument names
jargs(plotSmoothScatter)
#> x = ,
#> y = NULL,
#> bwpi = 50,
#> binpi = 50,
#> bandwidthN = NULL,
#> nbin = NULL,
#> expand = c(0.04, 0.04),
#> transFactor = 0.25,
#> transformation = function( x ) x^transFactor,
#> xlim = NULL,
#> ylim = NULL,
#> xlab = NULL,
#> ylab = NULL,
#> nrpoints = 0,
#> colramp = c("white", "lightblue", "blue", "orange", "orangered2"),
#> col = "black",
#> doTest = FALSE,
#> fillBackground = TRUE,
#> naAction = c("remove", "floor0", "floor1"),
#> xaxt = "s",
#> yaxt = "s",
#> add = FALSE,
#> asp = NULL,
#> applyRangeCeiling = TRUE,
#> useRaster = TRUE,
#> verbose = FALSE,
#> ... =
- sdim(), ssdim() - dimensions of list objects, or nested list of lists
- sdima() - runs sdim() on the attributes of an object.
- isTRUEV(), isFALSEV() - vectorized test for TRUE or FALSE values, since isTRUE() only operates on single values, and does not allow NA.
- printDebug() - pretty colorized text output using crayon package.
- setPrompt() - pretty colorized R console prompt with project name and R version
- newestFile() - most recently modified file from a vector of files
- jamma – MA-plots (also known as “mean-variance”, “Bland-Altman”, or “mean-difference” plots), relies upon jamba::plotSmoothScatter(); centerGeneData() to apply flexible row-centering with optional groups and control samples; jammanorm() to normalize data based upon MA-plot output
- colorjam – colorjam::rainbowJam() for scalable categorical colors using alternating luminance and chroma values.
- genejam – fast, consistent conversion of gene symbols to the most current gene nomenclature
- splicejam – Sashimi plots for RNA-seq data
- multienrichjam – multiple gene set enrichment analysis and visualization
- platjam – platform technology functions, importers for NanoString