Page maintainers: Malte Lüken and Pablo Rodríguez-Sánchez.
R is a functional programming language and software environment for statistical computing and graphics: https://www.r-project.org/.
R is particularly popular in the social, health, and biological sciences, where it is used for statistical modeling. R can also be used for signal processing (e.g. FFT), machine learning, image analysis, and natural language processing. The R syntax is similar to that of MATLAB and Python in terms of compactness and readability, which makes it a good prototyping language for science.
One of the strengths of R is the large number of available open source statistical packages, often developed by domain experts. For example, the R package seewave specialises in sound analysis. Packages are typically released on CRAN, the Comprehensive R Archive Network.
Are you familiar with Python? Then kickstart your R journey by reading this blog post.
All R functions come with documentation in a standardized format. Some R packages have their own Google group. Further, Stack Overflow and standard search engines can lead you to answers to most issues.
If you prefer books, consider the following resources:
- R for Data Science by Hadley Wickham,
- Advanced R by Hadley Wickham,
- Writing better R code by Laurent Gatto.
To install R, follow the detailed instructions on the CRAN website.
R programs can be written in any text editor. R code can be run from the command line or interactively within the R environment, which can be started with the R command in the shell. To quit the R environment, type q().
That said, it is highly recommended to use an integrated development environment (IDE). The most popular one is RStudio / Posit. It is free and quite powerful. It features an editor with code completion, a command line environment, a file manager, a package manager, and history lookup, among others.
It comes with many menus and key bindings (visible when you hover your mouse over the menu item). For instance, you can run code sections by selecting them and pressing Ctrl+Enter.
Note that you will have to install RStudio in addition to installing R. Updating RStudio does not automatically update R, nor the other way around.
Within RStudio you can work on ad-hoc code or create a project. Compared with Python, an R project is a bit like a virtual environment, as it preserves the workspace for that project; keeping a per-project package library additionally requires a tool such as renv. Creating a project is needed to build an R package. A project is created via the menu at the top of the screen.
Compiling R code is usually not needed, as most functions in R are already compiled in C. Nevertheless, R has compiling functionality, as described in the R manual. See the overview by Hadley Wickham.
We recommend following the Tidyverse style guide. Its guidelines can be automatically followed using linters such as:
Assigning variables with <- instead of = is recommended, although most of the time both are equivalent.
If you are interested in the controversy around assignment operators, check out this blog post.
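One case where the two operators are not equivalent is inside a function call. The sketch below is only an illustration (the function name assignment_demo is made up for this example): within a call, = names an argument, whereas <- creates a variable in the calling environment as a side effect.

```r
# Illustrative helper (hypothetical name) showing where `=` and `<-` differ.
assignment_demo <- function() {
  # `=` inside a call only names the argument `x` of mean();
  # no variable called `x` is created in this function's environment.
  mean(x = c(1, 2, 3))
  after_equals <- exists("x", inherits = FALSE)

  # `<-` inside a call first assigns `x` here, then passes its value on.
  mean(x <- c(1, 2, 3))
  after_arrow <- exists("x", inherits = FALSE)

  c(after_equals = after_equals, after_arrow = after_arrow)
}

assignment_demo()
```

The first element of the result is FALSE and the second is TRUE. At the top level of a script, outside of function calls, both operators behave the same.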
The symbols %>% and |> represent the pipe operator.
The first one is part of the magrittr package, and it gained so much popularity that a similar operator, |>, was added as part of native R since version 4.1.0. For details on the differences between the two, see this blog post.
They just add syntactic sugar to the way we pass a variable to a function.
The example below shows their basic behavior (f stands for any function):
var %>% f(params)
# Is equivalent to
f(var, params)
These operators are pretty useful for composing functions, and very often appear concatenated:
grades |> remove_nans() |> mean() |> print()
You can think of it as a production chain, where an object (the grades) passes through three machines: one that removes the NaNs, another one that takes the mean, and a last one that prints the result.
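The production chain above can be tried out with a small, self-contained sketch. It assumes R 4.1.0 or later for the native pipe; remove_nans is not a built-in function, so a minimal hypothetical version is defined here, and the grades are made-up values.

```r
# Hypothetical helper: drops NaN (and NA) entries from a numeric vector.
remove_nans <- function(x) {
  x[!is.na(x)]
}

# Made-up example data with one missing grade.
grades <- c(7.5, NaN, 8.0, 6.5)

# Piped version: reads left to right, like a production chain.
piped_mean <- grades |> remove_nans() |> mean()

# Equivalent nested version: reads inside out.
nested_mean <- mean(remove_nans(grades))

identical(piped_mean, nested_mean)
```

Both forms compute the same value; the piped form simply lists the steps in the order in which they are applied.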
One of the strengths of R is its community, which creates and maintains a constellation of packages. Very rarely will you use just base R. Here we give you a list of commonly used packages, starting with one that solves the first problem you'll find... how to manage that many packages!
renv allows you to create and manage a dependency library on a per-project basis. It also keeps track of the specific versions of each package used in the project, which is great for reproducibility... and avoiding future headaches!
For a general impression of plotting with R, see: https://www.r-graph-gallery.com/all-graphs
The basic R installation comes with a wide range of functions to plot data to a window on your screen or to a file. If you need to quickly inspect your data or create a custom-made static plot, then the basic functions offer the building blocks to do the job. There is a Statmethods.net tutorial with some examples of plotting options in R.
However, externally contributed plotting packages may offer easier syntax or convenient templates for creating plots. The most popular and powerful contributed graphics package is ggplot2. Interactive plots can be made with the ggvis package and embedded in a web application; see also this tutorial.
In summary, it is good to familiarize yourself with both the basic plotting functions and the contributed graphics packages. In theory, the basic plot functions can do everything that ggplot2 can do; it is mostly a matter of how much you like either syntax and how much freedom you need to tailor the visualisation to your use case.
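As a rough illustration of the base plotting building blocks, the sketch below writes a scatter plot with a fitted line to a PDF file. The data are simulated and the file name is arbitrary; only functions shipped with a standard R installation are used.

```r
# Simulated example data (not real measurements).
set.seed(42)
hours  <- runif(30, min = 0, max = 10)
grades <- 5 + 0.4 * hours + rnorm(30, sd = 0.5)

# Open a PDF graphics device that writes to a temporary file.
out_file <- file.path(tempdir(), "grades.pdf")
pdf(out_file, width = 6, height = 4)

# Build the plot from basic blocks: points first, then a regression line.
plot(hours, grades,
     main = "Grades versus study hours",
     xlab = "Study hours", ylab = "Grade", pch = 19)
abline(lm(grades ~ hours), col = "red", lwd = 2)

# Close the device so the file is actually written to disk.
invisible(dev.off())
```

Leaving out the pdf() and dev.off() calls sends the same plot to a window on your screen instead.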
Thanks to shiny.app it is possible to make interactive web applications in R without the need to write JavaScript or HTML.
knitr is an R package designed to build dynamic reports in R. It makes it possible to generate new PDF or HTML documents on the fly, with the results of computations embedded inside.
There are packages that ease tidying up messy data, e.g. tidyr and reshape2. The idea of tidy and messy data is explained in a tidy data paper by Hadley Wickham. There is also the Google group manipulatr to discuss topics related to data manipulation in R.
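To give an idea of what tidying means in practice, here is a minimal sketch that turns a wide ("messy") table into a long ("tidy") one with one observation per row. To stay self-contained it uses reshape() from base R rather than tidyr or reshape2, which offer more convenient functions for the same task; the data frame is made up.

```r
# Messy layout: one column per subject, so one row holds several observations.
messy <- data.frame(
  student = c("anna", "ben"),
  math    = c(7.5, 8.0),
  physics = c(6.0, 9.0)
)

# Tidy layout: one row per (student, subject) observation.
tidy <- reshape(
  messy,
  direction = "long",
  varying   = c("math", "physics"),  # columns to stack
  v.names   = "grade",               # name of the stacked value column
  timevar   = "subject",             # name of the new key column
  times     = c("math", "physics"),  # values of the key column
  idvar     = "student"
)
rownames(tidy) <- NULL

tidy
```

The tidy version has four rows and the columns student, subject, and grade, which is the shape most modeling and plotting functions expect.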
Speeding up code always starts with knowing where your bottlenecks are. The following profiling tools will help you do so:
- Introduction to profiling in R
Some rules of thumb that can quickly improve your code are the following:
- Avoid loops, use apply functionals instead
- Try to use vectorized functions
- Check out the purrr package
- If you are really in a hurry, consider communicating with C++ code using Rcpp.
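The first two rules of thumb can be illustrated with a small sketch that computes the same result in three ways. The input values are arbitrary, and vapply() is used here as one member of the apply family.

```r
x <- c(1.5, 2, 3.25, 4)

# 1. Explicit loop: verbose, and slow for large inputs.
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# 2. Functional: vapply() applies a function to each element and
#    checks that every result is a single numeric value.
squares_vapply <- vapply(x, function(value) value^2, numeric(1))

# 3. Vectorized: arithmetic operators already work on whole vectors.
squares_vectorized <- x^2

# All three approaches give the same result.
all.equal(squares_loop, squares_vapply)
all.equal(squares_loop, squares_vectorized)
```

When a vectorized function exists for the task, it is usually both the shortest and the fastest option; functionals are a good fallback when it does not.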
For a deeper introduction to the many optimization methods, check the free ebook:
- Efficient R programming, by Colin Gillespie and Robin Lovelace.
There is a great tutorial written by Hadley Wickham describing all the nitty-gritty of building your own package in R. It's called R packages. For a quicker introduction, consider this Carpentries lesson on R packages, originated and developed at our Center!
Read the Documentation chapter of Hadley's R packages book for details about documenting R code.
Customarily, R uses .Rd files in the /man directory for documentation. These files and folders are automatically created by RStudio when you create a new project from your existing R-function files.
Function-level comments starting with #' are used by roxygen to automatically generate the .Rd files. This means that you don't have to edit the .Rd files directly.
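As a sketch of what such function-level comments look like, consider the example below. The function celsius_to_fahrenheit is made up for illustration; the tags shown (@param, @return, @examples, @export) are standard roxygen tags.

```r
#' Convert temperatures from Celsius to Fahrenheit
#'
#' @param celsius Numeric vector of temperatures in degrees Celsius.
#'
#' @return Numeric vector of the same length, in degrees Fahrenheit.
#'
#' @examples
#' celsius_to_fahrenheit(c(0, 100))
#'
#' @export
celsius_to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}
```

When the package documentation is regenerated, roxygen turns this comment block into the corresponding .Rd file, and the @export tag results in an export entry for the function.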
R function documentation offers plenty of space to document the functionality, including code examples, literature references, and links to related functions. Nevertheless, it can sometimes be helpful for the user to also have a more generic description of the package with, for example, use cases. You can do this with a vignette.
Read more about vignettes in the Package documentation chapter of Hadley's R packages book.
Read more about roxygen syntax on its GitHub page. roxygen will also populate the NAMESPACE file, which is necessary to manage package-level imports.
Most of the templating is natively managed by the usethis package.
It contains functions that create the boilerplate for you, reducing the burden on your memory and reducing the chances of errors.
In the snippet below you can see how it feels to use it.
usethis::create_package("mypackage") # Creates a package structure ("mypackage" is a placeholder path)
usethis::use_readme_md() # Adds a readme
usethis::use_apache_license() # Adds an Apache License
usethis::use_testthat() # Adds the testing infrastructure
usethis::use_citation() # Adds a citation file
# etc...
Having said this, these other templates can serve as inspiration:
- https://rapporter.github.io/rapport/
- https://shiny.posit.co/r/articles/build/templates/
- https://bookdown.org/yihui/rmarkdown/document-templates.html
testthat is a testing package by Hadley Wickham. The Testing chapter of the book R packages describes in detail the testing process in R using testthat. Further, testthat: Get Started with Testing by Wickham may also provide a good starting point.
See also checking and testing R packages. Note that within RStudio, R package check and R package test can be done via simple toolbar clicks.
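A minimal sketch of what tests written with testthat look like is given below. It assumes the testthat package is installed; the function under test, rescale01, is a made-up example rather than part of any package.

```r
library(testthat)

# Hypothetical function under test: rescales a numeric vector to [0, 1].
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Each test_that() block groups expectations about one behavior.
test_that("rescale01 maps the minimum to 0 and the maximum to 1", {
  scaled <- rescale01(c(2, 4, 6))
  expect_equal(scaled, c(0, 0.5, 1))
})

test_that("rescale01 keeps missing values in place", {
  expect_equal(rescale01(c(1, NA, 3)), c(0, NA, 1))
})
```

In a package, such blocks usually live in files under tests/testthat/, which is the structure that usethis::use_testthat() sets up.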
Continuous integration should be done with an online service. We recommend using GitHub Actions.
Debugging is possible in RStudio, see link. For profiling tips, see link.
- Logging