Skip to content

Profiling R Functions

Charlton Callender edited this page Dec 10, 2020 · 2 revisions

We want our code to be as fast and memory efficient as possible. There is a balance between getting every possible incremental improvement in performance and knowing when the code is good enough and not overly complicated.

There are different tools that can be used to profile R code. The chapter on Measuring performance in the Advanced R book by Hadley has lots of details.

This PR from our hierarchyUtils package is also a good place to look at how profiling can be used to speed up a function. https://github.com/ihmeuw-demographics/hierarchyUtils/pull/56

profvis

The profvis R package is a tool that visualizes R profiling output. Under the hood it uses "uses data collected by Rprof, which is part of the base R distribution. At each time interval (profvis uses a default interval of 10ms), the profiler stops the R interpreter, looks at the current function call stack, and records it to a file. Because it works by sampling, the result isn’t deterministic. Each time you profile your code, the result will be slightly different."

profvis returns an interactive graphical interface that tells us how much time was spent in each line and connects to the exact lines of code in the package. For example below we can see that over the entire profiling period about ~18 seconds were spent within check_agg_scale_subtree_dt. This allows us to pick exact spots to speed up.

bench

See the Advanced R section on microbenchmarking. Microbenchmarking is basically measuring the performance of a small piece of code. This can be helpful when comparing how well different ways of doing the same task perform (time, memory, etc.). The bench R package is a useful tool for doing microbenchmarking in R

Below compares two different ways of checking if a vector x has any NAs. This shows that the assertive::is_not_na function is much slower.

x <- rep(0, 10000000)

timing <- bench::mark(
    all(assertive::is_not_na(x)),
    all(!is.na(x))
)
> timing
# A tibble: 2 x 13
  expression                        min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result    memory             time       gc          
  <bch:expr>                   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>    <list>             <list>     <list>      
1 all(assertive::is_not_na(x))    3.37s    3.37s     0.297   648.5MB    0.891     1     3      3.37s <lgl [1]> <Rprofmem[,3] [14… <bch:tm [… <tibble [1 …
2 all(!is.na(x))                30.66ms  39.06ms    15.2      76.3MB    3.80      8     2   526.81ms <lgl [1]> <Rprofmem[,3] [2 … <bch:tm [… <tibble [8 …