-
Notifications
You must be signed in to change notification settings - Fork 0
Profiling R Functions
We want our code to be as fast and memory efficient as possible. There is a balance between getting every possible incremental improvement in performance and knowing when the code is good enough and not overly complicated.
There are different tools that can be used to profile R code. The chapter on Measuring performance in the Advanced R book by Hadley has lots of details.
This PR from our hierarchyUtils
package is also a good place to look at how profiling can be used to speed up a function. https://github.com/ihmeuw-demographics/hierarchyUtils/pull/56
The profvis R package is a tool that visualizes R profiling output. Under the hood it uses "uses data collected by Rprof, which is part of the base R distribution. At each time interval (profvis uses a default interval of 10ms), the profiler stops the R interpreter, looks at the current function call stack, and records it to a file. Because it works by sampling, the result isn’t deterministic. Each time you profile your code, the result will be slightly different."
profvis returns an interactive graphical interface that tells us how much time was spent in each line and connects to the exact lines of code in the package. For example below we can see that over the entire profiling period about ~18 seconds were spent within check_agg_scale_subtree_dt. This allows us to pick exact spots to speed up.
See the Advanced R section on microbenchmarking. Microbenchmarking is basically measuring the performance of a small piece of code. This can be helpful when comparing how well different ways of doing the same task perform (time, memory, etc.). The bench R package is a useful tool for doing microbenchmarking in R
Below compares two different ways of checking if a vector x
has any NAs. This shows that the assertive::is_not_na
function is much slower.
x <- rep(0, 10000000)
timing <- bench::mark(
all(assertive::is_not_na(x)),
all(!is.na(x))
)
> timing
# A tibble: 2 x 13
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
1 all(assertive::is_not_na(x)) 3.37s 3.37s 0.297 648.5MB 0.891 1 3 3.37s <lgl [1]> <Rprofmem[,3] [14… <bch:tm [… <tibble [1 …
2 all(!is.na(x)) 30.66ms 39.06ms 15.2 76.3MB 3.80 8 2 526.81ms <lgl [1]> <Rprofmem[,3] [2 … <bch:tm [… <tibble [8 …