The cost of evaluating an expression instead of using the language directly

Last updated on Aug 17, 2019

In a recent blog post I used something like this (for use in a call to optim()):

obj_fun_expr <- expression((x^4 + 4 * x^2 * y^2 - 12 * x^2 * a + 144 * x^2 - 
     48 * x * y * a + 144 * x * y - 4320 * x + 36 * y^2 - 2160 * 
     y + 180 * a^2 + 32400)/16)

f_expr <- function(par) {
  x <- par[1]
  y <- par[2]
  a <- par[3]
  
  val <- eval(obj_fun_expr, list(x = x, y = y, a = a))
  return(val)
}
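As a reminder of what eval() does when its second argument is a list, here is a minimal sketch. The expression toy_expr and the result names are hypothetical and only serve as illustration; they are not the objective function above.

```r
# Hypothetical toy expression, not the objective function from the post
toy_expr <- expression(x^2 + y)

# Names in the expression are looked up in the list first
res_full <- eval(toy_expr, list(x = 3, y = 1))
res_full
## [1] 10

# A name missing from the list is looked up in the enclosing environment,
# which by default is the frame that eval() was called from
y <- 100
res_partial <- eval(toy_expr, list(x = 3))
res_partial
## [1] 109
```

So every call to f_expr() builds a small list and evaluates the stored expression against it, and that per-call work is what could add overhead.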

I have been wondering what eval(expr, ...) costs compared to writing the expression directly in the function body, like this:

f_lang <- function(par) {
  x <- par[1]
  y <- par[2]
  a <- par[3]
  
  val <- (x^4 + 4 * x^2 * y^2 - 12 * x^2 * a + 144 * x^2 - 
     48 * x * y * a + 144 * x * y - 4320 * x + 36 * y^2 - 2160 * 
     y + 180 * a^2 + 32400)/16
  return(val)
}

Another option is to fill in the function body programmatically from the expression:

f_lang_expr <- function(par) {
}

f_lines <- parse(text = c(
  "x <- par[1]",
  "y <- par[2]",
  "a <- par[3]",
  paste0("val <- ", obj_fun_expr),
  "return(val)"))

body(f_lang_expr) <- as.call(c(as.name("{"), f_lines))
f_lang_expr
## function (par) 
## {
##     x <- par[1]
##     y <- par[2]
##     a <- par[3]
##     val <- (x^4 + 4 * x^2 * y^2 - 12 * x^2 * a + 144 * x^2 - 
##         48 * x * y * a + 144 * x * y - 4320 * x + 36 * y^2 - 
##         2160 * y + 180 * a^2 + 32400)/16
##     return(val)
## }
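The same kind of function can probably also be built without going through text and parse(). The following is a minimal sketch using bquote(), which splices the call stored in the expression into a quoted body. The names toy_expr and f_toy are hypothetical, and the toy expression stands in for obj_fun_expr.

```r
# Hypothetical toy expression standing in for obj_fun_expr
toy_expr <- expression((x^2 + y * a) / 2)

# Start from a function with an empty body and replace the body
f_toy <- function(par) NULL
body(f_toy) <- bquote({
  x <- par[1]
  y <- par[2]
  a <- par[3]
  # .() inserts the call held in the first element of the expression
  .(toy_expr[[1]])
})

f_toy
res_toy <- f_toy(c(2, 3, 4))  # (2^2 + 3 * 4) / 2
res_toy
## [1] 8
```

This avoids deparsing the expression to a string and parsing it again, since the call object itself is reused.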

Typically you would evaluate the function many times (say, 1,000), for example when plotting or optimising it.

So what is the difference between them?

First we ensure that they give the same answers:

xs <- seq(1, 100, length.out = 1000)
res_f_expr <- lapply(xs, function(x) f_expr(c(x, 1, 1)))
res_f_lang <- lapply(xs, function(x) f_lang(c(x, 1, 1)))
res_f_lang_expr <- lapply(xs, function(x) f_lang_expr(c(x, 1, 1)))
all.equal(res_f_expr, res_f_lang)
## [1] TRUE
all.equal(res_f_expr, res_f_lang_expr)
## [1] TRUE

Now, we can calculate the difference in run times:

library(microbenchmark)

m <- microbenchmark(
  f_expr = lapply(xs, function(x) f_expr(c(x, 1, 1))),
  f_lang = lapply(xs, function(x) f_lang(c(x, 1, 1))),
  f_lang_expr = lapply(xs, function(x) f_lang_expr(c(x, 1, 1))),
  times = 10
)

print(m, unit = "s") # seconds
## Unit: seconds
##         expr         min          lq        mean      median          uq
##       f_expr 0.001826758 0.001929865 0.002141647 0.001966037 0.002257127
##       f_lang 0.001795386 0.001847062 0.002253064 0.001899274 0.002141826
##  f_lang_expr 0.001799808 0.001874676 0.002043506 0.001894780 0.001937986
##          max neval cld
##  0.003082509    10   a
##  0.004323419    10   a
##  0.003174221    10   a

On my computer there is not really any difference: in the table above, the median time for 1,000 evaluations is about 1.9 to 2.0 ms for all three entries. Normally, I would expect a small difference, but as the timings show it is not really something to worry about (at least not to begin with). Remember: “[…] premature optimization is the root of all evil […]”.
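To repeat this kind of comparison on another machine without extra packages, a rough sketch using base R's system.time() could look like the following. The functions g_expr and g_lang, the helper time_fun and the toy expression are hypothetical stand-ins, and each label times the function it is named after. No timings are shown because they depend on the machine.

```r
# Hypothetical toy stand-ins for the eval() variant and the direct variant
toy_expr <- expression((x^2 + y * a) / 2)

g_expr <- function(par) {
  eval(toy_expr, list(x = par[1], y = par[2], a = par[3]))
}

g_lang <- function(par) {
  x <- par[1]
  y <- par[2]
  a <- par[3]
  (x^2 + y * a) / 2
}

xs <- seq(1, 100, length.out = 1000)

# Elapsed seconds for `reps` passes over xs with the function f
time_fun <- function(f, reps = 20) {
  system.time(
    for (i in seq_len(reps)) lapply(xs, function(x) f(c(x, 1, 1)))
  )[["elapsed"]]
}

# Each entry calls the function it is named after
timings <- c(g_expr = time_fun(g_expr), g_lang = time_fun(g_lang))
timings
```

system.time() is much coarser than microbenchmark(), so this is only suitable as a sanity check of the order of magnitude.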

Mikkel Meyer Andersen

Assoc. Professor of Applied Statistics

My research interests include applied statistics and computational statistics.
