Skip to content

Special environments

hadley edited this page Jun 5, 2013 · 11 revisions

Special contexts

R's lexical scoping rules, lazy argument evaluation and first-class environments make it an excellent environment for designing special environments that allow you to create domain specific languages (DSLs). This chapter shows how you can use these ideas to evaluate code in special contexts that pervert the usual behaviour of R. We'll start with simple modifications then work our way up to evaluations that completely redefine ordinary function semantics in R.

This chapter uses ideas from the breadth of the book including non-standard evaluation, environment manipulation, active bindings, ...

  • Evaluate code in a special context: local, capture.output, with_*

  • Supply an expression instead of a function: curve

  • Combine evaluation with extra processing: test_that, assert_that

  • Create a full-blown DSL: html, plotmath, deriv, parseNamespace, sql

Evaluate code in a special context

It's often useful to evaluate a chunk of code in a special context

  • Temporarily modifying global state, like the working directory, environment variables, plot options or locale.

  • Capture side-effects of a function, like the text it prints, or the warnings it emits

  • Evaluate code in a new environment

with_something

There are a number of parameters in R that have global state (e.g. option(), environmental variables, par(), ...) and it's useful to be able to run code temporarily in a different context. For example, it's often useful to be able to run code with the working directory temporarily set to a new location:

in_dir <- function(dir, code) {
  old <- setwd(dir)
  on.exit(setwd(old))

  force(code)
}
getwd()
in_dir("~", getwd())

The basic pattern is simple:

  • We first set the directory to a new location, capturing the current location from the output of setwd.

  • We then use on.exit() to ensure that the working directory is returned to the previous value regardless of how the function exits.

  • Finally, we explicitly force evaluation of the code. (We don't actually need force() here, but it makes it clear to readers what we're doing)

We could use similar code to temporarily set global options:

with_options <- function(opts, code) {
  old <- options(opts)
  on.exit(options(old))

  force(code)
}
x <- 0.123456
print(x)
with_options(list(digits = 3), print(x))
with_options(list(digits = 1), print(x))

You might notice that there's a lot of similarity between these two functions. We could extract out that commonality with a function operator that takes the setter function as an input:

with_something <- function(set) {
  function(new, code) {
    old <- set(new)
    on.exit(set(old))
    force(code)
  }
}

Then we can easily generate a whole set of with functions:

in_dir <- with_something(setwd)
with_options <- with_something(options)
with_par <- with_something(par)

However, many of the setter functions that affect global state in R don't return the previous values in a way that can easily be passed back in. In that case, like for .libPaths(), which controls where R looks for packages to load, we first create a wrapper that enforces the behaviour we want, and then use with_something():

set_libpaths <- function(paths) {
  libpath <- normalizePath(paths, mustWork = TRUE)

  old <- .libPaths()
  .libPaths(paths)
  invisible(old)
}
with_libpaths <- with_something(set_libpaths)

These functions (and a few others) are provided by the devtools package. See ?with_something for more details.

capture.output

capture.output() is a useful function when the output you really want from a function is printed to the console. It also allows you to work around badly written functions that have no way to suppress output to the console. For example, it's difficult to capture the output of str():

y <- 1:10
y_str <- str(y)
y_str

y_str <- capture.output(str(y))
y_str

To work its magic, capture.output() uses sink(), which allows you to redirect the output stream to an arbitrary connection. We'll first write a helper function that allows us to execute code in the context of a sink(), automatically un-sink()ing when the function finishes:

with_sink <- function(connection, code, ...) {
  sink(connection, ...)
  on.exit(sink())

  code
}
with_sink("temp.txt", print("Hello"))
readLines("temp.txt")

With this in hand, we just need to add a little extra wrapping to our capture.output2() to write to a temporary file, read from it and clean up after ourselves:

capture.output2 <- function(code) {
  temp <- tempfile()
  on.exit(file.remove(temp))
  
  with_sink(temp, force(code))
  readLines(temp)
}
capture.output2(cat("a", "b", "c", sep = "\n"))

The real capture.output() is a bit more complicated: it uses a local textConnection to capture the data sent to sink, and it allows you to supply multiple expressions which are evaluated in turn. Using with_sink() this looks like capture.output3()

capture.output3 <- function(..., env = parent.frame()) {
  txtcon <- textConnection("rval", "w", local = TRUE)
  
  with_sink(txtcon, {
    args <- dots(...)
    for(i in seq_along(args)) {
      out <- withVisible(eval(args[[i]], env))
      if (out$visible) print(out$value)
    }
  })

  rval
}

You might want to compare this function to the real capture.output() and think about the simplifications I've made. Is the code easier to understand or harder? Have I removed important functionality?

If you want to capture more types of output (like messages and warnings), you may find the evaluate package helpful. It powers knitr, and does it's best to ensure high fidelity between its output and what you'd see if you copy and pasted the code at the console.

Evaluating code in a new environment

In the process of performing a data analysis, you may create variables that are necessarily because they help break a complicated sequence of steps down in to easily digestible chunks, but are not needed afterwards. For example, in the following example, we might only want to keep the value of c:

a <- 10
b <- 30
c <- a + b

It's useful to be able to store only the final result, preventing the intermediate results from cluttering your workspace. We already know one way of doing this, using a function:

c <- (function() {
  a <- 10
  b <- 30
  a + b
})()

(In javascript this is called the immediately invoked function expression (IIFE), and is used extensively in modern javascript to encapsulate different javascript libraries)

R provides another tool that's a little less verbose, the local() function:

c <- local({
  a <- 10
  b <- 30
  a + b
})

The idea of local is to create a new environment (inheriting from the current environment) and run the code in that. The essence of local() is captured in this code:

local2 <- function(expr) {
  envir <- new.env(parent = parent.frame())
  eval(substitute(expr), envir)  
}

The real local() code is considerably more complicated because it adds a second environment parameter. I don't think this is necessary because if you have an explicit environment parameter, then you can already evaluate code in that environment with evalq(). The original code is also hard to understand because it is very concise and uses some sutble features of evaluation (including non-standard evaluation of both arguments). If you have read computing-on-the-language, you might be able to puzzle it out, but to make it a bit easier I have rewritten it in a simpler style below.

local2 <- function(expr, envir = new.env()) {
  env <- parent.frame()
  call <- substitute(eval(quote(expr), envir))

  eval(call, env)
}
a <- 100
local2({
  b <- a + sample(10, 1)
  my_get <<- function() b
})
my_get()

You might wonder we can't simplify to this:

local3 <- function(expr, envir = new.env()) {
  eval(substitute(expr), envir)
}

But it's because of how the arguments are evaluated - default arguments are evalauted in the scope of the function so that local(x) would not the same as local(x, new.env()) without special effort.

Exercises

  • Compare capture.output() to capture.output2() and local() to local2(). How do the functions differ? What features have I removed to make the key ideas easier to see? How have I rewritten the key ideas to be easier to understand?

  • How does the chdir parameter of source() compare to in_dir()? Why might you prefer one approach to the other?

Anaphoric functions

Another variant along these lines is an "anaphoric function", or a function that uses a pronoun. This is easiest to understand with an example using an interesting anaphoric function in base R: curve().curve() draws a plot of the specified function, but interestingly you don't need to use a function, you just supply an expression that uses x:

curve(x ^ 2)
curve(sin(x), to = 3 * pi)
curve(sin(exp(4 * x)), n = 1000)

Here x plays a role like a pronoun in an English sentence: it doesn't represent a single concrete value, but instead is a place holder that varies over the range of the plot. Note that it doesn't matter what the value of x outside of curve() is: the expression is evaluated in a special environment where x has a special meaning:

x <- 1
curve(sin(exp(4 * x)), n = 1000)

curve() works by evaluating the expression in a special environment in which the appropriate x exists. The essence of curve(), omitting many useful but incidental details like plot labelling, looks like this:

curve2 <- function(expr, xlim = c(0, 1), n = 100, env = parent.frame()) {
  env2 <- new.env(parent = env)
  env$x <- seq(xlim[1], xlim[2], length = n)
  
  y <- eval(expr, env2)
  plot(x, y, type = "l", ylab = deparse(substitute(expr)))
}
curve2(sin(exp(4 * x)), n = 1000)

Creating a new environment containing the pronoun is the key technique for implementing anaphoric functions.

Another way to solve the problem would be to turn the expression into a function using make_function():

curve3 <- function(expr, xlim, n = 100, env = parent.frame()) {
  f <- make_function(alist(x = ), substitute(expr), env)
  
  x <- seq(xlim[1], xlim[2], length = n)
  y <- f(x)
  
  plot(x, y, type = "l", ylab = deparse(substitute(expr)))
}
curve3(sin(exp(4 * x)), n = 1000)

The approaches take about as much code, and require knowledge of the same number of fundamental R concepts. I would have a slight preference for the second because it would be easier to reuse the part of the curve3() that turns an expression into a function. All anaphoric functions need careful documentation so that the user knows that some variable will have special properties inside the anaphoric function and must otherwise be avoided.

If you're interesting in learning more, there are some good resources for anaphoric functions in arc (a list like language), perl and clojure

With connection

We could use this idea to create a function that allows you to use a connection, automatically openning and closing it if needed. This is a common technique in ruby, which uses its block syntax whenever you deal with resources that need to be closed after you're done with them. We can't use the same implementation as the with functions above because an implementation like that below gives no way for the code to refer to the connection:

with_conn <- function(conn, code) {
  open(conn)
  on.exit(close(conn))

  force(code)
}

We can work around this problem by using an anaphoric function, referring to the connection as .it. We'd need to clearly document that convention so the user knew what to call it.

with_conn <- function(conn, code, env = parent.frame()) {
  if (!isOpen(conn)) {
    open(conn)
    on.exit(close(conn))
  }

  env2 <- new.env(parent = env)
  env$.it <- conn

  eval(substitute(code), env)
}

# Create a new file, and then read it in in pieces of length 6
with_conn(file("test.txt", "w+"), {
  writeLines("This is a test", .it)
  x <- readChar(.it, 6)

  while(length(x) > 0) {
    print(x)
    x <- readChar(.it, 6)
  }
})

The construction of this function also has the appealing side-effect that the scope of the connection is clearly shown with indenting.

Special environments for run-time checking

We can take the idea of special evaluation contexts and use the idea to implement run-time checks, by executing code in a special environment that warns or errors when the user does something that we think is a bad idea.

  • Checking for logical abbreviations

Logical abbreviations

One of the checks that R CMD check runs ensures that you don't use the logical abbreviations T and F. It's a useful check, but a bit frustrating to use because it terminates on the first F or T it finds, so you need to run it many times, but R CMD check is slow. Instead, we can take the idea that underlies the check and run it ourselves, which makes it faster to iterate.

The basic idea is to use an active binding so that whenever code tries to access F or T it throws an error. This is a very similar technique to anaphoric functions:

check_logical_abbr <- function(code, env = parent.frame()) {
  new_env <- new.env(parent = env)
  delayedAssign("T", stop("Use TRUE not T"), assign.env = new_env)
  delayedAssign("F", stop("Use FALSE not F"), assign.env = new_env)

  eval(substitute(code), new_env)
}

check_logical_abbr(c(FALSE, T, FALSE))

Note that functions look in the environment in which they were defined so to test large bodies of code, you'll need to run check_logical_abbr() as far as out as possible:

f <- function(x) {
  mean(x, na.rm = T)
}
check_logical_abbr(f(1:10))

check_logical_abbr({
  f <- function(x) {
    mean(x, na.rm = T)
  }
  f(1:10)
})

This will be typically easiest to do as a wrapper around the code you use to load your code into R:

check_logical_abbr(source("my-file.r"))
check_logical_abbr(load_all())

Test that

Manage global state accessible from functions. Use special environment.

Create with function that uses on.exit() to push and pop from stack.

testthat_env <- new.env()
testthat_env$reporter <- StopReporter$new()

set_reporter <<- function(value) {
  old <- testthat_env$reporter
  testthat_env$reporter <- value
  old
}
get_reporter <<- function() {
  testthat_env$reporter
}
with_reporter <- function(reporter, code) {
  reporter <- find_reporter(reporter)

  old <- set_reporter(reporter)
  on.exit(set_reporter(old))

  reporter$start_reporter()
  res <- force(code)
  reporter$end_reporter()

  res
}

Domain specific languages

The combination of first class environments and lexical scoping gives us a powerful toolkit for creating domain specific languages in R. In this section we'll explore how you can new languages that use R's syntax but have different behaviours.

We'll first look at html, making it possible to write code that produces html structured in a way very similar to the output html.

with_html(
  body(
    h1("A heading", id = "first"),
    p("Some text. ", b("Some bold text."), "Some more text")
  )
)
Clone this wiki locally