strboul blog

Computing on the R language


Metaprogramming, also known as "computing on the language", simply means that one can write code that writes code. Since everything entered as valid code in R is an "expression", R has great metaprogramming capabilities [2].

Things can get pretty complicated and fragile with metaprogramming. If you choose to use it in your code, be sure that you have a valid reason.

R expressions are made up of four main types of objects:

  1. Constants are either NULL or length-1 atomic vectors [1], e.g. "a" or 1L.

  2. Symbols, also called names, refer to objects. For instance, var in var <- 1. Test for one with is.name() or is.symbol(); the latter is better for consistency.

  3. Calls represent function calls. They are list-like objects whose first element is the function to call, usually given as a symbol. Test for one with is.call().

  4. Pairlists are mostly encountered as the formal arguments of functions.
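As a rough illustration of the four types, the checks below use only base R. The quoted expressions and variable names are made up for this sketch.

```r
# Constants: quoting a constant returns the constant itself
const <- quote("a")
is.character(const)          # TRUE
is.null(quote(NULL))         # TRUE

# Symbols: the name of an object, not its value
sym <- quote(var)
is.symbol(sym)               # TRUE

# Calls: the first element is the function being called
cl <- quote(var <- 1)
is.call(cl)                  # TRUE
cl[[1]]                      # `<-`

# Pairlists: the formal arguments of a function are stored as one
fmls <- formals(function(x, y = 2) NULL)
is.pairlist(fmls)            # TRUE
names(fmls)                  # "x" "y"
```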

We look at the available functions next.

Expression

expression()
is.expression()
as.expression()
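A short sketch of how these three functions fit together. The expressions themselves are arbitrary examples.

```r
# expression() stores one or more unevaluated expressions in a vector
ex <- expression(a + b, sin(pi / 2))
is.expression(ex)            # TRUE
length(ex)                   # 2

# Each element is a call (or a symbol or constant) and can be evaluated on its own
ex[[1]]                      # a + b
eval(ex[[2]])                # 1

# as.expression() wraps an existing call into an expression vector
wrapped <- as.expression(quote(a + b))
is.expression(wrapped)       # TRUE
```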

Substitute

substitute() replaces the variables in an expression with the values bound to them in a list or an environment. It can be used to template expressions.

substitute(x * y, list(x = 2, y = 5))
# 2 * 5
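substitute() is also commonly used inside a function to capture the expression that the caller supplied for an argument. A minimal sketch, where show_expr is a hypothetical helper name and not something from base R:

```r
# Hypothetical helper: report an expression together with its value
show_expr <- function(x) {
  # substitute(x) returns the unevaluated expression passed as `x`
  expr <- substitute(x)
  paste(deparse(expr), "=", eval(expr, parent.frame()))
}
show_expr(2 * 5)             # "2 * 5 = 10"

# Templating: symbols that are not in the list are left untouched
tmpl <- substitute(x * y + z, list(x = 2, y = 5))
tmpl                         # 2 * 5 + z
```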

Quote

quote() and expression() behave pretty much the same when you evaluate their results with eval(). The difference is that expression() wraps its arguments in an expression object, so it returns a vector of unevaluated expressions, whereas quote() returns just a single unevaluated expression.

as.list(quote(x <- 2 + 3))
as.list(expression(y <- 5 * 8))
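To make the comparison concrete, here is a small sketch that evaluates both forms and compares their classes. The arithmetic is arbitrary.

```r
q <- quote(2 + 3)
e <- expression(2 + 3)

# Both evaluate to the same value
eval(q)                      # 5
eval(e)                      # 5

# ...but they are different kinds of objects
class(q)                     # "call"
class(e)                     # "expression"

# The single element of the expression vector is the same call as the quoted one
identical(e[[1]], q)         # TRUE
```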

bquote() is just like quote(), but it allows partial substitution in expressions: only the parts wrapped in .() are evaluated. bquote() is the only form of "quasiquotation" available in base R (Wickham, 2019).

Using bquote() can sometimes be more convenient than using substitute(), because the values are taken straight from the calling environment instead of from a list. For example:

n <- 5
substitute(p + x, list(x = n))
bquote(p + .(n))
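A slightly larger sketch of the same idea. It assumes the built-in mtcars data set is available (it ships with the datasets package); the variable names are illustrative only.

```r
n <- 5

# .() can hold any expression; it is evaluated on the spot, the rest stays quoted
bquote(p + .(n * 2))                   # p + 10

# The substitute() equivalent needs the value prepared in a list
substitute(p + x, list(x = n * 2))     # p + 10

# Symbols can be spliced in too, e.g. a column chosen at run time
col <- as.name("mpg")
cl <- bquote(mean(.(col)))
cl                                     # mean(mpg)
eval(cl, mtcars)                       # same value as mean(mtcars$mpg)
```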

And this is how enquote() works. It evaluates its argument and wraps the result in a call to quote():

z <- 5
enquote(z == 1)
# quote(FALSE)
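enquote() becomes more useful when the argument is itself an unevaluated call, since the added quote() layer protects it from one round of evaluation. A minimal sketch with made-up names:

```r
inner <- quote(a + b)

# enquote() evaluates `inner` (giving the call a + b) and wraps it in quote()
wrapped <- enquote(inner)
wrapped                                    # quote(a + b)

# Evaluating once strips the quote() layer and returns the call untouched,
# even though `a` and `b` do not exist
identical(eval(wrapped), inner)            # TRUE

# The same object built by hand
identical(wrapped, call("quote", inner))   # TRUE
```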

If you want to get the call to quote() itself back, unevaluated, wrap it in substitute().

substitute(quote(a = 2))

Symbols

name and symbol mean the same thing: both refer to the name of an R object.

as.symbol()
is.symbol()

as.name()
is.name()

While class() and mode() report "name", typeof() and storage.mode() report "symbol".

e <- expression(fun <- function(x) x)
e[[1]]
# fun <- function(x) x
e[[1]][[1]]
# `<-`
e[[1]][[2]]
# fun
mmy::object_types(e[[1]][[2]])
#       __type__ __value__
# 1        class      name
# 2       typeof    symbol
# 3         mode      name
# 4 storage.mode    symbol
# 5    sexp.type    SYMSXP
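mmy::object_types() is not part of base R, so here is a rough base R sketch that gathers similar information. describe_object is a hypothetical helper name, and the sketch does not reproduce the sexp.type row.

```r
# Hypothetical helper: collect the different "type" views of an object
describe_object <- function(x) {
  data.frame(
    type = c("class", "typeof", "mode", "storage.mode"),
    value = c(class(x)[1], typeof(x), mode(x), storage.mode(x)),
    stringsAsFactors = FALSE
  )
}

describe_object(quote(fun))
#           type  value
# 1        class   name
# 2       typeof symbol
# 3         mode   name
# 4 storage.mode symbol
```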

Unfortunately, the R interface is full of legacy terminology: at some point in time, symbols came to be called names. Although that is technically correct, I find that it creates confusion with the unrelated names() function. Symbols have a "name" mode, a "symbol" storage mode and a "symbol" type.

There's a note about this in the ?name documentation:

The term ‘symbol’ is from the LISP background of R, whereas ‘name’ has been the standard S term for this.

I prefer to stick with "symbol", as it also seems to be the more common term in other programming languages.

Call

call() is used to construct a call object.

call("convolve")
call("convolve", x = 3, y = 5)
(cconv <- call("convolve", x = 3, y = 5))
as.list(cconv)
eval(cconv)

N.B. do.call() constructs and immediately executes a call from a function, or the name of a function, and a list of arguments.
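The difference between call() and do.call() can be sketched as follows. sum() and paste() are used here only because their results are easy to check.

```r
nums <- list(1, 2, 3)

# call() builds the call without running it
cl <- call("sum", 1, 2, 3)
cl                                          # sum(1, 2, 3)
eval(cl)                                    # 6

# do.call() builds the call and runs it in one step;
# the function can be given by name or as an object
do.call("sum", nums)                        # 6
do.call(sum, nums)                          # 6

# Named list elements become named arguments
do.call(paste, list("a", "b", sep = "-"))   # "a-b"
```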

N.B. There is a bunch of functions to access and manipulate the call stack. See the ?sys.parent documentation for more information.
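As a small taste of those functions, the sketch below captures the current call from inside a function. outer_fun and its arguments are hypothetical names used only for illustration.

```r
outer_fun <- function(x, ...) {
  as_written <- sys.call()    # the call as the caller wrote it
  matched <- match.call()     # the same call with argument names filled in
  list(call = as_written, matched = matched)
}

res <- outer_fun(1 + 2, flag = TRUE)
res$call                      # outer_fun(1 + 2, flag = TRUE)
res$matched                   # outer_fun(x = 1 + 2, flag = TRUE)
```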

Function

Functions (closures, to be precise; primitive functions are the exception) have three components, namely the formals, the body and the environment:

square <- function(x) {
    x ^ 2
}
formals(square)
#  $x
body(square)
#  {
#      x^2
#  }
environment(square)
#  <environment: R_GlobalEnv>
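Because these components are ordinary R objects, they can also be modified or assembled programmatically. A minimal sketch; cube, power and twice are made-up names.

```r
square <- function(x) {
  x ^ 2
}

# Copy the function and swap its body for another quoted expression
cube <- square
body(cube) <- quote(x ^ 3)
cube(2)                      # 8

# Replace the formals to add an argument with a default value
power <- square
formals(power) <- alist(x = , p = 2)
body(power) <- quote(x ^ p)
power(3)                     # 9
power(3, p = 3)              # 27

# Build a function from scratch: the last element of the list is the body
twice <- as.function(alist(x = , x * 2))
twice(4)                     # 8
```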

Language

R considers calls, expressions and symbols to be language objects.

e <- expression(x <- 1)
is.language(e)
# [1] TRUE
mmy::object_types(e)
#       __type__  __value__
# 1        class expression
# 2       typeof expression
# 3         mode expression
# 4 storage.mode expression
# 5    sexp.type    EXPRSXP
e[[1]][[1]]
# `<-`
mmy::object_types(e[[1]][[1]])
#       __type__ __value__
# 1        class      name
# 2       typeof    symbol
# 3         mode      name
# 4 storage.mode    symbol
# 5    sexp.type    SYMSXP

Note that not everything returned by quote() is a language object: a quoted constant is returned as the constant itself.

is.language(quote(1))
# [1] FALSE
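For comparison, here is what is.language() reports for each kind of object that quote() and expression() can return:

```r
is.language(quote(x))        # TRUE  (symbol)
is.language(quote(x + 1))    # TRUE  (call)
is.language(expression(1))   # TRUE  (expression vector)
is.language(quote(1))        # FALSE (numeric constant)
is.language(quote("a"))      # FALSE (character constant)

# A quoted constant is just the constant
typeof(quote(1))             # "double"
```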

Parsing

utils::getParseData() returns the parser's token-level data for a piece of R code, which lets you inspect the code at a low level.

e <- expression({
  x <- 10
  y <- "char"
  z <<- 2
  # some comment here..
  lapply(mtcars, function(i) {
    pnorm(mtcars[i, i], log.p = TRUE)
  }) -> res
  paste(y, res, sep = ":")
})
# keep.source = TRUE retains the source references that getParseData() needs
prs <- parse(text = e, keep.source = TRUE)
parsed <- getParseData(prs)
head(parsed)
#      line1 col1 line2 col2  id parent       token terminal text
#  127     1    1     9    1 127      0        expr    FALSE
#  1       1    1     1    1   1    127         '{'     TRUE    {
#  9       2    5     2   11   9    127        expr    FALSE
#  3       2    5     2    5   3      5      SYMBOL     TRUE    x
#  5       2    5     2    5   5      9        expr    FALSE
#  4       2    7     2    8   4      9 LEFT_ASSIGN     TRUE   <-

Token                 Example             Note
COMMENT               #
LEFT_ASSIGN           <-, <<-             right assign -> turned into left assign
SYMBOL                mtcars, x, ...
FUNCTION              function
SYMBOL_FORMALS        i
SYMBOL_FUNCTION_CALL  lapply, pnorm, ...
SYMBOL_SUB            log.p               specified argument names in function calls
EQ_ASSIGN             =                   assignment with the equals sign, e.g. x = 2
EQ_SUB                =                   function argument with a value, e.g. in square(x = 4)
STR_CONST             "char"
NUM_CONST             10
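To look up tokens of a particular kind, the parse data can be filtered like any other data frame. A minimal sketch, assuming the code is supplied as text; square is a hypothetical function that is only parsed here, never called.

```r
code <- c(
  "x <- 10  # a comment",
  "y = square(x = 4)"
)

# keep.source = TRUE is needed: without source references there is no parse data
pd <- utils::getParseData(parse(text = code, keep.source = TRUE))

# Keep only the terminal tokens, i.e. the leaves of the parse tree
terminals <- pd[pd$terminal, c("token", "text")]
terminals

# Pick out the tokens of a given kind
terminals$text[terminals$token == "SYMBOL_FUNCTION_CALL"]   # "square"
terminals$text[terminals$token == "COMMENT"]                # "# a comment"
```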

There are also some tokens for punctuation, such as '{', '(' and ','. The right assign operator -> is turned into the commonly used left assign operator <- when R parses expressions.

expression(lapply(mtcars, mean) -> res)
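The rewrite can be checked directly by comparing the two spellings of the same assignment. Nothing is evaluated here, so res is never actually created.

```r
right <- quote(lapply(mtcars, mean) -> res)
left  <- quote(res <- lapply(mtcars, mean))

# Both spellings produce the very same call
identical(right, left)       # TRUE
right[[1]]                   # `<-`
deparse(right)               # "res <- lapply(mtcars, mean)"
```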

Resources


  1. R does not have scalar values per se; a "scalar" is simply a vector of length one.

  2. R inherited the metaprogramming features mostly from the LISP/Scheme world.