Three Deep Truths About R

So, what are the implications of these statements?

Everything in R is an object

NULL is an object

NA is an object (a length 1 logical vector by default; its type adapts to the context in which it’s found)

5 is an object (a length 1 numeric vector)

"alpha" is an object (a length 1 character vector where the number of characters in the first element is 5)

c(1, 2, 3, 4) creates an object (a length 4 numeric vector)

function() {} is an object (a function object)

+ is an object (a name associated with a function object, also known as a symbol)
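If you’d like to check these claims yourself, here is a minimal sketch using base R’s typeof, length, nchar, and is.function. The particular expressions are chosen for illustration only, and the comments show what a standard R session should report.

```r
# Each expression below evaluates to an object that has a type and a length.
typeof(NULL)                # "NULL"
typeof(NA)                  # "logical" (the default type of a bare NA)
typeof(c(NA, "a"))          # "character" (NA adapts to its neighbours)
typeof(5)                   # "double"
length(5)                   # 1
nchar("alpha")              # 5
length(c(1, 2, 3, 4))       # 4
is.function(function() {})  # TRUE
is.function(`+`)            # TRUE: the object that the name + refers to
typeof(quote(`+`))          # "symbol": the name itself, unevaluated
```

The last two lines hint at a distinction that comes up again later: the name + and the function it refers to are two different objects.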

Functions are “first class citizens” in R. Another way to say that is that functions are themselves objects. You know that objects can be provided as input to functions. Since functions are objects, they, too, can be provided as input to other functions. This is the basis of functional programming. Functions can be inspected, they can be passed as parameters to other functions, and a function may return a function.
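As a small sketch of what “first class” means in practice, the two helpers below (apply_twice and make_adder, both hypothetical names invented for this example) take a function as input and return a function as output, respectively.

```r
# Hypothetical helper: takes a function f as input and applies it twice.
apply_twice <- function(f, x) {
  f(f(x))
}

# Hypothetical helper: builds and returns a new function.
make_adder <- function(n) {
  function(x) x + n
}

add_three <- make_adder(3)   # add_three is now a name for the returned function

apply_twice(sqrt, 16)        # sqrt(sqrt(16)) is 2
apply_twice(add_three, 1)    # (1 + 3) + 3 is 7
```

Notice that sqrt and add_three are passed without parentheses: you hand over the function object itself rather than the result of calling it.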

Everything that happens in R is the result of a function call

function() {
}

is an anonymous function (i.e., it has no name associated with it) with zero inputs and an implicit output of NULL.

function(a) {
}

is an anonymous function with one input and an implicit output of NULL.

function(a) {
  a
}

is an anonymous function with one input and an explicit output that is the same as its input, a. You might think of this as an “identity” function: the result of calling this function on an object is the object.

function(a = 5) {
  a
}

is an anonymous function with one input that has a default value (5) and an explicit output that is the same as its input, a. If you call this function without passing in a value for the parameter a, then you’ll get back 5, the default value of a.
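One way to confirm these descriptions is to call each anonymous function on the spot by wrapping it in parentheses. This is only a sketch; the input values 1, 42, and 9 are arbitrary.

```r
# Wrapping an anonymous function in parentheses lets you call it immediately.
(function() {})()            # NULL: an empty body returns NULL
(function(a) {})(1)          # NULL: the input is accepted but never used
(function(a) { a })(42)      # 42: the input comes straight back
(function(a = 5) { a })()    # 5: no input supplied, so the default is used
(function(a = 5) { a })(9)   # 9: a supplied input overrides the default
```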

identity <-
  function(a) {
    a
  }

is an expression that invokes the infix assignment function <- with two inputs: (1) a name (identity) to associate with (2) what would otherwise be an anonymous function. Yes, functions can have weird names, such as <-. Recall that in high school maths, sin is the name associated with a function and not the names of three distinct variables: s, i, and n. So you’ve already seen function names that contain multiple characters, and odd characters such as ! and +.

<- is an infix function. That is, you place the function name between its inputs. You may recognize infix functions from such maths expressions as 1 + 2. An alternative to infix function notation is prefix notation: sum(1, 2). You’ll see both styles in R. Whether a particular function tends to be expressed as infix or prefix is largely a matter of the history of its use and the convenience of the representation.

The version of the identity function above has one input with no default value and an explicit return value equal to its input, a. The name identity is meant to be intention-revealing: it suggests the purpose of the function. Names in R must begin with a letter or a period (a leading period may not be followed by a digit) and are usually limited to letters (upper- or lower-case), digits, periods, and underscores. There are exceptions: +, *, and <- are each names associated with functions. You’ll also see functions with names such as %*%, %>%, and $.
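Because these oddly named functions are ordinary functions, you can call them in prefix form by quoting the name with backticks. The sketch below uses made-up names (total, m, person) purely for illustration.

```r
1 + 2              # infix form: 3
`+`(1, 2)          # the same call written in prefix form: 3
sum(1, 2)          # a different function, normally written in prefix form: 3

`<-`(total, 10)    # prefix form of: total <- 10
total              # 10

m <- diag(2)       # a 2 x 2 identity matrix
`%*%`(m, m)        # prefix form of: m %*% m

person <- list(age = 20)
`$`(person, age)   # prefix form of: person$age, which is 20
```

You would rarely write code this way, but it makes the point that infix operators are function calls like any other.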

When you know the name of a function, you can typically inspect it simply by typing its name in the console, without the usual parentheses following. For example, typing xor and pressing enter will display the definition of the xor function. For functions that start with a special character, you surround the function name with backticks to reference it: `+`

Some functions are implemented in languages other than R (e.g., C, Fortran). For those, you’ll see simply a note that they invoke a primitive or internal function. See, for example, basename or `+`.
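A quick way to tell the two kinds of function apart, sketched here with base R’s is.primitive, typeof, and body:

```r
xor                  # typing the name alone prints the function's R source
is.primitive(xor)    # FALSE: xor is an ordinary function written in R
is.primitive(`+`)    # TRUE: + is implemented outside of R code
typeof(xor)          # "closure"
typeof(`+`)          # "builtin"
body(basename)       # a single call to .Internal(), which hands off to compiled code
```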

You can see the help page for any function by prepending a question mark to its name: ?basename or ?xor

I’m being pedantic here when I’m making the distinction between (1) a function and (2) a name that is associated with that function. While they are not the same thing (a picture of a pipe is not itself a pipe), in casual parlance we can usually say “sin is a function” and “identity is a function” and talk about “the assignment function” without much fear of confusion. There are times, however, when the distinction between functions and names associated with functions matters.

Names have objects, objects don’t have names

Consider the assignment of names to objects:

x <- 5
y <- 5

5 is an object (class: numeric vector; length: 1). <- is one name for the assignment function (assign is another and = is still another). x and y are names we’re associating with the object 5.

How many copies of the object 5 exist in your computer’s memory? One workable mental model, though not necessarily what R does internally, is that there exists in your computer’s memory only one object that is a numeric vector of length 1 whose single element is the number 5. Certainly, in terms of minimizing the amount of computer memory used to represent this object, such a scheme would be optimal. At the moment, that one object has two names assigned to it: x and y. Imagine that you are the numeric object 5. From your point of view, you know nothing about the many possible names that have been associated with you; x and y are only two of them, and there could be many more!

In R, you don’t “assign values to variables” or “store 5 in x”. Variables in R are not boxes; they don’t contain objects. Instead, you “assign (the variable name) x to 5” or “bind x to 5”. Assignment is simply the association of a name with an object. Any given object may have many names associated with it. At a given instant, a name refers to only one object. Over time, the object a name refers to may vary.
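A short sketch of binding and rebinding may help here. The vectors are arbitrary; the point is that rebinding one name leaves the other name, and the object it refers to, untouched.

```r
x <- c(1, 2, 3)    # bind the name x to a numeric vector
y <- x             # bind a second name, y, to the same object
identical(x, y)    # TRUE

x <- c(9, 9, 9)    # rebind x to a different object
x                  # 9 9 9
y                  # still 1 2 3: y refers to the original object
```

If variables were boxes, you might expect the assignment y <- x to tie the two boxes together. Thinking in terms of names bound to objects makes the result unsurprising.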

Let’s go further and (1) define a function, (2) assign it a name, and then (3) invoke or call that function, providing it with a simple input value.

identity <-
  function(a) {
    # at this point, I can know the value of a, 
    #   but not the name of the variable
    #   that was passed in to the identity function (age).

    # I also know that I'm in a function, but I don't know what
    #  name was used to call me (identity).

    a
  }

age <- 20
identity(age)

identity, age, and a are all names associated with objects. The block of code beginning with function and ending with } is a function object. <- is a function object. Even ( is a function object—everything in R is an object. The lines that start with # are comments—notes from one programmer to another—and are ignored by R.

Let’s follow the execution of the code as the R interpreter processes it, step by step.

  1. The infix assignment function <- is called with two inputs: identity and the otherwise anonymous function.
  2. The name identity is associated with that anonymous function, making it no longer anonymous.
  3. age <- 20 similarly invokes the assignment function and as a result associates the name age with the length 1 numeric vector whose single item is the object 20.
  4. identity(age) invokes the function associated with the name identity and links the argument age to the new parameter name a. For the moment, both age and a are names that refer to the same object.
    1. That link is called a promise in R: it’s a promise to provide the value of age, if a is ever needed within the identity function. A promise is itself an object—everything in R is an object.
  5. The comments are ignored. But read the comments now. This is where the statement—names have objects, objects don’t have names—is important. Once you’re inside the function, your code doesn’t know what name was used to call the function or what the name (if any) associated with the input value (if any) was. When you want to provide a title for graphics or filenames, that’s a pain. (NB: there is a way you can find out the name of the object that was passed in using the rlang package, but that’s beyond the scope of this post.)
  6. Finally we reach the line that has just an a on it. In R, the last object evaluated within a function is the returned value of that function. So, now that a is needed, the promise to associate the value of age with the name a is realized. The name a now is associated with the very same length 1 numeric vector whose single element is the numeric value 20.
  7. That vector is returned as the final value of the function because we explicitly evaluate a as the final instruction within the function.
  8. Now, attention turns to the last line again: the call to identity(age) has been fully evaluated; the function has been called and its return value determined.
  9. R’s interactive console, by default, calls the print function on any value left as a result of evaluating an expression. For example, when you type the number 1234 and press enter, R determines that 1234 needs no further evaluation and thus prints it as the final output: [1] 1234
  10. Since the result of calling the identity function was the length 1 numeric vector whose single element is the numeric value 20, R calls print on that vector and displays a reasonable representation of that object: [1] 20
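The promise described in step 4 can be observed indirectly. In the sketch below, ignore_input and use_input are hypothetical helpers written for this illustration: the first never needs its parameter, the second evaluates it just as identity does.

```r
# Hypothetical helper that never touches its parameter.
ignore_input <- function(a) {
  "a was never needed"
}

# Hypothetical helper that evaluates its parameter, like identity above.
use_input <- function(a) {
  a
}

# The argument would raise an error if it were evaluated, but the promise
# is never forced, so the call succeeds.
ignore_input(stop("never raised"))   # "a was never needed"

# Here the promise is forced when a is evaluated, so the error surfaces.
outcome <- tryCatch(
  use_input(stop("raised when a is needed")),
  error = function(e) conditionMessage(e)
)
outcome   # "raised when a is needed"
```

The argument is evaluated only if and when the function body needs it, which is what step 6 means by the promise being realized.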

You can see why we associate intention-revealing names with objects: constantly having to say “the length 1 numeric vector whose single element is the numeric value 20” would be impractical. Having written age <- 20, we can simply talk about age instead.

The Truth Shall Set You Free

To summarize:

  • Everything in R is an object
  • Everything that happens in R is the result of a function call
  • Names have objects, objects don’t have names

Many of the mysterious behaviors of R can be traced to these truths. Wield them well.

Left to the Reader as an Exercise

What do you think the result(s?) of the following code would be?

process_inputs <-
  function(item1, item2, f) {
    f(item1, item2)
  }

process_inputs(1, 2, sum)
process_inputs(1, 2, `/`) # those are backticks found on the tilde key ~

`/`