- Everything that exists in R is an object ~ John M. Chambers
- Everything that happens in R is the result of a function call ~ John M. Chambers
- Names have objects; objects don’t have names ~ Hadley Wickham
So, what are the implications of these statements?
Everything in R is an object
NULL is an object
NA is an object (its type depends on the context in which it’s found)
5 is an object (a length 1 numeric vector)
"alpha" is an object (a length 1 character vector where the number of characters in the first element is 5)
c(1, 2, 3, 4) creates an object (a length 4 numeric vector)
function() {} is an object (a function object)
+ is an object (a name associated with a function object, also known as a symbol)
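The claims in the list above can be checked from the console. This is a minimal sketch using only base R functions (typeof, length, nchar, is.function); the expected results are noted in the comments.

```r
# Inspect a few of the objects listed above with base R functions.
typeof(NULL)                 # "NULL"
typeof(NA)                   # "logical": a bare NA is logical by default
typeof(c(NA, "a"))           # "character": here NA is coerced by its context
typeof(5)                    # "double"
length(5)                    # 1
nchar("alpha")               # 5
length(c(1, 2, 3, 4))        # 4
is.function(function() {})   # TRUE
is.function(`+`)             # TRUE: the name + refers to a function object
```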
Functions are “first class citizens” in R. Another way to say that is that functions are themselves objects. You know that objects can be provided as input to functions. Since functions are objects, they, too, can be provided as input to other functions. This is the basis of functional programming. Functions can be inspected, they can be passed as parameters to other functions, and a function may return a function.
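As a rough illustration of those three abilities, the sketch below passes a function as an argument, returns a function from a function, and inspects a function's parts. The helper names apply_twice, make_adder, and add_three are made up for this example and are not part of R.

```r
# Pass a function as an argument to another function.
apply_twice <- function(f, x) f(f(x))   # hypothetical helper
apply_twice(sqrt, 16)                   # 2

# Return a function from a function.
make_adder <- function(n) {             # hypothetical helper
  function(x) x + n
}
add_three <- make_adder(3)
add_three(4)                            # 7

# Inspect a function's parts.
formals(make_adder)   # the argument list
body(make_adder)      # the code between the braces
```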
Everything that happens in R is the result of a function call
function() { } is an anonymous function (i.e., it has no name associated with it) with zero inputs and an implicit output of NULL.
function(a) { } is an anonymous function with one input and an implicit output of NULL.
function(a) { a } is an anonymous function with one input and an explicit output that is the same as its input, a. You might think of this as an “identity” function: the result of calling this function on an object is the object.
function(a = 5) { a } is an anonymous function with one input that has a default value (5) and an explicit output that is the same as its input, a. If you call this function without passing in a value for the parameter a, then you’ll get back 5, the default value of a.
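To see these four functions produce the outputs described, you can call each one immediately by wrapping its definition in parentheses. The input values 1, 42, and 7 are arbitrary choices for this sketch.

```r
# Call each anonymous function on the spot by wrapping it in parentheses.
(function() { })()            # NULL
(function(a) { })(1)          # NULL: the input is accepted but never used
(function(a) { a })(42)       # 42
(function(a = 5) { a })()     # 5, the default value of a
(function(a = 5) { a })(7)    # 7, the supplied value replaces the default
```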
identity <- function(a) { a } is an expression that invokes the infix assignment function <- with two inputs: (1) a name (identity) to associate with what otherwise would be (2) an anonymous function. Yes, functions can have weird names, such as <-. Recall that in high school maths, sin is the name associated with a function and not the names of three distinct variables: s, i, and n. So you have already seen function names that contain multiple characters, as well as odd characters such as ! and +.
<- is an infix function. That is, you place the function name between its inputs. You may recognize infix functions from such maths expressions as 1 + 2. An alternative to infix function notation is prefix notation: sum(1, 2). You’ll see both styles in R. Whether a particular function tends to be expressed as infix or prefix is largely a matter of the history of its use and the convenience of the representation.
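The two notations are interchangeable for the same function: quoting an infix function's name in backticks lets you call it in prefix form. A short sketch, with x and y as throwaway names:

```r
# Infix and prefix forms of the same addition.
1 + 2            # 3
`+`(1, 2)        # 3, the same function called in prefix form
sum(1, 2)        # 3

# Assignment is a function call too.
x <- 5           # infix form
`<-`(y, 5)       # prefix form of the same call
identical(x, y)  # TRUE
```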
The version of the identity function, above, has one input with no default value and an explicit return value equal to its input, a. The name identity is meant to be intention-revealing: the name suggests what the purpose of the function is. Names in R must begin with a letter or a period (a leading period cannot be followed by a digit) and are usually limited to letters (upper- or lower-case), numbers, periods, and underscores. There are exceptions: + is a name associated with a function, * is a name associated with a function, and <- is a name associated with a function. You’ll also see functions with names such as %*%, %>%, and $.
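These unusual names behave like any other name once quoted in backticks. The sketch below calls a few of the base R operators by name and then binds a function to a new name of the %...% form, which R treats as an infix operator. The objects m and person and the operator %+% are invented for this example; %>% is left out because it comes from an add-on package rather than base R.

```r
# Operators are ordinary functions reachable through their quoted names.
`*`(2, 3)                        # 6

m <- matrix(1:4, nrow = 2)
`%*%`(m, m)                      # same result as m %*% m

person <- list(name = "Ada")     # hypothetical example object
`$`(person, name)                # "Ada", same as person$name

# A name wrapped in percent signs can be used as an infix operator.
`%+%` <- function(a, b) paste(a, b)   # hypothetical operator
"hello" %+% "world"              # "hello world"
```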
When you know the name of a function, you can typically inspect it simply by typing its name in the console, without the usual parentheses following. For example, typing xor and pressing enter will display the definition of the xor function. For functions that start with a special character, you surround the function name with backticks to reference it: `+`.
Some functions are implemented in languages other than R (e.g., C, Fortran). For those, you’ll see simply a note that they invoke a primitive or internal function. See, for example, basename or `+`.
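A quick way to tell these cases apart is sketched below using base R only. The exact text printed for each function can vary between R versions, so no printed output is shown here.

```r
# Typing a name without parentheses prints the function it refers to.
xor                 # shows R source code
`+`                 # shows a note that it is a primitive

# Check programmatically how a function is implemented.
is.primitive(`+`)   # TRUE
is.primitive(xor)   # FALSE: xor is written in R
body(xor)           # the R expression that xor evaluates
body(basename)      # a call that hands off to .Internal
```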
You can see the help page for any function by prepending a question mark to its name: ?basename or ?xor.
I’m being pedantic here when I’m making the distinction between (1) a function and (2) a name that is associated with that function. While they are not the same thing (a picture of a pipe is not itself a pipe), in casual parlance we can usually say “sin is a function” and “identity is a function” and talk about “the assignment function” without much fear of confusion. There are times, however, when the distinction between functions and names associated with functions matters.
Names have objects, objects don’t have names
Consider the assignment of names to objects:
    x <- 5
    y <- 5
5 is an object (class: numeric vector; length: 1). <- is one name for the assignment function (assign is another and = is still another). x and y are names we’re associating with the object 5.
How many copies of the object 5 exist in your computer’s memory? A sufficient answer (although not the necessary one) is that there exists in your computer’s memory only one object that is a numeric vector of length 1 where that one element is the numeric object 5. Certainly, in terms of minimizing the amount of computer memory used to represent this object, such a scheme would be optimal. At the moment, that one object has two names assigned to it: x and y. Imagine that you are the numeric object 5. From your point of view, you know nothing about the many possible names that have been associated with you; x and y are only two of them, and there could be many more!
In R, you don’t “assign values to variables” or “store 5 in x”. Variables in R are not boxes; they don’t contain objects. Instead, you “assign (the variable name) x to 5” or “bind x to 5”. Assignment is simply the association of a name with an object. Any given object may have many names associated with it. At a given instant, a name refers to only one object. Over time, the object a name refers to may vary.
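A small sketch of what binding means in practice: removing a name does not remove the object while another name still refers to it, and a name can be rebound to a different object later. It uses only base R (rm, exists).

```r
x <- 5
y <- x        # bind a second name, y, to what x currently refers to

rm(x)         # removes the name x, not the object
y             # 5: the object is still reachable through y
exists("x", inherits = FALSE)   # FALSE: the name x is gone

x <- "five"   # the name x can later be bound to a different object
y             # 5: rebinding x has no effect on y
```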
Let’s go further and (1) define a function, (2) assign it a name, and then (3) invoke or call that function, providing it with a simple input value.
    identity <- function(a) {
      # at this point, I can know the value of a,
      # but not the name of the variable
      # that was passed in to the identity function (age).
      # I also know that I'm in a function, but I don't know what
      # name was used to call me (identity).
      a
    }
    age <- 20
    identity(age)
identity, age, and a are all names associated with objects. The block of code beginning with function and ending with } is a function object. <- is a function object. Even ( is a function object; everything in R is an object. The lines that start with # are comments (notes from one programmer to another) and are ignored by R.
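You can confirm that even the punctuation is backed by function objects. A brief sketch with base R's is.function; my_identity is a made-up name used here so that the example does not depend on any earlier definition.

```r
my_identity <- function(a) { a }   # hypothetical name for this sketch

is.function(my_identity)   # TRUE
is.function(`<-`)          # TRUE
is.function(`(`)           # TRUE

# Parentheses can be called like any other function.
`(`(1 + 2)                 # 3
```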
Let’s follow the execution of the code as the R interpreter processes it, step by step.
- The infix assignment function <- is called with two inputs: identity and the otherwise anonymous function.
- The name identity is associated with that anonymous function, making it no longer anonymous.
- age <- 20 similarly invokes the assignment function and as a result associates the name age with the length 1 numeric vector whose single item is the object 20.
- identity(age) invokes the function associated with the name identity and links together the current value of age with the new name a. For the moment, both age and a are names that refer to the same object.
  - That link is called a promise in R: it’s a promise to provide the value of age, if a is ever needed within the identity function. A promise is itself an object; everything in R is an object.
- The comments are ignored. But read the comments now. This is where the statement “names have objects, objects don’t have names” is important. Once you’re inside the function, your code doesn’t know what name was used to call the function or what name (if any) was associated with the input value (if any). When you want to provide a title for graphics or filenames, that’s a pain. (NB: there is a way you can find out the name of the object that was passed in using the rlang package, but that’s beyond the scope of this post.)
- Finally we reach the line that has just an a on it. In R, the value of the last expression evaluated within a function is the returned value of that function. So, now that a is needed, the promise to associate the value of age with the name a is realized. The name a now is associated with the very same length 1 numeric vector whose single element is the numeric value 20.
- That vector is returned as the final value of the function because we explicitly evaluate a as the final instruction within the function.
- Now, attention turns to the last line again: the call to identity(age) has been fully evaluated; the function has been called and its return value determined.
- R’s interactive console, by default, calls the print function on any value left as a result of evaluating an expression. For example, when you type the number 1234 and press enter, R determines that 1234 needs no further evaluation and thus prints 1234 as the final output.
- Since the result of calling the identity function was the length 1 numeric vector whose single element is the numeric value 20, R calls print on that vector and displays a reasonable representation of that object: [1] 20
You can see why we associate intention-revealing names with objects: constantly having to say “the length 1 numeric vector whose single element is the numeric value 20” would be impractical. Given age <- 20, let’s talk about age instead.
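The promise step in the walkthrough above can be observed directly: an argument's expression is only evaluated if the function body actually needs it. In this sketch the function names ignores_input and uses_input are invented, and stop() stands in for any expression whose evaluation would be noticed.

```r
# The argument is never needed, so its promise is never evaluated.
ignores_input <- function(a) {     # hypothetical function
  "a was never needed"
}
ignores_input(stop("this error is never raised"))   # returns the string

# Here a is needed, so the promise is evaluated and the error surfaces.
uses_input <- function(a) {        # hypothetical function
  a
}
result <- tryCatch(
  uses_input(stop("raised once a is needed")),
  error = function(e) conditionMessage(e)
)
result   # "raised once a is needed"
```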
The Truth Shall Set You Free
To summarize:
- Everything in R is an object
- Everything that happens in R is the result of a function call
- Names have objects, objects don’t have names
Many of the mysterious behaviors of R can be traced to these truths. Wield them well.
Left to the Reader as an Exercise
What do you think the result(s?) of the following code would be?
    process_inputs <- function(item1, item2, f) {
      f(item1, item2)
    }
    process_inputs(1, 2, sum)
    process_inputs(1, 2, `/`) # those are backticks found on the tilde key ~
    `/`