Here is Your Data

It’s a common situation: you want to code and debug in R *and* leverage RMarkdown for a presentation or document.

The challenge: file paths.

Executing code in the console and from within a saved RMarkdown document typically requires distinct file paths to locate data files.

While you’re writing your code and debugging, you’ve probably got your source code open and are sending lines of code to the console to be evaluated. The code is evaluated with respect to the current working directory. In an RStudio project, that will be the top directory level of your project.

In a typical project, you’ll have several subdirectories:

- project
    - data
    - data-raw
    - output
    - R

So, a casual reference to the data directory might look like this:

read.csv("data/mtcars.csv")    # to execute in the console

If you write an RMarkdown document, you’ll save that in your R directory. While you’re debugging code, you’ll be sending lines of code to the console for evaluation, as above. So, the project’s top level directory is used as a starting point.

However, when you knit the document (or spin, or compose notebook… any of the aliases to knitr::knit() and associated functions), the document is evaluated in its own environment. From the perspective of knitr, the base directory is the directory in which you saved the .R, .md, or .Rmd document. Assuming you saved your RMarkdown script in R/, then knitting that document with the same code as above

read.csv("data/mtcars.csv")

looks for R/data/mtcars.csv within your project directory, which doesn’t exist.

The common, but not optimal, solution is to tell the knitted document to look up one directory level before looking for the data directory:

read.csv("../data/mtcars.csv") # to knit

.. is geek-speak for “parent directory” AKA “up one directory level”. If you needed to look up two levels, you would refer to ../../data/mtcars.csv

That’s great for knitting, but when you want to debug your code interactively in the console, you have to remove the ../ portion again. What a mess!

The here package addresses this issue. here provides a function, called here(), that returns a file path based on where your project’s top level directory is. So, when you evaluate the following line in the console

read.csv(here::here("data", "mtcars.csv"))

here looks for an .Rproj file or other indicators of where your current project is located and constructs a file path to that top level directory, plus a data subdirectory, plus a file named mtcars.csv in that directory:

[1] "/Users/wdoane/Documents/Repositories/example_project/data/mtcars.csv"

Equivalently, you could first library() or require() here:

library(here)
 
dir.create(here("data"))
write.csv(mtcars, here("data", "mtcars.csv"))
read.csv(here("data", "mtcars.csv"))

To install the current release version of the here package

install.packages("here")

or to install the development version

devtools::install_github("r-lib/here")

For more information about here, visit https://github.com/jennybc/here_here