Writing Pipe-friendly Functions

Pipes have been a fundamental part of computing for decades. A pipe takes the output of its left-hand side and passes it as input to its right-hand side. For example, in a Linux shell you might run cat example.txt | sort | uniq to read a text file, then sort its lines, then keep one copy of each distinct line. The vertical bar, |, is a common but not universal pipe operator; on U.S. QWERTY keyboards it shares a key with the backslash, \, above the RETURN key.

Languages that don’t begin by supporting pipes often eventually implement some version of them. In R, the magrittr package introduced the %>% infix operator as a pipe. It is most often pronounced “then”: for example, “take the mtcars data.frame, THEN take the head of it, THEN…” and so on.
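
As a small illustration of that reading, the sketch below compares a piped call with the equivalent nested call. The variable names piped and nested are made up for this example.

```r
library(magrittr)

# x %>% f(y) is evaluated as f(x, y):
# "take mtcars, THEN take the first three rows"
piped <- mtcars %>% head(3)

# The same call written without the pipe
nested <- head(mtcars, 3)

# Both forms produce the same data.frame
identical(piped, nested)
```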

For a function to be pipe friendly, it should at least take a data object (often named .data) as its first argument and return an object of the same type—possibly even the same, unaltered object. This contract ensures that your pipe-friendly function can exist in the middle of a piped workflow, accepting the input from its left-hand side and passing along output to its right-hand side.

library(magrittr)

custom_function <-
  function(.data) {
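    # str() prints the structure itself and returns NULL invisibly,
    # so wrapping it in message() would only add an empty message.
    str(.data)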

    .data
  }

mtcars %>%
  custom_function() %>%
  head(10) %>%
  custom_function()

This first displays the structure of the full 32 by 11 mtcars data.frame, then takes the head(10) of mtcars and displays the structure of that reduced 10 by 11 version. The pipeline ultimately returns the reduced version, which R prints to the console by default.

The dplyr package in R introduces the notion of a grouped data.frame. For example, the mtcars data has a cyl column that classifies each observation as a 4-, 6-, or 8-cylinder vehicle. You might want to process each of these groups of rows separately (all the 4-cylinder vehicles together, then all the 6-cylinder, then all the 8-cylinder):

library(dplyr)

mtcars %>%
  group_by(cyl) %>%
  tally()

Note that dplyr re-exports the magrittr pipe operator, so it’s not necessary to attach both dplyr and magrittr explicitly; attaching dplyr will usually suffice.
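
Grouping is recorded on the object itself, which is what lets a function detect it later. A minimal sketch of how to inspect that, where by_cyl is just an illustrative variable name:

```r
library(dplyr)

by_cyl <- mtcars %>%
  group_by(cyl)

is_grouped_df(mtcars)  # FALSE: a plain data.frame carries no grouping
is_grouped_df(by_cyl)  # TRUE: group_by() returned a grouped data.frame
group_vars(by_cyl)     # "cyl"
n_groups(by_cyl)       # 3 (the 4-, 6-, and 8-cylinder groups)
```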

To make my custom function group-aware, I need to check whether the incoming .data object is a grouped data.frame. If it is, I can use dplyr’s do() function to call my custom function on each subset of the data. Here, the . placeholder denotes the subset of .data handed to custom_function at each invocation.

library(dplyr)

custom_function <-
  function(.data) {
    if (dplyr::is_grouped_df(.data)) {
      return(dplyr::do(.data, custom_function(.)))
    }

    # Ungrouped input: print its structure (str() prints directly).
    str(.data)

    .data
  }

mtcars %>%
  custom_function()

mtcars %>%
  group_by(cyl) %>%
  custom_function()
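
Current dplyr documentation marks do() as superseded. For functions that exist mainly for their side effects, one possible alternative is group_walk(), which calls a function once per group. The sketch below is an assumption about how the same idea could be written, not the approach used in this post, and custom_function_walk is a hypothetical name.

```r
library(dplyr)

# Hypothetical variant of custom_function built on group_walk()
custom_function_walk <-
  function(.data) {
    if (dplyr::is_grouped_df(.data)) {
      # .x holds one group's rows (grouping columns are dropped by default);
      # .y is a one-row tibble holding that group's key values.
      dplyr::group_walk(.data, function(.x, .y) custom_function_walk(.x))

      # Hand the original grouped data on to the next step in the pipe
      return(.data)
    }

    str(.data)

    .data
  }

result <-
  mtcars %>%
  group_by(cyl) %>%
  custom_function_walk()
```

One difference from the do() version: each group's subset arrives without the cyl column unless you ask group_walk() to keep it.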

These examples only print some metadata to the console, but your custom functions can do any work they like: create, plot, and save ggplots; compute statistics; generate log files; and so on.

I usually include R’s three-dots parameter, ..., so that additional arguments can be passed into the function and on to each per-group call.

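custom_function <-
  function(.data, ...) {
    if (dplyr::is_grouped_df(.data)) {
      # Forward the extra arguments to each per-group call
      return(dplyr::do(.data, custom_function(., ...)))
    }

    # str() prints directly; wrapping it in message() adds nothing
    str(.data)

    .data
  }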
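
To show the dots doing some work, here is a minimal sketch that forwards them to str(). The name describe_data is hypothetical, and the choice of vec.len (a standard str() argument that limits how many elements are shown per column) is only for illustration.

```r
library(magrittr)

# describe_data is a hypothetical name used only for this sketch
describe_data <-
  function(.data, ...) {
    # Any extra arguments are handed straight to str()
    utils::str(.data, ...)

    # Return the input unchanged so the pipeline can continue
    .data
  }

result <-
  mtcars %>%
  describe_data(vec.len = 1) %>%
  head(5)
```

Because the function only inspects its input, result is the same as head(mtcars, 5).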
