I like this recent GOTO conference talk about the role of linguistics in understanding the language of coding. It touches upon many issues I’ve noted over the years as well as newer-to-me issues in non-English programming.
issuer: Local issue tracking, no net required
The goal of issuer is to provide a simple issue tracker, hosted on your local file system, for users who don’t want to, or are not allowed to, use cloud-based code repositories.
Online code repositories often provide an issue tracker to allow developers, reviewers, and users to report bugs, submit feature requests, and so on. However, many developers either choose to work offline or work on enterprise networks where use of cloud services may be prohibited.
issuer is an Add-in for use in RStudio’s desktop IDE. It works entirely locally with no requirement for a cloud service or even a network connection.
Read more about issuer at https://github.com/WilDoane/issuer
You can install the development version of issuer from GitHub with:
devtools::install_github("WilDoane/issuer")
Clean, Consistent Column Names
I like to standardize the column names of data I’m reading into R so that I don’t have to match column names from one dataset that has an i.d. column and another that has an id column or maybe an ID column. Keep it simple: lower case with a single underscore separator between words.
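A minimal sketch of that rule in base R follows. The helper name standardize_names() is hypothetical, not code from this post: it lower-cases every name, turns any run of spaces or punctuation into a single underscore, and trims stray underscores from the ends. One caveat: an abbreviation like i.d. becomes i_d, not id, so abbreviations may still need a manual touch. Packages such as janitor (clean_names()) offer a more thorough version.

```r
# Hypothetical helper (not from the original post): lower-case every name
# and collapse any run of spaces or punctuation into a single underscore.
standardize_names <- function(df) {
  nms <- tolower(names(df))
  nms <- gsub("[^a-z0-9]+", "_", nms)  # runs of non-alphanumerics -> one "_"
  nms <- gsub("^_+|_+$", "", nms)      # trim leading/trailing underscores
  names(df) <- nms
  df
}

raw <- data.frame(`Student ID` = 1, `First Name` = "Ada", `test.score` = 90,
                  check.names = FALSE)
names(standardize_names(raw))   # "student_id" "first_name" "test_score"
```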
Here is Your Data
It’s a common situation: you want to code and debug in R *and* leverage RMarkdown for a presentation or document.
The challenge: file paths.
Executing code in the console and from within a saved RMarkdown document typically requires distinct file paths to locate data files.
My RStudio Configuration
I regularly help a few dozen users install RStudio and learn R. Whenever I need to install RStudio on a new machine, I have to think a bit about the configuration options I’ve tweaked. Invariably, I miss a checkbox, which leaves me with slightly different RStudio behavior on each system. This post includes screenshots of my currently preferred standard RStudio configuration and custom keyboard shortcuts for RStudio 1.3 on MacOS.
If you need an exact copy of your settings, consider the discussion at https://stackoverflow.com/questions/55903423/export-import-rstudio-user-preferences-global-setting-etc/55940249 (h/t: liebrr)
Converting Individual Binary Vectors to a Value Based on Column Names
When processing data downloaded from popular survey engines, it’s not uncommon for multiple choice questions to be represented as one column per possible response coded as 0/1. So, a question with just two responses might be downloaded as part of a CSV with one column for q1_1 and another for q1_2. If the responses are mutually exclusive, then (q1_1 == 0 iff q1_2 == 1) and (q1_1 == 1 iff q1_2 == 0). If the responses are part of a “choose all that apply” question, then it’s possible to have multiple 1s.
How can these individual binary indicator variables be reassembled into a single response variable?
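One possible approach, sketched here in base R, is to treat each row of indicator columns as a logical vector and paste together the labels of the selected options. The data and the helper collapse_indicators() are hypothetical. For mutually exclusive items each row yields a single label; for “choose all that apply” items the selections are joined with a separator; rows with no selection become NA.

```r
# Hypothetical export: one 0/1 column per response option for question 1
survey <- data.frame(
  q1_1 = c(1, 0, 1, 0),
  q1_2 = c(0, 1, 1, 0)
)

# Collapse a set of 0/1 indicator columns into a single character value per
# row. Selected options are joined with `sep`, so "choose all that apply"
# items keep every selection; rows with no selection become NA.
collapse_indicators <- function(df, cols, labels = cols, sep = ";") {
  selected <- as.matrix(df[cols]) == 1
  selected[is.na(selected)] <- FALSE   # treat missing values as "not selected"
  apply(selected, 1, function(row) {
    if (any(row)) paste(labels[row], collapse = sep) else NA_character_
  })
}

survey$q1 <- collapse_indicators(survey, c("q1_1", "q1_2"), labels = c("Yes", "No"))
survey$q1   # "Yes" "No" "Yes;No" NA
```

Passing labels keeps the mapping from column name to response text explicit; leaving it out returns the column names themselves.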
Common Uncommon Notations that Confuse New R Coders
Here are a few of the more commonly used notations found in R code and documentation that confuse coders of any skill level who are new to R.
The Shiny Module Design Pattern
Foremost in your mind should be the quintessential reality of R: Everything that happens in R is the result of a function call. Shiny is no exception.
To write a minimal shiny app, you create an object that describes your app’s user interface, write a function describing runtime behaviors, and pass them into a function that spins up the app you’ve described. By convention, these two objects are associated with the variable names ui and server.
library(shiny)
ui <- fluidPage()
server <- function(input, output, session) {}
This is just R code. You can type it into the Console to execute it line by line and inspect what it does.
If you’re working in RStudio, you can type it into a Source file, then press Control-Enter (Windows) or Command-Return (MacOS) to send each line to the Console for execution.
Checking the Environment—or the structure of these two objects with str()—we can see that ui is a list of three objects. If we print ui to the Console, we see only an empty HTML <div> element.
<div class="container-fluid"></div>
The object associated with server is simply a function with an empty body.
To execute this minimal shiny app, we pass the ui and server objects to the shinyApp() function.
shinyApp(ui, server)
The app will be spun up either in RStudio’s Viewer pane, in a Viewer window, or in your default Web browser, depending on your settings in RStudio.
Don’t be surprised: it will be just a blank window, since all that has been defined thus far is an empty <div> element. The document that opened is an HTML document with some boilerplate CSS and JavaScript. You can inspect it using your browser’s Developer Tools.
That’s it. That’s shiny. Everything else flows from these core ideas:
- ui is a list object representing the HTML UI to be constructed.
- server is a function describing the runtime behavior of your app.
- shinyApp() takes these two objects and uses them to construct an HTML document that then gets spun up in a browser.
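To see those core ideas carry one step further, here is a small sketch that adds one input to ui and one reactive output to server. It assumes a reasonably recent version of shiny; textInput(), textOutput(), and renderText() are standard shiny functions, while the ids name and greeting are arbitrary choices. ui is still only a description of HTML, and server is still only a function.

```r
library(shiny)

# ui: still just a description of HTML, now with one input and one output
ui <- fluidPage(
  textInput("name", "Your name"),
  textOutput("greeting")
)

# server: runtime behavior; output$greeting is recomputed whenever input$name changes
server <- function(input, output, session) {
  output$greeting <- renderText(paste("Hello,", input$name))
}

# Launch only in an interactive session (e.g., the RStudio Console)
if (interactive()) shinyApp(ui, server)
```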
Writing Pipe-friendly Functions
Pipes have been a fundamental aspect of computer programming for many decades. In short, the semantics of pipes can be thought of as taking the output from the left-hand side and passing it as input to the right-hand side. For example, in a Linux shell, you might run
cat example.txt | sort | uniq
to take the contents of a text file, then sort the rows, then keep one copy of each distinct value. The vertical bar, |, is a common, but not universal, pipe operator; on U.S. QWERTY keyboards, it is found above the RETURN key along with the backslash (\).
Languages that don’t begin by supporting pipes often eventually implement some version of them. In R, the magrittr package introduced the %>% infix operator as a pipe operator; it is most often pronounced “then”. For example, “take the mtcars data.frame, THEN take the head of it, THEN…” and so on.
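A minimal sketch of what “pipe-friendly” can mean in practice, assuming magrittr is installed: the data comes in as the first argument and the modified data is returned, so %>% can feed the left-hand side straight in and the result can be piped onward. The helper add_ratio() and its arguments are hypothetical.

```r
library(magrittr)  # provides %>%

# Hypothetical pipe-friendly helper: data first, data returned.
add_ratio <- function(df, numerator, denominator, name = "ratio") {
  df[[name]] <- df[[numerator]] / df[[denominator]]
  df
}

# Take mtcars, THEN take the head of it, THEN add a horsepower-per-weight column
mtcars %>%
  head() %>%
  add_ratio("hp", "wt", name = "hp_per_wt")
```

Because the data argument comes first, the same function also works with R’s native |> pipe (R 4.1 and later).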
Three Deep Truths About R
- Everything that exists in R is an object ~ John M. Chambers
- Everything that happens in R is the result of a function call ~ John M. Chambers
- Names have objects; objects don’t have names ~ Hadley Wickham
So, what are the implications of these statements?
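Each statement can be checked at the Console. The snippet below is one illustration of each, not a complete list of their implications; z, x, and y are throwaway names.

```r
# 1. Everything that exists is an object, functions and operators included
class(sum)   # "function"
class(`+`)   # "function"

# 2. Everything that happens is a function call, even when it doesn't look like one
`+`(2, 3)                # 5, the same call as 2 + 3
`[`(c(10, 20, 30), 2)    # 20, the same call as c(10, 20, 30)[2]
`<-`(z, 42)              # the same call as z <- 42

# 3. Names have objects: binding a second name does not rename the object,
#    and modifying through one name leaves the other untouched (copy-on-modify)
x <- c(1, 2, 3)
y <- x
y[1] <- 100
x   # 1 2 3
y   # 100 2 3
```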
step 0: assume a malicious universe
Here’s a thought puzzle for you… given the following line of computer code, “what could go wrong?” That is, what kinds of issues could arise from submitting that code to your favorite programming language interpreter (you do have a favorite… right?)
n + 4
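For R specifically, a few answers can be checked directly. The wrapper add_four() is a hypothetical name that returns error messages instead of stopping; the cases are examples rather than an exhaustive list, and other languages fail in different ways.

```r
# Wrap the expression so errors come back as messages instead of stopping
add_four <- function(n) tryCatch(n + 4, error = function(e) conditionMessage(e))

add_four(1)          # 5: the happy path
add_four()           # an error message about the missing argument n
add_four("1")        # "non-numeric argument to binary operator"
add_four(NA)         # NA: missing values propagate silently
add_four(c(1, 2, 3)) # 5 6 7: vectorised, which may or may not be what you meant
add_four(NULL)       # numeric(0): an empty result and no error at all

# Even `+` itself can be redefined, because operators are ordinary functions
local({
  `+` <- function(e1, e2) "surprise"
  n <- 1
  n + 4              # "surprise"
})
```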
I’m ‘not in’ right now…
Checking whether an item is in a vector or not in a vector is a common task. The notation in R is a little inelegant when expressing the “not in” condition since the negation operator (!) is separated from the comparison operator (%in%):
5 %in% c(1, 2, 3, 4, 5)    # TRUE
!5 %in% c(1, 2, 3, 4, 5)   # FALSE
R is a language where you can easily extend the set of built-in operators by defining your own infix operators, whose names begin and end with %:
`%!in%` <- function(needle, haystack) { !(needle %in% haystack) }
Now, I can express my intentions reasonably clearly with my new, compact, infix operator %!in%:
5 %in% c(1, 2, 3, 4, 5)    # TRUE
5 %!in% c(1, 2, 3, 4, 5)   # FALSE
Moral: bend your tools to your will, not the other way ’round.
Defensively install packages in R
Often, your R code will rely on having one or more R packages available. A little defensive coding will save users of your code—including future-you—from having to figure out which packages you’re using and then having to manually install them. This lowers the extraneous cognitive load associated with running older or unfamiliar code.
if (!"tidyverse" %in% rownames(installed.packages())) install.packages("tidyverse", dependencies = TRUE)
Or, if you prefer to always use blocks with IF statements:
if (!"tidyverse" %in% rownames(installed.packages())) {
  install.packages("tidyverse", dependencies = TRUE)
}
With a little persistence, you can extend this to dealing with multiple packages:
pkgs <- c("tidyverse", "openxlsx")
missing_pkgs <- pkgs[!pkgs %in% rownames(installed.packages())]
if (length(missing_pkgs) > 0) install.packages(missing_pkgs, dependencies = TRUE)
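A variation worth considering wraps the check in a reusable function. R’s own help page for installed.packages() warns that it can be slow and suggests requireNamespace() for asking whether a package is usable, so this sketch uses that instead. The name ensure_packages() is hypothetical, and it returns the names it found missing so callers can log them.

```r
# Hypothetical helper: install whichever of `pkgs` cannot currently be loaded.
# requireNamespace() returns FALSE (rather than erroring) for a missing
# package, so it doubles as an "is this usable?" check.
ensure_packages <- function(pkgs, ...) {
  usable <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!usable]
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs, dependencies = TRUE, ...)
  }
  invisible(missing_pkgs)
}
```

Usage: ensure_packages(c("tidyverse", "openxlsx")) installs only what is absent and does nothing when everything is already available.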
Getting started with R
Download and install R. Download and install RStudio. Read R for Data Science.
R provides the backend: the programming language specification and the interpreter.
RStudio provides the frontend: the user interface that allows you to interact with R, visualize data, and manage the files associated with your analyses.
R for Data Science introduces you to the tidyverse way of programming. There are basically two methods of programming in R: “base R”, which has been around since the R language was first conceived (and before, since R is itself based on the S language), and the tidyverse, a newer approach that focuses on leveraging a consistent structure for your data and developing a grammar for data ingest, data wrangling, data visualization, and data storage.
Base R tends to be dense in meaning, whereas the Tidyverse tends to be consistent and to break down complex processes into a set of discrete steps (both Tidyverse examples assume library(tidyverse) has been run):
| base R | Tidyverse |
| --- | --- |
| mtcars[2, "cyl"] | mtcars %>% select(cyl) %>% slice(2) |
| mtcars[mtcars$cyl == 4, c("hp", "mpg")] | mtcars %>% filter(cyl == 4) %>% select(hp, mpg) |
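The two columns of that comparison are close but not identical in what they return, which is easy to check. This sketch loads only dplyr (the tidyverse package that provides select(), slice(), filter(), and pull()) and compares the results of each row.

```r
library(dplyr)  # select(), slice(), filter(), and pull() come from dplyr

# Row 1: base R drops to a plain vector; the dplyr pipeline keeps a
# one-cell data frame until pull() extracts the column.
base_cell <- mtcars[2, "cyl"]
tidy_cell <- mtcars %>% select(cyl) %>% slice(2)
identical(base_cell, pull(tidy_cell, cyl))   # TRUE

# Row 2: both return the hp and mpg columns for the 4-cylinder cars, in the
# same order; compare the values only, ignoring row names.
base_rows <- mtcars[mtcars$cyl == 4, c("hp", "mpg")]
tidy_rows <- mtcars %>% filter(cyl == 4) %>% select(hp, mpg)
identical(unname(as.matrix(base_rows)), unname(as.matrix(tidy_rows)))   # TRUE
```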
Programming Languages are Only the Beginning
Programming languages are tools to express programmer intentions. Why, then, do we suffer the indignities of inelegant notation when we might, instead, bend the language to capture our meaning better?
If you’ve written code, you’ve likely accessed the first and last elements of an array:
var grades = [80, 90, 85];
grades[0];                   // 80
grades[grades.length - 1];   // 85
How many times have you written [0]? [arr.length - 1]? Or worse, [arr.length], resulting in an off-by-1 error?
What we mean here is “the first element” and “the last element”. Unfortunately, JavaScript doesn’t provide first or last methods on Array objects. (Engines that support ES2022 offer grades.at(-1) for the last element, but there is still no built-in first() or last().)
> grades.first()
< TypeError: grades.first is not a function. (In 'grades.first()', 'grades.first' is undefined)
So let’s update the language to clarify that meaning. JavaScript is a prototypal language: There is an Array prototype which all instances of arrays are based on. By adding methods to the Array prototype, we immediately add those methods to every instance of an array.
Array.prototype.first = function() { return(this[0]); };
Array.prototype.last = function() { return(this[this.length - 1]); };
Now, we can easily and without fear of off-by-1 errors access the first and last elements:
> grades.first()
< 80
> grades.last()
< 85
But let’s not stop there… what other functions might it be useful to have? How would you enhance the language to provide those functions?
Any function you write provides an opportunity to make your intentions clearer and to create a domain specific language that allows you to express solutions to problems that interest you more naturally. Use it to your advantage.
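The same move works in R, the language used in most of the other posts here. The helpers first_of() and last_of() below are hypothetical names that wrap base R’s head() and tail(), so an empty vector returns an empty result instead of an index error or a stray NA. (dplyr also ships first() and last() functions.)

```r
# Hypothetical helpers that name the intent instead of the index arithmetic
first_of <- function(x) head(x, 1)
last_of  <- function(x) tail(x, 1)

grades <- c(80, 90, 85)
first_of(grades)   # 80
last_of(grades)    # 85
```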
FizzBuzz in JavaScript
Functions are first-class objects. Functions establish closures.
Problem: Given a range of positive, non-zero integers, output “Fizz” if the number is evenly divisible by 3, output “Buzz” if the number is evenly divisible by 5, and output “FizzBuzz” if the number is evenly divisible by both 3 and 5; otherwise, output the number.
// divisor returns a closure that remembers number and string
const divisor = function(number, string) {
  return function(d) {
    if (d % number === 0) { return string; } else { return ""; }
  };
};
const mod3er = divisor(3, "Fizz");
const mod5er = divisor(5, "Buzz");
for (let i = 1; i <= 100; i = i + 1) {
  const res = mod3er(i) + mod5er(i);
  console.log(res === "" ? i : res);
}
FizzBuzz in R
Functions are first-class objects in R. A function created inside another function is a closure: it keeps access to the environment in which it was created. So, you can use functions to create other functions in creative ways.
Here, I’ve written a function called divisor that returns a function that checks whether a given input, d, is evenly divisible by number and, if so, returns string. Then I use divisor to create a test for divisibility by 3 and another for divisibility by 5.
Problem: Given a range of positive, non-zero integers, output “Fizz” if the number is evenly divisible by 3, output “Buzz” if the number is evenly divisible by 5, and output “FizzBuzz” if the number is evenly divisible by both 3 and 5; otherwise, output the number.
Solution:
divisor <- function(number, string) {
  function(d) {
    if (d %% number == 0) string else ""
  }
}
mod3er <- divisor(3, "Fizz")
mod5er <- divisor(5, "Buzz")
fizzbuzz <- function(i) {
  res <- paste0(mod3er(i), mod5er(i))
  ifelse(res == "", i, res)
}
sapply(1:100, fizzbuzz)
Mike Monteiro @ WebStock ’13: How Designers Destroyed the World
Mike offers some blunt and intense advice about maintaining absolute integrity in one’s work. While he’s addressing his concerns to designers, I take his advice to apply equally well to computer programmers, UX, UI, teachers… any profession where you’re creating… and really, shouldn’t that be all professions?
Bret Victor Speaking on Inventing on Principle
In his CUSEC talk from early 2012, Bret offers some interesting insights into the importance of immediate, direct feedback while learning to program and, really, while programming at all.
Automatically Generating an HTML5-style Cache Manifest from the Command Line
HTML5 introduced the ability to cache content client-side so that often-used resources can be used without re-downloading them. This also enables a site to be viewed from the client when no network connection is available (i.e., offline viewing of the site). This Application Cache mechanism has since been deprecated and removed from major browsers in favor of Service Workers.
In order for this to work, there are a few things one must do:
- Create a plain text file, the cache manifest, listing all of the resources that should be cached by the user agent (e.g., a web browser).
- Refer to that file in the manifest attribute of the opening html tag of every page that will use cached resources.
- Configure the web server so that the file is sent to the user agent with a specific MIME type: text/cache-manifest
- Regenerate the cache manifest any time you change the files in your site.
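The regeneration step is the one worth automating. The full post does this from the command line; as a hedged alternative in R, the sketch below lists every file under a site directory and writes a manifest whose first line is CACHE MANIFEST, followed by a # comment and a CACHE: section. The function name write_cache_manifest() and the default file name cache.manifest are assumptions. The timestamp comment makes the manifest’s bytes differ on every run, which is what prompts user agents to re-check the listed files.

```r
# Hypothetical helper: write a cache manifest listing every file under site_dir
write_cache_manifest <- function(site_dir, manifest = "cache.manifest") {
  files <- list.files(site_dir, recursive = TRUE)  # relative paths, files only
  files <- setdiff(files, manifest)                # don't list the manifest itself
  lines <- c(
    "CACHE MANIFEST",
    paste("# generated", format(Sys.time(), "%Y-%m-%d %H:%M:%S")),
    "",
    "CACHE:",
    files
  )
  writeLines(lines, file.path(site_dir, manifest))
  invisible(lines)
}
```

Re-running it after every change to the site covers the last step in the list; the server still needs to send the file as text/cache-manifest.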