Thinking about data-driven visualizations

2018 February 24 William Doane

When you’re creating a visualization based on data, it often seems as if the possibilities are endless. In practice, your best option is to think carefully about two things: the variables you’re working with, typically represented as the columns of a spreadsheet, and the limited number of aesthetic dimensions available for each data point in your visualization: x position, y position, possibly z position, color, transparency, shape, and size.

Your goal is to map each aesthetic to one variable. If you’re using an aesthetic dimension in your graphic that isn’t tied to your variables, then why do you have that dimension? After all, it’s not communicating any information.
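As a minimal sketch of that mapping, assuming the ggplot2 package is installed and using R's built-in mtcars data (neither is named in this post, so treat both as illustrative choices), each aesthetic below is tied to exactly one variable:

```r
library(ggplot2)

# Each aesthetic is mapped to exactly one column of the built-in mtcars data:
# x -> weight, y -> miles per gallon, color -> cylinder count, size -> horsepower.
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl), size = hp)) +
  geom_point()

print(p)
```

Shape and transparency are left unmapped here because there is no further variable for them to carry.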

For further reading, see Leland Wilkinson’s seminal book, The Grammar of Graphics; Hadley Wickham’s classic paper, A Layered Grammar of Graphics; and a discussion that applies Hadley’s ideas.

Programming, R

Defensively install packages in R

2018 February 19 William Doane

Often, your R code will rely on having one or more R packages available. A little defensive coding will save users of your code—including future-you—from having to figure out which packages you’re using and then having to manually install them. This lowers the extraneous cognitive load associated with running older or unfamiliar code.

if (!"tidyverse" %in% rownames(installed.packages())) install.packages("tidyverse", dependencies = TRUE)

Or, if you prefer to always use braces with if statements:

if (!"tidyverse" %in% rownames(installed.packages())) {
  install.packages("tidyverse", dependencies = TRUE)
}

With a little persistence, you can extend this to dealing with multiple packages:

pkgs <- c("tidyverse", "openxlsx")
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
if (length(missing_pkgs) > 0) install.packages(missing_pkgs, dependencies = TRUE)
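If you use this pattern in many scripts, it may be worth wrapping it in a function. The following is only a sketch: the names missing_packages and ensure_packages are hypothetical helpers invented for illustration, not part of base R or any package.

```r
# Hypothetical helper: report which requested packages are not yet installed.
# The `installed` argument defaults to the real library but can be overridden,
# which makes the function easy to check without touching the network.
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

# Hypothetical helper: install whatever is missing, then attach every package.
# Returns (invisibly) a named logical vector saying which packages attached.
ensure_packages <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install, dependencies = TRUE)
  }
  invisible(vapply(pkgs, require, logical(1), character.only = TRUE))
}
```

A script could then begin with a single call such as ensure_packages(c("tidyverse", "openxlsx")), which may download packages the first time it runs.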

Programming, R

Getting started with R

2018 February 19 William Doane

Download and install R. Download and install RStudio. Read R for Data Science.

R provides the backend: the programming language specification and the interpreter.

RStudio provides the frontend: the user interface that allows you to interact with R, visualize data, and manage the files associated with your analyses.

R for Data Science introduces you to the tidyverse way of programming. There are basically two approaches to programming in R: “base R”, which has been around since the R language was first conceived (and before, since R is itself based on the S language), and the tidyverse, a newer approach that focuses on giving your data a consistent structure and on developing a grammar for data ingest, data wrangling, data visualization, and data storage.

Base R tends to be dense in meaning, whereas the tidyverse tends to be consistent and to break down complex processes into a set of discrete steps:

base R	tidyverse
mtcars[2, "cyl"]	library(tidyverse); mtcars %>% select(cyl) %>% slice(2)
mtcars[mtcars$cyl == 4, c("hp", "mpg")]	library(tidyverse); mtcars %>% filter(cyl == 4) %>% select(hp, mpg)
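To see that the second pair of expressions in the comparison above selects the same data, here is a small check. It is a sketch that assumes the dplyr package is installed; dplyr is loaded on its own here (rather than the whole tidyverse) because it provides filter(), select(), and the %>% pipe. The variable names base_result and tidy_result are made up for illustration.

```r
library(dplyr)

# Base R: one dense expression that picks rows and columns at once.
base_result <- mtcars[mtcars$cyl == 4, c("hp", "mpg")]

# tidyverse: the same work split into two discrete, named steps.
tidy_result <- mtcars %>%
  filter(cyl == 4) %>%
  select(hp, mpg)

# Compare column by column so that any difference in row names is ignored.
same_hp <- identical(base_result$hp, tidy_result$hp)
same_mpg <- identical(base_result$mpg, tidy_result$mpg)
print(c(hp = same_hp, mpg = same_mpg))
```

The two styles differ in how the steps are written and read, and this check is one way to confirm that they pick out the same values.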

Tips & Best Practices, Writing

Writing Using Chicago Notes and Bibliography (NB) Style

2018 February 03 William Doane

I’ve written a separate post explaining why you should cite your sources. This post will focus on how to cite, assuming you’re using the Chicago Notes and Bibliography citation style.

Critiques & Solutions, Tips & Best Practices

What Software Developers Get Wrong About Enterprise Users

2018 February 01 William Doane

For the past few years, I’ve been working in an enterprise computing environment that has striking similarities to, and differences from, both the open source freelance and the academic institutional environments. I’ve been frustrated a number of times by products whose developers either haven’t thought about their enterprise users or, perhaps, don’t care.

So, how is the enterprise different?

William Doane

Monthly Archives: February 2018
