Thinking about data-driven visualizations

When you’re creating a visualization based on data, it often seems as if the possibilities are endless. Realistically, however, your best option is to think carefully about two things. The first is each of the variables you’re working with, typically represented as the columns in a spreadsheet. The second is the limited number of aesthetic dimensions your visualization offers for each data point: the x position, the y position, possibly the z position, color, transparency, shape, and size.

Your goal is to map each aesthetic to one variable. If you’re using an aesthetic dimension in your graphic that isn’t tied to your variables, then why do you have that dimension? After all, it’s not communicating any information.
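To make the idea concrete, here is a minimal JavaScript sketch, assuming each data point arrives as an object whose keys are the spreadsheet’s column names. The encode function, the mapping object, and the column names (height, weight, group, age) are hypothetical and not part of any plotting library. The sketch only shows the bookkeeping: every aesthetic is tied to exactly one variable, and an aesthetic that isn’t tied to one is rejected.

```javascript
// Hypothetical helper: turn each data row into a mark by looking up
// the single variable (column) assigned to each aesthetic.
function encode(rows, mapping) {
  const aesthetics = Object.keys(mapping);

  // Every aesthetic must name one column that exists in every row;
  // otherwise it carries no information and is rejected.
  for (const aes of aesthetics) {
    const column = mapping[aes];
    if (typeof column !== "string" || !rows.every((row) => column in row)) {
      throw new Error(`Aesthetic "${aes}" is not tied to a data column`);
    }
  }

  // Build one mark per data point, e.g. { x: ..., y: ..., color: ... }.
  return rows.map((row) => {
    const mark = {};
    for (const aes of aesthetics) {
      mark[aes] = row[mapping[aes]];
    }
    return mark;
  });
}

// Hypothetical data: each object is one row, each key one column.
const rows = [
  { height: 150, weight: 50, group: "a", age: 30 },
  { height: 180, weight: 80, group: "b", age: 41 },
];

// One variable per aesthetic.
const marks = encode(rows, {
  x: "height",
  y: "weight",
  color: "group",
  size: "age",
});
console.log(marks);
```

A real plotting library such as ggplot2 performs this mapping (plus scaling to positions, colors, and sizes) for you; the point here is only that every visual channel is accounted for by a variable.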

For more on this way of thinking, see Leland Wilkinson’s seminal book, The Grammar of Graphics; Hadley Wickham’s classic paper, A Layered Grammar of Graphics; and a discussion that applies Hadley’s ideas.

Defensively install packages in R

Often, your R code will rely on having one or more R packages available. A little defensive coding will save users of your code—including future-you—from having to figure out which packages you’re using and then having to manually install them. This lowers the extraneous cognitive load associated with running older or unfamiliar code.

# Install tidyverse only if it isn't already installed
if (!"tidyverse" %in% rownames(installed.packages())) install.packages("tidyverse")

Or, if you prefer to always use blocks with if statements:

if (!"tidyverse" %in% rownames(installed.packages())) {
  install.packages("tidyverse")
}

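With a little persistence, you can extend this to handle multiple packages:

pkgs <- c("tidyverse", "openxlsx")
# Keep only the packages that aren't installed yet, then install those
missing_pkgs <- pkgs[!pkgs %in% rownames(installed.packages())]
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)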

Getting started with R

Download and install R. Download and install RStudio. Read R for Data Science.

R provides the backend: the programming language specification and the interpreter.

RStudio provides the frontend: the user interface that allows you to interact with R, visualize data, and manage the files associated with your analyses.

R for Data Science introduces you to the tidyverse way of programming. There are basically two methods of programming in R: “base R”, which has been around since the R language was first conceived (and before, since R is itself based on the S language), and the tidyverse, a newer approach that focuses on leveraging a consistent structure for your data and developing a grammar for data ingest, data wrangling, data visualization, and data storage.

Base R tends to be dense in meaning, whereas the tidyverse tends to be consistent and to break down complex processes into a set of discrete steps:

base R:

mtcars[2, "cyl"]

tidyverse:

library(tidyverse)
mtcars %>%
  select(cyl) %>%
  slice(2)

base R:

mtcars[mtcars$cyl == 4, c("hp", "mpg")]

tidyverse:

library(tidyverse)
mtcars %>%
  filter(cyl == 4) %>%
  select(hp, mpg)


What Software Developers Get Wrong About Enterprise Users

For the past few years, I’ve been working in an enterprise computing environment that has both striking similarities to and differences from the open source freelance and academic institutional environments. I’ve been frustrated a number of times by products whose makers either haven’t thought about their enterprise users or, perhaps, don’t care.

So, how is the enterprise different?

What you don’t know about Microsoft Word will hurt you

Microsoft Word is more than a blank sheet of paper; it’s sophisticated software that can help to apply consistent, professional, and attractive formatting to your documents. But if you don’t learn a few key features, Word will be cruel to you and you won’t understand why. Honestly, Word may still be cruel to you… it’s that kind of software.

I’ll focus here on professional and academic writing, rather than on desktop publishing or flyers.

Why cite your sources when writing?

All citation styles have a common purpose: to document the history of ideas. Each formal style, such as American Psychological Association (APA), Modern Language Association (MLA), or Chicago, uses a different approach to achieve that goal, driven by the history and focus of the scholarly community that produced it. Chicago’s Notes and Bibliography (NB) style, for example, aims to keep in-text citations to a minimum for readability, while APA style focuses on properly attributing ideas to people in the main text itself.

Failing to cite your sources in some recognized style can lead to accusations of theft of ideas, known as plagiarism: a very serious offense in communities where reputations and careers are built on the strength and originality of ideas. Plagiarism can prevent you from receiving an academic degree and can lead to revoked degrees, canceled book deals, books pulled from stores, and job loss, especially if the job offer depended on a degree that is later revoked.


Programming Languages are Only the Beginning

Programming languages are tools to express programmer intentions. Why, then, do we suffer the indignities of inelegant notation when we might, instead, bend the language to capture our meaning better?

If you’ve written code, you’ve likely accessed the first and last elements of an array:

var grades = [80, 90, 85];
grades[0]; // 80
grades[grades.length - 1]; // 85

How many times have you written [0]? [arr.length - 1]? Or worse, [arr.length], resulting in an off-by-1 error?

What we mean here is “the first element” and “the last element”. Unfortunately, JavaScript doesn’t provide first() or last() methods on Array objects.

> grades.first()
< TypeError: grades.first is not a function. (In 'grades.first()', 'grades.first' is undefined)

So let’s update the language to clarify that meaning. JavaScript is a prototypal language: There is an Array prototype which all instances of arrays are based on. By adding methods to the Array prototype, we immediately add those methods to every instance of an array.

Array.prototype.first = function() { return(this[0]); }
Array.prototype.last = function() { return(this[ this.length - 1 ]); }

Now, we can easily and without fear of off-by-1 errors access the first and last elements:

> grades.first()
< 80
> grades.last()
< 85

But let’s not stop there… what other functions might it be useful to have? How would you enhance the language to provide those functions?
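One possible answer, sketched under assumptions: sum and mean below are hypothetical helpers, not built-in Array methods. This version uses Object.defineProperty so the new methods are non-enumerable and won’t show up when code iterates over an array with for...in (plain assignment, as above, creates enumerable properties). It also skips any name that already exists, because extending built-in prototypes can collide with methods the language adds later; modern JavaScript, for instance, now has Array.prototype.at, so grades.at(-1) returns the last element.

```javascript
// Hypothetical helper: add a method to Array.prototype only if the name is
// free, and make it non-enumerable so for...in loops over arrays skip it.
function extendArray(name, fn) {
  if (!(name in Array.prototype)) {
    Object.defineProperty(Array.prototype, name, {
      value: fn,
      writable: true,
      configurable: true,
      enumerable: false,
    });
  }
}

// Sum of all elements; 0 for an empty array.
extendArray("sum", function () {
  return this.reduce((total, x) => total + x, 0);
});

// Arithmetic mean; NaN for an empty array (0 / 0).
extendArray("mean", function () {
  return this.sum() / this.length;
});

const grades = [80, 90, 85];
console.log(grades.sum());  // 255
console.log(grades.mean()); // 85
```

Whether to extend built-ins at all is a design choice: it reads naturally in your own scripts, but in shared libraries a standalone function such as sum(grades) is less likely to surprise other code.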

Any function you write provides an opportunity to make your intentions clearer and to create a domain specific language that allows you to express solutions to problems that interest you more naturally. Use it to your advantage.

FizzBuzz in JavaScript

In JavaScript, functions are first-class objects, and functions establish closures.

Problem: Given a range of positive, non-zero integers, output “Fizz” if the number is evenly divisible by 3, output “Buzz” if the number is evenly divisible by 5, and output “FizzBuzz” if the number is evenly divisible by both 3 and 5; otherwise, output the number.

const divisor = function(number, string) {
  // Return a closure that remembers `number` and `string`
  return function(d) {
    return d % number === 0 ? string : "";
  };
};

const mod3er = divisor(3, "Fizz");
const mod5er = divisor(5, "Buzz");

for (let i = 1; i <= 100; i++) {
  const res = mod3er(i) + mod5er(i);
  console.log(res === "" ? i : res);
}
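Because each test is just a closure, the approach generalizes to any list of rules. The following is a minimal sketch, not the original solution: makeRule is a hypothetical arrow-function equivalent of divisor, and rules and fizzbuzz are illustrative names. It collects the results for 1 through 15.

```javascript
// Hypothetical equivalent of divisor: returns a closure that yields `string`
// when its argument is evenly divisible by `number`, otherwise "".
const makeRule = (number, string) => (d) => (d % number === 0 ? string : "");

// Rule list; order determines concatenation order ("Fizz" before "Buzz").
const rules = [makeRule(3, "Fizz"), makeRule(5, "Buzz")];

// Apply every rule to n, concatenate the results, and fall back to the number.
function fizzbuzz(n) {
  const res = rules.map((rule) => rule(n)).join("");
  return res === "" ? String(n) : res;
}

const output = [];
for (let i = 1; i <= 15; i++) {
  output.push(fizzbuzz(i));
}
console.log(output.join(" "));
// 1 2 Fizz 4 Buzz Fizz 7 8 Fizz Buzz 11 Fizz 13 14 FizzBuzz
```

Adding another test, such as a hypothetical makeRule(7, "Bazz"), means appending one entry to rules; the loop and the fallback to the number stay unchanged.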

FizzBuzz in R

Functions are first-class objects in R, and they establish closures: each function keeps a reference to the environment in which it was created. So, you can use functions to create other functions in creative ways.

Here, I’ve written a function called divisor that returns a function that checks whether a given input, d, is evenly divisible by number and, if so, returns string (otherwise, an empty string). Then I use divisor to create a test for divisibility by 3 and another for divisibility by 5.

Problem: Given a range of positive, non-zero integers, output “Fizz” if the number is evenly divisible by 3, output “Buzz” if the number is evenly divisible by 5, and output “FizzBuzz” if the number is evenly divisible by both 3 and 5; otherwise, output the number.

Solution:

divisor <-
  function(number, string) {
    function(d) {
      if (d %% number == 0) string else ""
    }
  }

mod3er <- divisor(3, "Fizz")
mod5er <- divisor(5, "Buzz")

fizzbuzz <- 
  function(i) {
    res <- paste0(mod3er(i), mod5er(i))
    ifelse(res == "", i, res)
  }

sapply(1:100, fizzbuzz)

What Students Say

A note to myself

I believe that I can be a better educator through reflection and active engagement. I believe that I can better serve my students and colleagues by being honest with them. I believe that reflection, engagement, and honesty can help other educators improve their praxis, should they feel so inclined.

It has always been about the students

A note to students


Visions of Science

I’m supporting a friend with a great idea that’s a little less than 12 hours old.

My friend and fellow computer science education researcher, Brian Danielak, has worked hard today to create what we hope will be the first of many video podcasts to promote high quality visualizations in science.

He and his team would like feedback ASAP on their initial effort.

If you have ~25 minutes tonight (or as soon as you can), watch his ’cast and provide feedback via the form underneath the video:

http://briandk.com/2013/07/visions-of-science/

Why are Teachers Leaving Teaching?

The Washington Post, and many other outlets, recently reported on the resignation letter of Gerald J. Conti, a social studies teacher at Westhill High School in Syracuse, New York. Mr. Conti has 40 years of teaching experience, but feels that teaching has been marginalized in the increasingly aggressive drive for standardization of curricula, instruction, and assessment.

With regard to my profession, I have truly attempted to live John Dewey’s famous quotation (now likely cliché with me, I’ve used it so very often) that “Education is not preparation for life, education is life itself.” This type of total immersion is what I have always referred to as teaching “heavy,” working hard, spending time, researching, attending to details and never feeling satisfied that I knew enough on any topic. I now find that this approach to my profession is not only devalued, but denigrated and perhaps, in some quarters despised. STEM rules the day and “data driven” education seeks only conformity, standardization, testing and a zombie-like adherence to the shallow and generic Common Core, along with a lockstep of oversimplified so-called Essential Learnings.
Gerald J. Conti