Writing Pipe-friendly Functions

Pipes have been a fundamental aspect of computer programming for many decades. In short, a pipe takes the output from its left-hand side and passes it as input to its right-hand side. For example, in a Linux shell, you might run cat example.txt | sort | uniq to take the contents of a text file, then sort its rows, then keep one copy of each distinct row (uniq only collapses adjacent duplicates, so the sort has to come first). | is a common, but not universal, pipe operator; on U.S. QWERTY keyboards, it shares a key with the backslash (\) and is typed with Shift.
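To make the left-to-right hand-off concrete, here is a minimal sketch in JavaScript (the language used later on this page). The pipe helper and the two step functions are hypothetical names made up for illustration; they stand in for a sort step and a de-duplication step and are not part of any library.

```javascript
// Hypothetical helper: feed a value through each function in turn,
// passing the output of one step as the input to the next.
function pipe(value, ...fns) {
  return fns.reduce((acc, fn) => fn(acc), value);
}

// Illustrative stand-ins for a sort step and a de-duplication step.
const sortLines = (lines) => [...lines].sort();
const dedupeLines = (lines) => [...new Set(lines)];

const lines = ["pear", "apple", "pear", "fig"];
const result = pipe(lines, sortLines, dedupeLines);

console.log(result); // [ 'apple', 'fig', 'pear' ]
```

Reading the call from left to right gives the same "take this, then do that" flow that a pipe operator expresses.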

Languages that don’t begin by supporting pipes often eventually implement some version of them. In R, the magrittr package introduced the %>% infix operator as a pipe operator; it is most often pronounced “then”. For example, “take the mtcars data.frame, THEN take the head of it, THEN…” and so on.


Science and Communication: Alan Alda in Conversation with Neil deGrasse Tyson

Despite the reality that we use tools and techniques every moment of every day that have been devised and revised through the constant questioning and reflecting process we call science, far too many people don’t believe they understand what science is, don’t consider themselves scientists, and don’t trust the expert opinions of the scientific community. How can that possibly be?

“We’re not really listening, unless we’re willing to be changed by the other person.” ~ Alan Alda

Science and Communication—Alan Alda and Neil deGrasse Tyson at the 92nd Street Y in New York City

Stay Secure

Security is a tricky affair: it’s difficult to establish, to maintain, and to verify. Take all the steps you can to keep your data and devices secure.

  • Back up, keeping a copy in your home and somewhere outside your home that you consider safe and are likely to be able to access in an emergency.
  • Secure, so that only you and those you trust can get access to your information and accounts.
  • Plan for routine recovery, such as when you’re away from your computer, but need to gain access to your information and accounts.
  • Plan for extreme recovery, such as when you are disabled and unable to communicate, or when you pass away.


I’m ‘not in’ right now…

Checking whether an item is in a vector or not in a vector is a common task. The notation in R is a little inelegant when expressing the “not in” condition since the negation operator (!) is separated from the comparison operator (%in%):

5 %in% c(1, 2, 3, 4, 5)  # TRUE
!5 %in% c(1, 2, 3, 4, 5) # FALSE

R is a language where you can easily extend the set of built-in operators:

`%!in%` <-
  function(needle, haystack) {
    !(needle %in% haystack)
  }

Now, I can express my intentions reasonably clearly with my new, compact, infix operator %!in%:

5 %in% c(1, 2, 3, 4, 5)  # TRUE
5 %!in% c(1, 2, 3, 4, 5) # FALSE

Moral: bend your tools to your will, not the other way ’round.

Thinking about data-driven visualizations

When you’re creating a visualization based on data, it often seems as if the possibilities are endless. Realistically, however, your best option is to think carefully about each of the variables with which you’re working—typically represented as the columns in a spreadsheet—and the limited number of aesthetic dimensions of your visualization—for each data point: the x position, the y position, possibly the z position, color, transparency, shape, and size.

Your goal is to map each aesthetic to one variable. If you’re using an aesthetic dimension in your graphic that isn’t tied to your variables, then why do you have that dimension? After all, it’s not communicating any information.

Leland Wilkinson’s seminal book, The Grammar of Graphics.

Hadley Wickham’s classic: A Layered Grammar of Graphics.

and a discussion that applies Hadley’s ideas.

Defensively install packages in R

Often, your R code will rely on having one or more R packages available. A little defensive coding will save users of your code—including future-you—from having to figure out which packages you’re using and then having to manually install them. This lowers the extraneous cognitive load associated with running older or unfamiliar code.

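if (!"tidyverse" %in% rownames(installed.packages())) install.packages("tidyverse")

Or, if you prefer to always use blocks with if statements:

# rownames() limits the check to package names rather than every cell of the matrix
if (!"tidyverse" %in% rownames(installed.packages())) {
  install.packages("tidyverse")
}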

With a little persistence, you can extend this to dealing with multiple packages:

pkgs <- c("tidyverse", "openxlsx")
missing_pkgs <- pkgs[!pkgs %in% rownames(installed.packages())]
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)

Getting started with R

Download and install R. Download and install RStudio. Read R for Data Science.

R provides the backend: the programming language specification and the interpreter.

RStudio provides the frontend: the user interface that allows you to interact with R, visualize data, and manage the files associated with your analyses.

R for Data Science introduces you to the tidyverse way of programming. There are basically two approaches to programming in R: “base R”, which has been around since the R language was first conceived (and before, since R is itself based on the S language), and the tidyverse, a newer approach that focuses on leveraging a consistent structure to your data and developing a grammar for data ingest, data wrangling, data visualization, and data storage.

Base R tends to be dense in meaning, whereas the Tidyverse tends to be consistent and to break down complex processes into a set of discrete steps:

Example 1, base R:

mtcars[2, "cyl"]

Example 1, Tidyverse:

library(tidyverse)
mtcars %>%
  select(cyl) %>%
  slice(2)

Example 2, base R:

mtcars[mtcars$cyl == 4, c("hp", "mpg")]

Example 2, Tidyverse:

library(tidyverse)
mtcars %>%
  filter(cyl == 4) %>%
  select(hp, mpg)

What Software Developers Get Wrong About Enterprise Users

For the past few years, I’ve been working in an enterprise computing environment that has striking similarities to, and differences from, both the open-source freelance environment and the academic institutional environment. I’ve been frustrated a number of times by products that either haven’t thought about their enterprise users or, perhaps, don’t care.

So, how is the enterprise different?

What you don’t know about Microsoft Word will hurt you

Microsoft Word is more than a blank sheet of paper; it’s sophisticated software that can help to apply consistent, professional, and attractive formatting to your documents. But if you don’t learn a few key features, Word will be cruel to you and you won’t understand why. Honestly, Word may still be cruel to you… it’s that kind of software.

I’ll focus here on professional and academic writing, rather than on desktop publishing or flyers.

Why cite your sources when writing?

All citation styles have a common purpose: to document the history of ideas. Each formal style—American Psychological Association (APA), Modern Language Association (MLA), Chicago—uses a different approach to achieve that goal, driven by the history and focus of the scholarly community that produced it. Chicago’s Notes and Bibliography (NB) style aims to keep in-text citations to a minimum for readability, for example, while APA style is focused on proper attribution of ideas to people in the main text itself.

Failure to follow some citation style in your writing can lead to accusations of theft of ideas—known as plagiarism—a very serious offense in communities where reputations and careers are built on the strength and originality of your ideas. Plagiarizing can prevent you from receiving an academic degree, or lead to already awarded degrees being revoked, book deals being canceled, books being pulled from stores, and job loss, especially if the job offer depended on a degree that is later revoked.


Programming Languages are Only the Beginning

Programming languages are tools to express programmer intentions. Why, then, do we suffer the indignities of inelegant notation when we might, instead, bend the language to capture our meaning better?

If you’ve written code, you’ve likely accessed the first and last elements of an array:

var grades = [80, 90, 85];
grades[0]; // 80
grades[grades.length - 1]; // 85

How many times have you written [0]? [arr.length - 1]? Or worse, [arr.length], resulting in an off-by-1 error?

What we mean here is “the first element” and “the last element”. Unfortunately, JavaScript doesn’t provide first() or last() methods on Array objects (newer engines do offer at(-1) for the last element, but nothing that reads as plainly as “first” and “last”).

> grades.first()
< TypeError: grades.first is not a function. (In 'grades.first()', 'grades.first' is undefined)

So let’s update the language to clarify that meaning. JavaScript is a prototypal language: There is an Array prototype which all instances of arrays are based on. By adding methods to the Array prototype, we immediately add those methods to every instance of an array.

Array.prototype.first = function () { return this[0]; };
Array.prototype.last = function () { return this[this.length - 1]; };

Now, we can easily and without fear of off-by-1 errors access the first and last elements:

> grades.first()
< 80
> grades.last()
< 85
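One caveat with assigning directly to Array.prototype is that the new methods are enumerable, so they show up in for...in loops over any array. A minimal sketch of one way around that, assuming the same first and last names as above, is to define them with Object.defineProperty and mark them non-enumerable. The scores array is a made-up example, and this is an illustration rather than the only way to do it.

```javascript
// Define first() and last() as non-enumerable methods so that
// for...in loops over arrays do not pick them up.
Object.defineProperty(Array.prototype, "first", {
  value: function () { return this[0]; },
  enumerable: false,
  writable: true,
  configurable: true,
});
Object.defineProperty(Array.prototype, "last", {
  value: function () { return this[this.length - 1]; },
  enumerable: false,
  writable: true,
  configurable: true,
});

const scores = [80, 90, 85];

// Collect the keys a for...in loop sees; only the indices should appear.
const keys = [];
for (const key in scores) keys.push(key);

console.log(scores.first(), scores.last()); // 80 85
console.log(keys);                          // [ '0', '1', '2' ]
```

Extending built-in prototypes can still collide with other libraries or with future additions to the language, so it is worth choosing names carefully.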

But let’s not stop there… what other functions might it be useful to have? How would you enhance the language to provide those functions?

Any function you write provides an opportunity to make your intentions clearer and to create a domain specific language that allows you to express solutions to problems that interest you more naturally. Use it to your advantage.

FizzBuzz in JavaScript

Functions are first-class objects. Functions establish closures.

Problem: Given a range of positive integers, output “Fizz” if the number is evenly divisible by 3, output “Buzz” if the number is evenly divisible by 5, and output “FizzBuzz” if the number is evenly divisible by both 3 and 5; otherwise, output the number.

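// divisor() returns a closure that remembers its number and string.
const divisor = function (number, string) {
  return function (d) {
    return d % number === 0 ? string : "";
  };
};

const mod3er = divisor(3, "Fizz");
const mod5er = divisor(5, "Buzz");

// Concatenate both results; an empty string means neither divisor matched.
for (let i = 1; i <= 100; i += 1) {
  const res = mod3er(i) + mod5er(i);
  console.log(res === "" ? i : res);
}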
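
Because each rule is just a closure, the same idea extends to any number of rules. The sketch below is one possible generalization, not part of the solution above: the names makeRule and fizzBuzz are hypothetical, and it returns an array of strings rather than logging so the result is easy to check.

```javascript
// makeRule returns a closure that remembers its divisor and word.
const makeRule = (divisor, word) => (n) => (n % divisor === 0 ? word : "");

// fizzBuzz applies every rule to each integer from 1 to limit and
// falls back to the number itself when no rule produces a word.
function fizzBuzz(limit, rules) {
  const output = [];
  for (let n = 1; n <= limit; n += 1) {
    const words = rules.map((rule) => rule(n)).join("");
    output.push(words === "" ? String(n) : words);
  }
  return output;
}

const rules = [makeRule(3, "Fizz"), makeRule(5, "Buzz")];
console.log(fizzBuzz(15, rules).join(" "));
// 1 2 Fizz 4 Buzz Fizz 7 8 Fizz Buzz 11 Fizz 13 14 FizzBuzz
```

Adding a third rule, such as makeRule(7, "Bazz"), only means appending it to the rules array; the loop itself does not change.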

What will you improve today?