9/4/2011 Update: granovaGG is now available directly from CRAN.
Just over one year ago, I wrote about creating Dependent Sample Assessment Plots (DSAP) Using granova and R. Since then, Brian Danielak has been developing a new, ggplot2-based version of granova named granovaGG, which is almost ready for release on CRAN. This article updates my earlier granova-based version but leaves much of the article text unchanged.
DSAPs are a way of visualizing data in the context of two-dependent-sample analyses. One way (of at least four [1]) to think about this is in terms of pre-intervention and post-intervention response scores when studying the effects of an intervention.
Suppose you’re an educator and you administer an assessment to students at the beginning of a unit, asking about their level of confidence or understanding of a topic. You then teach a lesson that spans some period of time. At the end, you collect responses to the same questions again. You now have a dependent sample: two responses related to the same individual, for some number of individuals.
Student      Pre   Post
Adam          22     45
Beth          33     30
Cindy         35     53
David         32     55
Elisabeth     27     40
For such a small sample, you can quickly eyeball the raw data and see that there seems to have been an upward shift in scores, but is it significant (in the statistical sense)? Could you so easily eyeball the results for a class of 20, 30, or 100 students? Probably not.
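One way to answer the significance question numerically is a paired t-test on the five score pairs in the table above. The following is a minimal sketch in base R, not part of the granovaGG workflow; the variable names (`pre`, `post`, `diffs`, `result`) are chosen here only for illustration.

```r
# Scores from the table above, in matched (per-student) order
pre  <- c(22, 33, 35, 32, 27)
post <- c(45, 30, 53, 55, 40)

# Per-student change: a positive value means the score went up
diffs <- post - pre
mean(diffs)  # mean difference between post and pre scores

# Paired t-test from base R's stats package
result <- t.test(post, pre, paired = TRUE)
result$conf.int  # 95% confidence interval for the mean difference
result$p.value   # two-sided p-value
```

An interval that excludes zero indicates a statistically significant mean difference at the 5% level. With only five students, though, the interval will be wide, so the result deserves a cautious reading.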
Data visualization is an attempt to reveal patterns in data by converting it from raw numbers to graphic images where we can more easily discern clusters (small groups of students who exhibit similar score patterns), outliers (unusually high or low scores), and effects of treatments (did the instruction result in learning?).
An assessment plot is simply a specialized scatter plot showing the pairs of values as (x, y) coordinates. When we enhance the scatter plot a little, we can gain quick insight into patterns in our data. Consider the following Dependent Sample Assessment Plot:
The plot has several features worth mentioning:
 The x-axis and y-axis use the same range (they’re on the same scale), so the plot is square.
 I’ve plotted post-assessment scores along the x-axis so that the mean difference will be positive for increases in post-scores and negative for decreases in post-scores.
 The solid black line running from the lower left to the upper right represents x and y values that are the same: (10, 10), (20, 20), and so on. This is called the identity line. If there were no change between the pre- and post-assessment, we would expect the points to fall along this 45-degree line.
 Any point below this line represents a positive change (the score increased from the pre- to the post-assessment).
 Any point above this line represents a negative change (the score decreased from the pre- to the post-assessment).
 The horizontal, thinly dashed line represents the pre-assessment mean; here, 29.8.
 The vertical, thinly dashed line represents the post-assessment mean; here, 44.6.
 The thick, dashed line running diagonally represents the mean of the differences between pre- and post-assessment scores (the difference mean); here, 14.8, i.e., post-assessment scores were 14.8 points higher than pre-assessment scores, on average.
 The green bar indicates the 95% confidence interval: the range of values for the population mean difference that are reasonable, in light of these data.
 If the green bar overlaps the identity line, then any observed difference is not statistically significant.
 Conversely, if the green bar does not overlap the identity line, then any observed difference is statistically significant. (It’s up to the analyst to decide whether it’s of practical significance!)
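To make the features listed above concrete, here is a rough sketch that draws the main elements with base R graphics. It is not how granovaGG builds its plot, only an approximation under some assumptions: the line types and variable names are illustrative, the mean-difference line is drawn as the line parallel to the identity line on which post minus pre equals the mean difference, and the confidence bar is omitted.

```r
pre  <- c(22, 33, 35, 32, 27)
post <- c(45, 30, 53, 55, 40)

# One shared range for both axes, so the plot is square
rng <- range(c(pre, post))

plot(post, pre, xlim = rng, ylim = rng, asp = 1,
     xlab = "Post-assessment", ylab = "Pre-assessment")

abline(a = 0, b = 1)                     # identity line: pre equals post
abline(h = mean(pre),  lty = "dotted")   # pre-assessment mean (horizontal)
abline(v = mean(post), lty = "dotted")   # post-assessment mean (vertical)

# Assumed form of the mean-difference line: pre = post - mean(post - pre)
abline(a = -mean(post - pre), b = 1, lty = "dashed", lwd = 2)

# Points below the identity line (pre < post) are students whose scores rose
improved <- sum(pre < post)
declined <- sum(pre > post)
```

Counting the points on each side of the identity line (`improved` and `declined`) gives a quick numeric companion to what the plot shows visually.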
This simple visualization offers us much information quickly and scales well to moderately sized classes. (See the 40-student example below.)
Free software is available to help us generate these graphs with just a little effort on our part. R, a statistical programming environment, is available for download for Windows, Macintosh, or Linux operating systems and offers a wealth of data management, analysis, and visualization tools. Here I’ll focus on only one such tool: granovaGG.
GranovaGG is an abbreviation for Graphical Analysis of Variance – ggplot2. It is a package for R (also free), written by Brian Danielak and myself with inspiration and guidance from Bob Pruzek and Jim Helmreich. In fact, the above plot was generated using granovaGG.
In a week or so, granovaGG will be available on CRAN. In the meantime, to install R and the latest development release of granovaGG and produce this plot, you would:
 Download R
 Install R per your operating system’s usual process
 Launch R
 Type the following commands within R

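install.packages(pkgs="devtools", dependencies=TRUE)  # helper package for installing from GitHub
library("devtools")

install_github(repo="granovaGG", username="briandk", branch="dev")  # development release of granovaGG
library(granovaGG)

x <- cbind(post=c(45, 30, 53, 55, 40), pre=c(22, 33, 35, 32, 27))  # post first, so it lands on the x-axis
granovagg.ds(x)  # draw the dependent sample assessment plot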
In the future, to run your own analysis using your own pre- and post-assessment data, you would simply:
 Launch R
 Type the following commands within R

library(granovaGG)

x <- cbind(post=c(45, 30, 53, 55, 40), pre=c(22, 33, 35, 32, 27))

granovagg.ds(x)
replacing the numbers on the line beginning with
x <-
with your own pre- and post-assessment scores. It's important that the two lists of numbers are in matched order. That is, 22 and 45 are the scores for the first student, 33 and 30 are the scores for the second student, and so on. Also, notice that I've entered the post-assessment scores first so that they will appear on the x-axis.
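Because the pairing matters, it can help to check the two lists before plotting. The sketch below uses a hypothetical helper, `make_ds_matrix()`, which is not part of granovaGG. It refuses mismatched or incomplete input and returns the two-column matrix with post-assessment scores first.

```r
# Hypothetical helper (not from granovaGG): builds the two-column matrix
# with post scores first, so they end up on the x-axis.
make_ds_matrix <- function(pre, post) {
  stopifnot(
    is.numeric(pre), is.numeric(post),
    length(pre) == length(post),  # one pre and one post score per student
    !anyNA(pre), !anyNA(post)     # every student needs both scores
  )
  cbind(post = post, pre = pre)
}

x <- make_ds_matrix(pre  = c(22, 33, 35, 32, 27),
                    post = c(45, 30, 53, 55, 40))
# x has the same layout as the matrix built with cbind() in the commands above
```

A length check cannot catch scores entered in the wrong order, so the matched ordering still has to be verified by hand.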
In closing, consider the data below: 40 pairs of scores in both raw numeric and granovagg format. Can you eyeball any trends from the raw data? What about based on the plot?
[1] See Pruzek and Helmreich's paper, Enhancing Dependent Sample Analyses Using Graphics, in the Journal of Statistics Education, Volume 17, Number 1 (2009).