Last night a colleague was rushing to meet a submission deadline and needed help changing the default ordering R uses in boxplots for one of her figures. This is pretty easy—it boils down to defining the factor levels manually, so I thought I’d show this along with a few other things I like to add to my boxplots. I think many of these additions not only make the plots look pretty/professional, but also convey more information in a visually intuitive form. My colleague’s data was in a data.frame with one column defining the factor levels, and another column containing the values, so I’ll start from there with some fake data.

    # Some things I'll use later
    WorkingDir<-"~/blog/boxplots/"
    mywidth<-6
    myheight<-6
    myres<-300

    # Make a table of fake data to play with. Let's have 3 groups with different numbers of observations.
    num.a<-100
    num.b<-20
    num.c<-50
    num.tot<-num.a + num.b + num.c # We'll use the total later as well
    a<-rnorm(n=num.a, mean=1, sd=1)
    b<-rnorm(n=num.b, mean=2, sd=2)
    c<-rnorm(n=num.c, mean=3, sd=3)
    mytab<-data.frame(factor=c(rep("a", num.a), rep("b", num.b), rep("c", num.c)), value=c(a,b,c))

    # Look at some random rows to get a feel for the data
    mytab[sample(1:num.tot, 15),]

        factor       value
    82       a  0.39031460
    92       a  0.75720888
    36       a  1.10241158
    94       a -0.04059472
    69       a  0.34451820
    116      b -1.27534481
    155      c  5.33233781
    74       a -0.06426057
    73       a  1.92099027
    126      c  2.13355513
    110      b -0.20789075
    125      c  7.11510688
    15       a -0.31202992
    55       a  1.53820220
    154      c  4.62420267

    # Take a look at it, as is
    png(filename=paste(WorkingDir, "plot_1", ".png", sep=""), units="in", width=mywidth, height=myheight, res=myres)
    boxplot(mytab$value ~ mytab$factor)
    dev.off()
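Because `rnorm()` and `sample()` draw fresh random numbers on every run, the rows printed above will differ each time the script is run. If repeatable fake data is wanted, one option is to fix the random seed first. This is only a minimal sketch: the helper name `make_fake`, the object `toy`, and the seed value 42 are arbitrary choices for illustration and are not part of the original script.

```r
# Hypothetical helper (not in the original post): builds the same kind of
# three-group table, but resets the random seed so every call matches.
make_fake <- function(seed = 42) {
  set.seed(seed)  # arbitrary seed, chosen only for illustration
  sizes <- c(a = 100, b = 20, c = 50)
  data.frame(
    factor = rep(names(sizes), times = sizes),
    value  = c(rnorm(sizes[["a"]], mean = 1, sd = 1),
               rnorm(sizes[["b"]], mean = 2, sd = 2),
               rnorm(sizes[["c"]], mean = 3, sd = 3))
  )
}

toy <- make_fake()
table(toy$factor)            # group sizes: 100, 20, 50
identical(toy, make_fake())  # TRUE: same seed, same data
```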

    # This is how you can control the order of what appears in the boxplot
    mytab$factor<-factor(mytab$factor, levels=c("c", "b", "a"))

    # Take a look at it
    png(filename=paste(WorkingDir, "plot_2", ".png", sep=""), units="in", width=mywidth, height=myheight, res=myres)
    boxplot(mytab$value ~ mytab$factor)
    dev.off()
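One pitfall is worth spelling out here: passing `levels=` to `factor()` reorders the levels, whereas assigning to `levels()` relabels them, which silently swaps the groups. The sketch below shows the difference on a tiny made-up table (`toy` and the other object names are hypothetical), and also shows `reorder()` as one way to order levels by a statistic rather than by hand.

```r
# Hypothetical toy data, independent of mytab
toy <- data.frame(group = c("a", "a", "b", "b", "c", "c"),
                  value = c(1, 2, 5, 6, 3, 4))

# Reordering: the data stay attached to the same labels
reordered <- factor(toy$group, levels = c("c", "b", "a"))
as.character(reordered)   # still a a b b c c

# Pitfall: assigning to levels() renames the labels instead
relabelled <- factor(toy$group)
levels(relabelled) <- c("c", "b", "a")
as.character(relabelled)  # now c c b b a a, i.e. the groups were swapped

# Ordering by a statistic: reorder() sorts levels by increasing score,
# so negating the values puts the largest median first
by_median <- reorder(factor(toy$group), -toy$value, FUN = median)
levels(by_median)         # "b" "c" "a"
```

The `reorder()` form may be handy when the number of groups is too large to type out, though for a fixed figure the explicit `levels=` vector is easier to read.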

    # Add color
    # Just so we don't get lost in the details of the RColorBrewer stuff below, this is how you simply add color
    png(filename=paste(WorkingDir, "plot_3", ".png", sep=""), units="in", width=mywidth, height=myheight, res=myres)
    boxplot(mytab$value ~ mytab$factor, col=c("red", "white", "blue"))
    dev.off()
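Note that `col=` is matched to the boxes by position, so the colors follow the level order, not the level names. If the levels are reordered later, each color ends up on a different group. A minimal sketch of one way to keep a color tied to its group is a named vector indexed by the current levels. The names `toy`, `group_cols`, and `box_cols` are made up for this example, and the plot goes to a temporary PDF so the sketch runs without the `~/blog/boxplots/` directory.

```r
# Hypothetical toy data with the levels already reordered to c, b, a
toy <- data.frame(
  group = factor(c("a", "a", "b", "b", "c", "c"), levels = c("c", "b", "a")),
  value = c(1, 2, 5, 6, 3, 4)
)

# One color per group name, independent of plotting order
group_cols <- c(a = "red", b = "white", c = "blue")

# Look the colors up in the order the boxes will be drawn
box_cols <- unname(group_cols[levels(toy$group)])
box_cols  # "blue" "white" "red"

outfile <- file.path(tempdir(), "named_colors.pdf")
pdf(outfile, width = 6, height = 6)
boxplot(value ~ group, data = toy, col = box_cols)
dev.off()
```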

    # But, it's more powerful to be able to more generally generate colors for an arbitrary number of categories
    library(RColorBrewer)
    NumberOfLevels<-length(levels(mytab$factor))
    mycolors<-brewer.pal(n=NumberOfLevels, name="Set1")

    # Make box sizes proportional to N
    # First just to show how this works without regard for getting the sizes right
    levelProportions<-c(.1, .3, .6)
    png(filename=paste(WorkingDir, "plot_4", ".png", sep=""), units="in", width=mywidth, height=myheight, res=myres)
    boxplot(mytab$value ~ mytab$factor, width=levelProportions, col=mycolors)
    dev.off()
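One caveat on "arbitrary": `brewer.pal()` warns when asked for fewer than 3 colors, and each palette has a maximum size (9 for "Set1"). The sketch below is one possible wrapper that always returns exactly `n` colors. The function name `level_colors` is hypothetical, and the `rainbow()` fallback for machines without RColorBrewer is my own assumption, not something the original script does.

```r
# Hypothetical helper, not from the original post: returns exactly n colors.
level_colors <- function(n, palette = "Set1") {
  if (requireNamespace("RColorBrewer", quietly = TRUE)) {
    max_n <- RColorBrewer::brewer.pal.info[palette, "maxcolors"]
    # brewer.pal() accepts between 3 and max_n colors for a palette
    base_cols <- RColorBrewer::brewer.pal(n = min(max(n, 3), max_n), name = palette)
  } else {
    # Assumed fallback when RColorBrewer is not installed
    base_cols <- grDevices::rainbow(max(n, 3))
  }
  if (n <= length(base_cols)) {
    base_cols[seq_len(n)]                      # few levels: take the first n
  } else {
    grDevices::colorRampPalette(base_cols)(n)  # many levels: interpolate
  }
}

length(level_colors(2))   # 2
length(level_colors(12))  # 12
```

Interpolated colors are less distinct than the hand-picked palette entries, so with many groups it may be better to rethink the figure than to rely on color alone.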

    # Let's get the sizes right, based on sample size, and generalize in case the level order is changed again.
    # Also note the 'outpch=NA' argument: this removes the outlier points. In the next step I add these back.
    levelProportions<-summary(mytab$factor)/num.tot
    png(filename=paste(WorkingDir, "plot_5", ".png", sep=""), units="in", width=mywidth, height=myheight, res=myres)
    boxplot(mytab$value ~ mytab$factor, width=levelProportions, col=mycolors, outpch=NA)

    # Add data points
    mylevels<-levels(mytab$factor)
    for(i in seq_along(mylevels))
    {
      thislevel<-mylevels[i]
      thisvalues<-mytab[mytab$factor==thislevel, "value"]

      # take the x-axis indices and add a jitter, proportional to the N in each level
      myjitter<-jitter(rep(i, length(thisvalues)), amount=levelProportions[i]/2)
      points(myjitter, thisvalues, pch=20, col=rgb(0,0,0,.2))

      # While we're looping, let's add some text
      # I played with this a lot for this particular plot. I tried to make it general, but you may have to adjust a bit for different data.
      # Note: this is only a rough estimate of where the upper whisker ends; the drawn whisker
      # stops at the largest data point within 1.5*IQR above the upper hinge, not above the median.
      TopOfWhisker<-min(max(thisvalues), median(thisvalues)+IQR(thisvalues)*1.5)
      text(i+levelProportions[i]/2, TopOfWhisker, labels=paste("N=", length(thisvalues), sep=""), cex=.6, font=2, pos=4)
    }
    dev.off()
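Base R has two built-ins that overlap with the manual steps above. `boxplot.stats()` returns the five numbers that `boxplot()` draws, so `$stats[5]` is the exact end of the upper whisker, and `varwidth=TRUE` makes box widths proportional to the square root of the group sizes. The following is only a sketch on made-up data, not a replacement for the loop above. The names `toy`, `whisker_top`, and `n_per_level` are hypothetical, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed so the sketch is repeatable
toy <- data.frame(
  group = factor(rep(c("a", "b", "c"), times = c(100, 20, 50)),
                 levels = c("c", "b", "a")),
  value = c(rnorm(100, 1, 1), rnorm(20, 2, 2), rnorm(50, 3, 3))
)

# Upper whisker end for each level, in level order (c, b, a)
whisker_top <- sapply(split(toy$value, toy$group),
                      function(v) boxplot.stats(v)$stats[5])
n_per_level <- table(toy$group)

outfile <- file.path(tempdir(), "varwidth.pdf")
pdf(outfile, width = 6, height = 6)
# varwidth=TRUE: widths proportional to sqrt(N) rather than to N itself
boxplot(value ~ group, data = toy, varwidth = TRUE)
# Put each N label just above the end of its upper whisker
text(seq_along(whisker_top), whisker_top,
     labels = paste0("N=", n_per_level), cex = 0.6, font = 2, pos = 3)
dev.off()
```

The square-root scaling differs from the linear `width=levelProportions` used above, so small groups get visibly wider boxes. Which one reads better depends on how unequal the group sizes are.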

elena: super useful and really well explained! thanks a lot!!!!

jpwendler (post author): Thanks very much Elena!