Reordering the factor levels in R boxplots, and making them look pretty with base graphics

Last night a colleague was rushing to meet a submission deadline and needed help changing the default ordering R uses in boxplots for one of her figures. This is pretty easy: it boils down to defining the factor levels manually. So I thought I'd show this along with a few other things I like to add to my boxplots. I think many of these additions not only make the plots look pretty/professional, but also convey more information in a visually intuitive form. My colleague's data was in a data.frame with one column defining the factor levels and another column containing the values, so I'll start from there with some fake data.

# Some things I'll use later
WorkingDir<-"~/blog/boxplots/"
mywidth<-6
myheight<-6
myres<-300

# Make a table of fake data to play with. Let's have 3 groups with different numbers of observations.
num.a<-100
num.b<-20
num.c<-50
num.tot<-num.a + num.b + num.c  # We'll use the total later as well

a<-rnorm(n=num.a, mean=1, sd=1)
b<-rnorm(n=num.b, mean=2, sd=2)
c<-rnorm(n=num.c, mean=3, sd=3)

mytab<-data.frame(factor=c(rep("a", num.a), rep("b", num.b), rep("c", num.c)), value=c(a,b,c))

# Look at some random rows to get a feel for the data
mytab[sample(1:num.tot, 15),]

         factor       value
82            a  0.39031460
92            a  0.75720888
36            a  1.10241158
94            a -0.04059472
69            a  0.34451820
116           b -1.27534481
155           c  5.33233781
74            a -0.06426057
73            a  1.92099027
126           c  2.13355513
110           b -0.20789075
125           c  7.11510688
15            a -0.31202992
55            a  1.53820220
154           c  4.62420267



# Take a look at it, as is
png(filename=paste(WorkingDir, "plot_1", ".png", sep=""), units="in", width=mywidth, height=myheight, res=myres)
boxplot(mytab$value ~ mytab$factor)
dev.off()

# This is how you can control the order of what appears in the boxplot
mytab$factor<-factor(mytab$factor, levels=c("c", "b", "a"))

# Take a look at it
png(filename=paste(WorkingDir, "plot_2", ".png", sep=""), units="in", width=mywidth, height=myheight, res=myres)
boxplot(mytab$value ~ mytab$factor)
dev.off()
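
To see what the factor() call above actually changes, here is a small self-contained sketch. The vectors `grp` and `val` are made up for illustration and are not the post's data. Setting `levels` changes only the order in which groups are drawn, not which observation belongs to which group. The last lines show an alternative the post does not use: base R's reorder(), which orders levels by a summary statistic (here the median, largest first).

```r
# Hypothetical toy data, not the post's mytab
grp <- c("a", "b", "c", "a", "b", "c")
val <- c(1, 5, 9, 2, 6, 10)

f_default <- factor(grp)                             # levels sorted alphabetically
f_manual  <- factor(grp, levels = c("c", "b", "a"))  # levels in the order given

levels(f_default)
levels(f_manual)

# Group membership of each observation is unchanged by the reordering
all(as.character(f_default) == as.character(f_manual))

# boxplot() draws one box per level, in level order
# (plot = FALSE returns the statistics without drawing anything)
b <- boxplot(val ~ f_manual, plot = FALSE)
b$names

# Alternative: order levels by decreasing group median
f_by_median <- reorder(factor(grp), -val, FUN = median)
levels(f_by_median)
```

Ordering by a statistic can be handy when there are many groups and no natural order, but for a fixed presentation order the manual `levels=` approach above is the more direct one.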


# Add color
# Just so we don't get lost in the details of the RColorBrewer stuff below, this is how you simply add color
png(filename=paste(WorkingDir, "plot_3", ".png", sep=""), units="in", width=mywidth, height=myheight, res=myres)
boxplot(mytab$value ~ mytab$factor, col=c("red", "white", "blue"))
dev.off()

# But it's more flexible to generate the colors for an arbitrary number of categories
library(RColorBrewer)
NumberOfLevels<-length(levels(mytab$factor))
mycolors<-brewer.pal(n=NumberOfLevels, name="Set1")
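
One caveat with the brewer.pal() call above: it expects at least 3 colors, and the Set1 palette has at most 9, so it works for the three levels here but warns for 2 levels or for 10 and more. A minimal sketch of one way around that follows. The helper name `level_colors` is hypothetical (it is not part of RColorBrewer), and the fallback palette used when RColorBrewer is not installed is just an assumption to keep the sketch runnable.

```r
# level_colors() is a hypothetical helper, not part of RColorBrewer
level_colors <- function(n) {
  k <- min(max(n, 3), 9)  # clamp to the range Set1 supports
  if (requireNamespace("RColorBrewer", quietly = TRUE)) {
    base <- RColorBrewer::brewer.pal(n = k, name = "Set1")
  } else {
    # Fallback assumption: a qualitative palette from grDevices (R >= 3.6)
    base <- grDevices::hcl.colors(k, palette = "Set 2")
  }
  if (n <= k) {
    base[seq_len(n)]                      # fewer levels: take the first n
  } else {
    grDevices::colorRampPalette(base)(n)  # more levels: interpolate
  }
}

level_colors(2)
level_colors(3)
length(level_colors(12))
```

Interpolated colors are less distinct than the hand-picked palette entries, so with many levels it may be better to rethink the figure than to rely on color alone.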

# Make box sizes proportional to N
# First just to show how this works without regard for getting the sizes right
levelProportions<-c(.1, .3, .6)
png(filename=paste(WorkingDir, "plot_4", ".png", sep=""), units="in", width=mywidth, height=myheight, res=myres)
boxplot(mytab$value ~ mytab$factor, width=levelProportions, col=mycolors)
dev.off()
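
The `width` argument takes relative box widths, one per level, in level order. As a side note, boxplot() also has a built-in `varwidth=TRUE` option, which makes widths proportional to the square root of each group's sample size rather than to the sample size itself. The sketch below draws both side by side on made-up data; the data frame `toy` and its columns are hypothetical, and it plots to a null PDF device only so that it runs without a display.

```r
# Hypothetical toy data: three groups of unequal size
set.seed(1)
toy <- data.frame(
  grp = factor(rep(c("x", "y", "z"), times = c(40, 10, 20))),
  val = rnorm(70)
)

counts <- table(toy$grp)                        # observations per level, in level order
prop_width <- as.numeric(counts) / sum(counts)  # relative widths proportional to n
prop_width

# Null device so the sketch runs anywhere; swap in png(...) to save the figure
pdf(NULL)
par(mfrow = c(1, 2))
boxplot(val ~ grp, data = toy, width = prop_width, main = "width = n / N")
boxplot(val ~ grp, data = toy, varwidth = TRUE, main = "varwidth = TRUE")
dev.off()
```

Widths proportional to n exaggerate size differences more than the square-root scaling does, so which one reads better depends on how unbalanced the groups are.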

# Let's get the sizes right, based on sample size, and generalize in case the level order is changed again.
# Also note the 'outpch=NA' argument--this removes the outlier points.  In the next step I add these back.
levelProportions<-summary(mytab$factor)/num.tot
png(filename=paste(WorkingDir, "plot_5", ".png", sep=""), units="in", width=mywidth, height=myheight, res=myres)
boxplot(mytab$value ~ mytab$factor, width=levelProportions, col=mycolors, outpch=NA)


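# Add data points. Note there is no dev.off() after the boxplot() call above: plot_5 is still open, so the points and labels are drawn on top of it.
mylevels<-levels(mytab$factor)
for(i in seq_along(mylevels))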
{
  thislevel<-mylevels[i]
  thisvalues<-mytab[mytab$factor==thislevel, "value"]
  
  # take the x-axis indices and add a jitter, proportional to the N in each level
  myjitter<-jitter(rep(i, length(thisvalues)), amount=levelProportions[i]/2)
  points(myjitter, thisvalues, pch=20, col=rgb(0,0,0,.2)) 
  
  # While we're looping, let's add some text
  # I played with this a lot for this particular plot.  I tried to make it general, but you may have to adjust a bit for different data.
  # boxplot.stats() returns the whisker ends that boxplot() draws; element 5 is the top of the upper whisker.
  TopOfWhisker<-boxplot.stats(thisvalues)$stats[5]
  text(i+levelProportions[i]/2, TopOfWhisker, labels=paste("N=", length(thisvalues), sep=""), cex=.6, font=2, pos=4)
}
dev.off()
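
The label height in the loop above is meant to track the top of each whisker. By default boxplot() draws the upper whisker to the largest data point that lies within 1.5 box lengths of the upper hinge, and boxplot.stats() returns those positions directly. Here is a small sketch on made-up values (the vector `x` is hypothetical) that checks this by hand.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)  # hypothetical values; 30 is an outlier

s <- boxplot.stats(x)  # coef = 1.5 by default, same as boxplot()
s$stats                # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out                  # points drawn as outliers

# The same upper whisker end, computed by hand from the hinges
box_length    <- s$stats[4] - s$stats[2]
upper_fence   <- s$stats[4] + 1.5 * box_length
upper_whisker <- max(x[x <= upper_fence])
upper_whisker == s$stats[5]
```

Note that the whisker ends at an actual data point, not at the fence itself, which is why the label position is best taken from `s$stats[5]`.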
