# Reordering the factor levels in R boxplots, and making them look pretty with base graphics

Last night a colleague was rushing to meet a submission deadline and needed help changing the default order R uses in boxplots for one of her figures. This is pretty easy: it boils down to defining the factor levels manually, so I thought I’d show this along with a few other things I like to add to my boxplots. I think many of these additions not only make the plots look more polished, but also convey more information in a visually intuitive form. My colleague’s data was in a data.frame with one column defining the factor levels and another containing the values, so I’ll start from there with some fake data.
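
Before the full example, here is a minimal sketch of the core idea on a made-up vector (`grp`, `f_default`, and `f_custom` are just illustrative names, not part of the data used in the rest of the post). It only shows that passing `levels=` to `factor()` changes the level order, which is what `boxplot()` follows when it lays out the boxes.

```r
# Hypothetical toy data, only for illustration
grp <- c("a", "b", "c", "a", "c")

# Without levels=, factor() puts the levels in sorted order
f_default <- factor(grp)
levels(f_default)   # "a" "b" "c"

# With levels=, the order is whatever you ask for
f_custom <- factor(grp, levels = c("c", "b", "a"))
levels(f_custom)    # "c" "b" "a"

# The values themselves are unchanged; only the level order differs
identical(as.character(f_default), as.character(f_custom))  # TRUE
```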

```
# Some things I'll use later
WorkingDir<-"~/blog/boxplots/"
mywidth<-6
myheight<-6
myres<-300

# Make a table of fake data to play with. Let's have 3 groups with different numbers of observations.
num.a<-100
num.b<-20
num.c<-50
num.tot<-num.a + num.b + num.c  # We'll use the total later as well

a<-rnorm(n=num.a, mean=1, sd=1)
b<-rnorm(n=num.b, mean=2, sd=2)
c<-rnorm(n=num.c, mean=3, sd=3)

mytab<-data.frame(factor=c(rep("a", num.a), rep("b", num.b), rep("c", num.c)), value=c(a,b,c))

# Look at some random rows to get a feel for the data
mytab[sample(1:num.tot, 15),]

# Example output (the data are random, so yours will differ):
#          factor       value
# 82            a  0.39031460
# 92            a  0.75720888
# 36            a  1.10241158
# 94            a -0.04059472
# 69            a  0.34451820
# 116           b -1.27534481
# 155           c  5.33233781
# 74            a -0.06426057
# 73            a  1.92099027
# 126           c  2.13355513
# 110           b -0.20789075
# 125           c  7.11510688
# 15            a -0.31202992
# 55            a  1.53820220
# 154           c  4.62420267

# Take a look at it, as is
png(filename=paste(WorkingDir, "plot_1", ".png", sep=""), units="in", width=mywidth, height=myheight, res=myres)
boxplot(mytab$value ~ mytab$factor)
dev.off()
```

```
# This is how you can control the order of what appears in the boxplot
mytab$factor<-factor(mytab$factor, levels=c("c", "b", "a"))

# Take a look at it
png(filename=paste(WorkingDir, "plot_2", ".png", sep=""), units="in", width=mywidth, height=myheight, res=myres)
boxplot(mytab$value ~ mytab$factor)
dev.off()
```

```
# Add color
# Just so we don't get lost in the details of the RColorBrewer stuff below, this is how you simply add color
png(filename=paste(WorkingDir, "plot_3", ".png", sep=""), units="in", width=mywidth, height=myheight, res=myres)
boxplot(mytab$value ~ mytab$factor, col=c("red", "white", "blue"))
dev.off()
```

```
# But it's more powerful to be able to generate colors for an arbitrary number of categories
# (note that brewer.pal needs n >= 3, and "Set1" provides at most 9 colors)
library(RColorBrewer)
NumberOfLevels<-length(levels(mytab$factor))
mycolors<-brewer.pal(n=NumberOfLevels, name="Set1")

# Make box sizes proportional to N
# First just to show how this works without regard for getting the sizes right
levelProportions<-c(.1, .3, .6)
png(filename=paste(WorkingDir, "plot_4", ".png", sep=""), units="in", width=mywidth, height=myheight, res=myres)
boxplot(mytab$value ~ mytab$factor, width=levelProportions, col=mycolors)
dev.off()
```

```
# Let's get the sizes right, based on sample size, and generalize in case the level order is changed again.
# Also note the 'outpch=NA' argument--this removes the outlier points.  In the next step I add these back.
levelProportions<-summary(mytab$factor)/num.tot
png(filename=paste(WorkingDir, "plot_5", ".png", sep=""), units="in", width=mywidth, height=myheight, res=myres)
boxplot(mytab$value ~ mytab$factor, width=levelProportions, col=mycolors, outpch=NA)

mylevels<-levels(mytab$factor)
for(i in seq_along(mylevels))
{
  thislevel<-mylevels[i]
  thisvalues<-mytab[mytab$factor==thislevel, "value"]

  # take the x-axis indices and add a jitter, proportional to the N in each level
  myjitter<-jitter(rep(i, length(thisvalues)), amount=levelProportions[i]/2)
  points(myjitter, thisvalues, pch=20, col=rgb(0,0,0,.2))

  # While we're looping, let's add some text
  # I played with this a lot for this particular plot.  I tried to make it general, but you may have to adjust a bit for different data.
  # boxplot.stats() returns the five numbers boxplot() draws; the fifth is the end of the upper whisker.
  # (The whisker limit is measured from the edge of the box, not from the median.)
  TopOfWhisker<-boxplot.stats(thisvalues)$stats[5]
  text(i+levelProportions[i]/2, TopOfWhisker, labels=paste("N=", length(thisvalues), sep=""), cex=.6, font=2, pos=4)
}
dev.off()
```
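
For comparison with the hand-computed box widths in this post, base `boxplot()` also has a built-in `varwidth` argument, which draws the boxes with widths proportional to the square roots of the group sizes rather than to the sizes themselves. Below is a minimal sketch on made-up data (`toy`, `grp`, `val`, and `props` are hypothetical names, and the group sizes just mirror the fake data used here). It also shows `table()` as another way to get the per-level proportions in level order.

```r
set.seed(1)  # hypothetical toy data, only for illustration
toy <- data.frame(
  grp = factor(rep(c("a", "b", "c"), times = c(100, 20, 50)),
               levels = c("c", "b", "a")),
  val = rnorm(170)
)

# Per-level proportions, returned in level order (c, b, a)
props <- as.numeric(table(toy$grp)) / nrow(toy)

# Widths scaled linearly with N, as in this post
boxplot(val ~ grp, data = toy, width = props)

# Built-in alternative: widths proportional to sqrt(N)
boxplot(val ~ grp, data = toy, varwidth = TRUE)
```

Which one reads better probably depends on how unbalanced the groups are. With very different sample sizes, the square-root scaling keeps the smallest box from becoming a sliver.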