Adding Custom Labels on Box Plots - ggplot Tutorial 9
Suppose you just created a beautiful box plot in the R programming language, but now you want to add the mean, the median, or some other statistic to each of your box plots. How do you do it? This post will show you how!
Getting Started
As with all of the previous posts, we're going to start by pulling the tidyverse package into our workspace and creating a basic box plot that looks something like this. Keep in mind that this post assumes you are already familiar with how to create a basic box plot. If this is completely new to you, don't fear. I have a previous tutorial that explains all of the basics!
library(tidyverse)
mtcars$cyl <- as.factor(mtcars$cyl) # Treat the cylinder count as a category
ggplot(data = mtcars, aes(x = cyl, y = mpg)) +
  geom_boxplot(show.legend = TRUE)
Adding the Labels
Adding in the custom labels is a two-part process. In the first step, we have to define a function that tells R what type of information we want to include and where to place the label. In the second part, we will actually add the label onto the existing ggplot canvas.
Step 1 - Create the Function
The first step to add a custom label is to create the function. This function tells R what type of statistical calculations it needs to perform and what needs to be included on the box plot. I know this code can seem a little bit confusing, so let's break it down line by line.
Line 1 simply defines a variable and assigns a function to that variable. We pass in x, which is a placeholder for the values that the function will accept (here, the miles-per-gallon values of one box plot at a time), and then we open the curly braces to begin typing the content of the function.
Line 2 states that we will create a data frame. The fun.data argument of stat_summary expects the function to return one.
Line 3 specifies the y-coordinate where we will place the function output. In this specific example, I'm setting y to the max of x minus 5. So, if the maximum number of miles per gallon for the 8-cylinder cars was 20, this would place the label at the y position of 15. If the max number of miles per gallon for the 6-cylinder box plot was 30, then the label would be placed at 25.
Line 4 creates the label from two pieces. The first piece, inside the quotation marks, is pure text. Whatever we place here will print directly on the plot. In this example, I'm using "mean =". I then place a comma and enter the statistical function that I want R to perform. In this example, I'm calculating the mean miles per gallon for each box plot and rounding it to one decimal place.
Keep in mind that you can enter a wide range of functions here. For example, if you wanted the number of observations within each box plot, you could replace "mean =" with "count =" and then replace round(mean(x), 1) with length(x).
box_stats <- function(x) {
  data.frame(
    y = max(x) - 5, # Place the label 5 units below the group's maximum
    label = paste("mean =", round(mean(x), 1))
  )
}
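To see what the function hands back, it can help to call it by hand on a single group before plotting. The following is a minimal sketch that repeats box_stats so it runs on its own; the names mpg_8cyl and result are just illustrative and not part of the tutorial.

```r
# Same function as in the tutorial
box_stats <- function(x) {
  data.frame(
    y = max(x) - 5,                             # label position: 5 below the max
    label = paste("mean =", round(mean(x), 1))  # label text
  )
}

# mpg values for the 8-cylinder cars only (mtcars ships with R)
mpg_8cyl <- mtcars$mpg[mtcars$cyl == 8]

# One row with two columns: y (where the label goes) and label (what it says)
result <- box_stats(mpg_8cyl)
print(result)
```

The result is a one-row data frame. When the function is used in a plot, it is called once per box plot, so each group gets its own position and label.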
Adding stat_summary
Once you have created the function that you want, you can add it on top of your existing geometry by using the stat_summary function and passing the name of your function to the fun.data argument. Setting geom = "text" tells ggplot to draw the result as a text label.
ggplot(mtcars, aes(x = cyl, y = mpg, fill = cyl)) +
  geom_boxplot() +
  stat_summary(fun.data = box_stats, geom = "text")
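The count variant mentioned in Step 1 can be plugged in the same way. This is only a sketch: box_count is a hypothetical name, placing the label one unit above each group's maximum is my own assumption rather than something from the tutorial, and factor(cyl) is used inside aes() so the snippet runs without converting the column first.

```r
library(ggplot2)

# Hypothetical variant of box_stats that reports the number of observations
box_count <- function(x) {
  data.frame(
    y = max(x) + 1,                      # assumed placement: just above the max
    label = paste("count =", length(x))  # number of cars in this group
  )
}

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
  geom_boxplot() +
  stat_summary(fun.data = box_count, geom = "text")

p
```

Swapping length(x) for median(x), or any other function that returns a single value, should work the same way, as long as the function still returns a data frame with y and label columns.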
Summary
I know this was one of the more confusing tutorials, but I still hope that it was helpful. If you have any confusion, I would recommend checking out a detailed tutorial about how to create functions in the R programming language. Understanding functions is going to be the key to effectively including these customized labels onto your box plots. Thanks for reading and have a great day!
Hi, I've been wanting to ask you if you can make some tutorials for pie charts with ggplot2. I found those very difficult to customize. Any suggestions would be very welcome. Thanks
Hey, thanks for the comment. I'm just finishing up the series on box plots, and I was wondering what to cover next. I'll probably start doing some pie chart tutorials in the next few days. The first tutorial will probably be kind of basic, and then I'll do some more advanced stuff. Thanks for the suggestion.
Thank you very much. Looking forward to those pie charts.