Make Better Boxplots!! - ggplot Tutorial 7
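Box plots are a powerful statistical tool that graphically represent five essential characteristics of your data: the minimum value, the maximum value, and the first, second (median), and third quartiles. (By default, ggplot2 draws any point more than 1.5 times the interquartile range beyond the box as a separate outlier dot, so the whiskers stop at the most extreme values inside that range.) In today's post, I'll show you how to create effective box plots that will enhance the quality of your presentations. By the end of this post, you'll know how to: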
- Create a basic box plot
- Customize its color
- Add titles
- Generate multiple box plots based on a categorical (factor) variable
Note: This is an introductory tutorial that assumes you have absolutely zero knowledge of the R programming language. If this is a little too basic for you, don't worry. Future tutorials will highlight more advanced features.
Getting Started
Once you have RStudio installed and open, the first step is to install the tidyverse package and then load it into your workspace. The tidyverse includes ggplot2, which provides the plotting functions needed to create the graph. The dataset that we will use, mtcars, ships with R itself, so there is nothing extra to download for it.
install.packages("tidyverse")
library(tidyverse)
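Before plotting, it can help to peek at the data. The short sketch below uses only functions that come with R and assumes nothing beyond the built-in mtcars dataset; the variable name n_cars is just an illustrative choice.

```r
# mtcars ships with R, so no extra package is needed to look at it
head(mtcars)        # first six rows of the dataset
str(mtcars$mpg)     # mpg is a numeric column (miles per gallon)

n_cars <- nrow(mtcars)  # number of cars (rows) in the dataset
n_cars
```

Typing ?mtcars in the console opens the help page that describes each column.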
Creating our first box plot requires two function calls. The first, ggplot(), accepts the dataset that we are using, and inside it we use the aes() function to name the specific column that we want to chart. After creating that basic canvas, we add geom_boxplot() with a + sign to tell R we want to draw a box plot.
ggplot(data = mtcars, aes(x=mpg))+geom_boxplot()
Admittedly, this is a pretty basic box plot. It's not super fancy. It's missing labels, there aren't any colors, and we could improve it a lot. However, the important thing is that it's a start, and this basic box plot is the foundation which we will build from.
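To see the numbers behind the box, here is a minimal sketch that computes them by hand with base R. It assumes the default geom_boxplot() rule of 1.5 times the interquartile range for the whiskers; the variable names (mpg, q, iqr, and so on) are my own and not part of ggplot2.

```r
mpg <- mtcars$mpg

# First quartile, median, and third quartile
q <- quantile(mpg, c(0.25, 0.5, 0.75))
iqr <- q[["75%"]] - q[["25%"]]   # interquartile range (width of the box)

# Fences: points beyond these are treated as outliers
lower_fence <- q[["25%"]] - 1.5 * iqr
upper_fence <- q[["75%"]] + 1.5 * iqr

# Whiskers end at the most extreme values that are still inside the fences
whisker_low  <- min(mpg[mpg >= lower_fence])
whisker_high <- max(mpg[mpg <= upper_fence])

# Anything outside the fences is drawn as a separate dot
outliers <- mpg[mpg < lower_fence | mpg > upper_fence]

q
c(whisker_low, whisker_high)
outliers
```

With mtcars, the single highest value (33.9 mpg) lands just beyond the upper fence, which is why the plot shows one dot to the right of the whisker.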

A Little Fancier
ggplot(data = mtcars, aes(x = mpg)) + geom_boxplot() +
  labs(
    title = "Distribution of Miles Per Gallon",
    x = "Miles Per Gallon",
    # This single box is not split by cylinder yet, so leave the y axis unlabeled
    y = NULL
  )

That won't win any awards, but at least the labels tell us what we are looking at. Let's take things up a notch. We can change both the outline and the inside color of our box plot by using the color and fill arguments inside geom_boxplot(). Note that color changes the outline of the box plot, while fill changes the inside color.
ggplot(data = mtcars, aes(x = mpg)) +
  geom_boxplot(fill = "yellow", color = "blue") +
  labs(
    title = "Distribution of Miles Per Gallon",
    x = "Miles Per Gallon",
    # This single box is not split by cylinder yet, so leave the y axis unlabeled
    y = NULL
  )
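Colors do not have to be named. As a small sketch, the version below swaps in hex codes and adds a little transparency; the specific hex values, the alpha of 0.8, and the red outlier color are my own illustrative choices, not part of the original plot.

```r
library(ggplot2)  # already loaded if you ran library(tidyverse)

p <- ggplot(data = mtcars, aes(x = mpg)) +
  geom_boxplot(
    fill = "#F0E442",        # inside color, given as a hex code
    color = "#0072B2",       # outline color
    alpha = 0.8,             # slight transparency for the fill
    outlier.colour = "red"   # color of any points drawn beyond the whiskers
  ) +
  labs(title = "Miles Per Gallon", x = "Miles Per Gallon")

p  # typing the object name draws the plot
```

Saving the plot in a variable like p is optional, but it makes it easy to add more layers later with p + ...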

One More Thing
Let's suppose that we want a separate box plot for each cylinder count: one for the four-cylinder cars, one for the six-cylinder cars, and one for the eight-cylinder cars. We can easily do that by setting the y aesthetic inside aes() to the name of the variable that defines these categories. Mapping fill to the same variable (fill = cyl) also gives each box its own color. NOTE: to break the chart apart like this, the grouping variable needs to be discrete, such as a FACTOR. By default the cyl variable in mtcars is numeric, so I have to convert it to a factor first.
mtcars$cyl <- as.factor(mtcars$cyl)
ggplot(data = mtcars, aes(x=mpg, y = cyl, fill =cyl))+geom_boxplot()+
labs(
title = "Miles Per Gallon by Cylinder",
x = "Miles Per Gallon",
y = "Cylinders"
)
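If you would rather not overwrite the built-in dataset, one alternative is to convert a copy. This is only a sketch: the copy name cars, the level labels, and the choice to hide the legend (it repeats what the y axis already says) are my own assumptions about what reads well.

```r
library(ggplot2)

# Work on a copy so the built-in mtcars data stays untouched
cars <- mtcars
cars$cyl <- factor(
  cars$cyl,
  levels = c(4, 6, 8),
  labels = c("4 cylinders", "6 cylinders", "8 cylinders")
)

levels(cars$cyl)  # the three groups, in the order they will be drawn
table(cars$cyl)   # how many cars fall into each group

p_groups <- ggplot(data = cars, aes(x = mpg, y = cyl, fill = cyl)) +
  geom_boxplot() +
  labs(
    title = "Miles Per Gallon by Cylinder",
    x = "Miles Per Gallon",
    y = "Cylinders"
  ) +
  theme(legend.position = "none")  # the y axis already names each group

p_groups
```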

Summary
As always, thanks so much for taking the time to read my work! I hope you found it useful, and I hope you have an amazing day!
It's been a while since I touched R & ggplot. You can center the title with
+ theme(plot.title = element_text(hjust = 0.5))
after the labs() part. The other parts are good.
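Written out in full, that suggestion might look like the sketch below. It wraps cyl in factor() inside aes() so the block runs on its own, whether or not cyl was converted earlier; the name p_centered and the fill legend title are illustrative.

```r
library(ggplot2)

p_centered <- ggplot(
  data = mtcars,
  aes(x = mpg, y = factor(cyl), fill = factor(cyl))
) +
  geom_boxplot() +
  labs(
    title = "Miles Per Gallon by Cylinder",
    x = "Miles Per Gallon",
    y = "Cylinders",
    fill = "Cylinders"
  ) +
  # hjust = 0.5 centers the title; the ggplot2 default is left-aligned
  theme(plot.title = element_text(hjust = 0.5))

p_centered
```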
Thanks so much for the comment. I always struggle to know how much detail and how many things to cover in each post. I almost always use hjust myself, so I'm not sure why I overlooked putting it in here. Can I ask how you get the user flair that says "Member ~ Education" ? I'm relatively new to Hive/PeakD, but I'd like to get something similar.