Plotting factors with forcats

Thursday, February 21, 2019 · 3 minute read

Often, I find myself with a lot of categorical variables that I want to plot, where I’m really only interested in the most common ones that bubble to the surface. This post is a quick demonstration of how I plot just the top 5 or top 10 categories for a plot. I posted a question about this over on Stack Overflow. Thanks to all who commented!

Getting started

To start, I am going to load forats::gss_cat as my example dataset and make a simple time series of the types Protestant denominations respondents list.

library(tidyverse)

gss <- forcats::gss_cat

gss %>% 
  filter(relig == "Protestant") %>% 
  count(year, denom) %>%
  ggplot(aes(x = year, y = n, color = denom)) +
  geom_line() +
  scale_colour_viridis_d()

Plotting the top five

Wow, what an extremely ugly and meaningless plot! What if I wanted to just plot the top 5 Protestant denominations? There are a bunch of ways to do this, but maybe the easiest I’ve found is to filter down to those denominations with dplyr and save that as a simple vector; then in a second step, build your plot by first filtering to just those five in the vector.

protestant_top <- gss %>% 
  filter(relig == "Protestant") %>% 
  count(denom, sort = TRUE) %>% 
  head(5) %>% 
  pull(denom) #creating a simple vector.

#same plot now but filtered first for just the top five.
gss %>% 
  filter(denom %in% protestant_top) %>% #filter like this.
  count(year, denom) %>%
  ggplot(aes(x = year, y = n, color = denom)) +
  geom_line() +
  scale_colour_viridis_d()

That’s looking a little better!

Reordering factors

But there’s still a problem: my eyes have to hunt around to find which denomination is where, because the legend is in alphabetical order. What if the legend was ordered the same way the lines were plotted?

forcats really comes in handy here. Many useful functions in this package for working with factors start with fct_, and the one I am going to use is fct_reorder where I can order based off their count. It just takes two arguments: the factor or character vector and another vector to reorder them in ascending order. (You can also use the .fun argument to specify the mean or median for example.) Below, I can just give it n for the second argument. Lastly, .desc = TRUE orders it descending to align with the plot lines.

gss %>% 
  filter(denom %in% protestant_top) %>% 
  count(year, denom) %>%
  mutate(denom = fct_reorder(denom, n, .desc = TRUE)) %>% #fct_reorder here.
  ggplot(aes(x = year, y = n, color = denom)) +
  geom_line() +
  scale_colour_viridis_d()

Looks great!

Cleaning up

The last thing I want to do is add labeling and make it look prettier. I have been using hrbrthemes lately with Roboto Condensed font, and I love it. Below are the settings I usually have in all my markdown docs.

theme_set(theme_minimal(base_size = 10) +
  theme(plot.title = element_text(face = "bold"),
        axis.text = element_text(size = 8))
)

gss %>% 
  filter(denom %in% protestant_top) %>% 
  count(year, denom) %>%
  mutate(denom = fct_reorder(denom, n, .desc = TRUE)) %>% #fct_reorder here.
  ggplot(aes(x = year, y = n, color = denom)) +
  geom_line() +
  scale_colour_viridis_d() +
  labs(
    title = "Top five Protestant denominations",
    y = element_blank(),
    color = "Denomination",
    caption = "A sample of categorical variables from the General Social survey."
  )