R uses factors to handle categorical variables, variables that have a fixed and known set of possible values. Factors are also helpful for reordering character vectors to improve display. The goal of the forcats package is to provide a suite of tools that solve common problems with factors, including changing the order of levels or the values. Some examples include:
fct_reorder(): Reordering a factor by another variable.
fct_infreq(): Reordering a factor by the frequency of values.
fct_relevel(): Changing the order of a factor by hand.
fct_lump(): Collapsing the least/most frequent values of a factor into “other”.
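The four functions above can be tried on a small vector. The following is a minimal sketch, not taken from the forcats documentation: the `size` and `weight` vectors are hypothetical toy data made up for illustration, and it assumes forcats is already installed.

```r
library(forcats)

# Hypothetical toy data (an assumption for illustration only).
size   <- factor(c("small", "large", "medium", "small", "small", "large"))
weight <- c(1, 9, 5, 2, 1, 10)

# By default, factor() sorts levels alphabetically.
levels(size)
# "large" "medium" "small"

# fct_infreq(): order levels from most to least frequent.
by_freq <- fct_infreq(size)
levels(by_freq)
# "small" "large" "medium"

# fct_relevel(): set the order by hand.
by_hand <- fct_relevel(size, "small", "medium", "large")
levels(by_hand)
# "small" "medium" "large"

# fct_reorder(): order levels by a summary of another variable
# (the median of `weight` within each level, by default).
by_weight <- fct_reorder(size, weight)
levels(by_weight)
# "small" "medium" "large"

# fct_lump(): keep the n most frequent levels and collapse the rest
# into a single "Other" level.
lumped <- fct_lump(size, n = 1)
table(lumped)
```

None of these calls changes the underlying values of `size` except `fct_lump()`, which merges the two less frequent levels; the others only change the level order, which is what controls the ordering of axes and legends in plots.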
# The easiest way to get forcats is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just forcats:
install.packages("forcats")

# Or the development version from GitHub:
# install.packages("pak")
pak::pak("tidyverse/forcats")
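library(forcats)
library(dplyr)    # provides starwars, mutate() and %>%
library(ggplot2)  # provides ggplot(), geom_bar() and coord_flip()

# Order eye colors by frequency so the bars appear sorted by count
starwars %>%
  mutate(eye_color = fct_infreq(eye_color)) %>%
  ggplot(aes(x = eye_color)) +
  geom_bar() +
  coord_flip()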
For a history of factors, I recommend stringsAsFactors: An unauthorized biography by Roger Peng and stringsAsFactors = <sigh> by Thomas Lumley. If you want to learn more about other approaches to working with factors and categorical data, I recommend Wrangling categorical data in R, by Amelia McNamara and Nicholas Horton.