A family for lumping together levels that meet some criteria.
- fct_lump_min(): lumps levels that appear fewer than min times.
- fct_lump_prop(): lumps levels that appear fewer than (or equal to) prop * n times.
- fct_lump_n(): lumps all levels except for the n most frequent (or least frequent if n < 0).
- fct_lump_lowfreq(): lumps together the least frequent levels, ensuring that "other" is still the smallest level.

fct_lump() exists primarily for historical reasons, as it automatically picks between these different methods depending on its arguments. We no longer recommend that you use it.
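As a rough illustration of how fct_lump() picks between the methods, the sketch below compares each legacy call with the explicit function it appears to correspond to. It assumes fct_lump() dispatches on which of n and prop is supplied; the factor f and its counts are made up for this example.

```r
library(forcats)

# Hypothetical data with known counts: a = 5, b = 3, c = 1, d = 1
f <- factor(rep(c("a", "b", "c", "d"), times = c(5, 3, 1, 1)))

# n supplied: expected to behave like fct_lump_n()
legacy_n   <- fct_lump(f, n = 2)
explicit_n <- fct_lump_n(f, n = 2)

# prop supplied: expected to behave like fct_lump_prop()
legacy_prop   <- fct_lump(f, prop = 0.2)
explicit_prop <- fct_lump_prop(f, prop = 0.2)

# neither supplied: expected to behave like fct_lump_lowfreq()
legacy_default   <- fct_lump(f)
explicit_default <- fct_lump_lowfreq(f)

# Each pair should be the same factor
identical(legacy_n, explicit_n)
identical(legacy_prop, explicit_prop)
identical(legacy_default, explicit_default)
```

Calling the explicit function states the lumping rule in the code itself, which is the main reason to prefer it over fct_lump().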
Usage
fct_lump(
  f,
  n,
  prop,
  w = NULL,
  other_level = "Other",
  ties.method = c("min", "average", "first", "last", "random", "max")
)
fct_lump_min(f, min, w = NULL, other_level = "Other")
fct_lump_prop(f, prop, w = NULL, other_level = "Other")
fct_lump_n(
  f,
  n,
  w = NULL,
  other_level = "Other",
  ties.method = c("min", "average", "first", "last", "random", "max")
)
fct_lump_lowfreq(f, w = NULL, other_level = "Other")
Arguments
- f
A factor (or character vector).
- n
Positive n preserves the most common n values. Negative n preserves the least common -n values. If there are ties, you will get at least abs(n) values.
- prop
Positive prop lumps values which do not appear at least prop of the time. Negative prop lumps values that do not appear at most -prop of the time.
- w
An optional numeric vector giving weights for frequency of each value (not level) in f.
- other_level
Value of level used for "other" values. Always placed at end of levels.
- ties.method
A character string specifying how ties are treated. See rank() for details.
- min
Preserve levels that appear at least min number of times.
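The interaction between n and ties.method can be seen on a small factor with a deliberate tie. This is a minimal sketch on made-up data (the factor f is not from the examples below); it assumes the ranking follows rank() as described for ties.method.

```r
library(forcats)

# Hypothetical counts: a = 4, b = 2, c = 2, d = 1, e = 1
# b and c are tied for second place
f <- factor(rep(c("a", "b", "c", "d", "e"), times = c(4, 2, 2, 1, 1)))

# Default "min": both tied levels share rank 2, so both survive n = 2
lump_min <- fct_lump_n(f, n = 2)
levels(lump_min)

# "first": ties are broken by level order, so only b survives
lump_first <- fct_lump_n(f, n = 2, ties.method = "first")
levels(lump_first)

# "max": both tied levels get rank 3, so neither survives
lump_max <- fct_lump_n(f, n = 2, ties.method = "max")
levels(lump_max)
```

With the default "min", a tie at the cutoff keeps more than n levels, which matches the note that you get at least abs(n) values.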
See also
fct_other() to convert specified levels to other.
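When the levels to collapse are known by name rather than by frequency, fct_other() may be the better fit. A short sketch on made-up data, using its keep and drop arguments:

```r
library(forcats)

# Hypothetical data: a = 5, b = 3, c = 1, d = 1
f <- factor(rep(c("a", "b", "c", "d"), times = c(5, 3, 1, 1)))

# Keep the named levels; everything else becomes "Other"
kept <- fct_other(f, keep = c("a", "b"))

# Or name the levels to collapse instead
dropped <- fct_other(f, drop = c("c", "d"))

table(kept)
table(dropped)
```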
Examples
x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))
x %>% table()
#> .
#>  A  B  C  D  E  F  G  H  I
#> 40 10  5 27  1  1  1  1  1
x %>%
  fct_lump_n(3) %>%
  table()
#> .
#>     A     B     D Other
#>    40    10    27    10
x %>%
  fct_lump_prop(0.10) %>%
  table()
#> .
#>     A     B     D Other
#>    40    10    27    10
x %>%
  fct_lump_min(5) %>%
  table()
#> .
#>     A     B     C     D Other
#>    40    10     5    27     5
x %>%
  fct_lump_lowfreq() %>%
  table()
#> .
#>     A     D Other
#>    40    27    20
x <- factor(letters[rpois(100, 5)])
x
#> [1] e e k a g c a e a g b e b g d b k e f g c d c e h e e g g b d b a e d
#> [36] d b h e f h d d f g c b c f f d b d d b f c c c c d b i c g b d e b g
#> [71] c d e a d e h c b e f d f d b c d a e c e a b c c h c c g
#> Levels: a b c d e f g h i k
table(x)
#> x
#>  a  b  c  d  e  f  g  h  i  k
#>  7 15 18 17 16  8 10  5  1  2
table(fct_lump_lowfreq(x))
#>
#>     a     b     c     d     e     f     g     h Other
#>     7    15    18    17    16     8    10     5     3
# Use positive values to collapse the rarest
fct_lump_n(x, n = 3)
#> [1] e e Other Other Other c Other e Other Other Other
#> [12] e Other Other d Other Other e Other Other c d
#> [23] c e Other e e Other Other Other d Other Other
#> [34] e d d Other Other e Other Other d d Other
#> [45] Other c Other c Other Other d Other d d Other
#> [56] Other c c c c d Other Other c Other Other
#> [67] d e Other Other c d e Other d e Other
#> [78] c Other e Other d Other d Other c d Other
#> [89] e c e Other Other c c Other c c Other
#> Levels: c d e Other
fct_lump_prop(x, prop = 0.1)
#> [1] e e Other Other g c Other e Other g b
#> [12] e b g d b Other e Other g c d
#> [23] c e Other e e g g b d b Other
#> [34] e d d b Other e Other Other d d Other
#> [45] g c b c Other Other d b d d b
#> [56] Other c c c c d b Other c g b
#> [67] d e b g c d e Other d e Other
#> [78] c b e Other d Other d b c d Other
#> [89] e c e Other b c c Other c c g
#> Levels: b c d e g Other
# Use negative values to collapse the most common
fct_lump_n(x, n = -3)
#> [1] Other Other k Other Other Other Other Other Other Other Other
#> [12] Other Other Other Other Other k Other Other Other Other Other
#> [23] Other Other h Other Other Other Other Other Other Other Other
#> [34] Other Other Other Other h Other Other h Other Other Other
#> [45] Other Other Other Other Other Other Other Other Other Other Other
#> [56] Other Other Other Other Other Other Other i Other Other Other
#> [67] Other Other Other Other Other Other Other Other Other Other h
#> [78] Other Other Other Other Other Other Other Other Other Other Other
#> [89] Other Other Other Other Other Other Other h Other Other Other
#> Levels: h i k Other
fct_lump_prop(x, prop = -0.1)
#> [1] Other Other k a Other Other a Other a Other Other
#> [12] Other Other Other Other Other k Other f Other Other Other
#> [23] Other Other h Other Other Other Other Other Other Other a
#> [34] Other Other Other Other h Other f h Other Other f
#> [45] Other Other Other Other f f Other Other Other Other Other
#> [56] f Other Other Other Other Other Other i Other Other Other
#> [67] Other Other Other Other Other Other Other a Other Other h
#> [78] Other Other Other f Other f Other Other Other Other a
#> [89] Other Other Other a Other Other Other h Other Other Other
#> Levels: a f h i k Other
# Use weighted frequencies
# `w` needs one weight per element of x; x can be shorter than 100 because
# rpois() may return 0 and letters[0] selects nothing
w <- ifelse(seq_along(x) <= 50, 2, 1)
fct_lump_n(x, n = 5, w = w)
# Use ties.method to control how tied levels are collapsed
fct_lump_n(x, n = 6)
#> [1] e e Other Other g c Other e Other g b
#> [12] e b g d b Other e f g c d
#> [23] c e Other e e g g b d b Other
#> [34] e d d b Other e f Other d d f
#> [45] g c b c f f d b d d b
#> [56] f c c c c d b Other c g b
#> [67] d e b g c d e Other d e Other
#> [78] c b e f d f d b c d Other
#> [89] e c e Other b c c Other c c g
#> Levels: b c d e f g Other
fct_lump_n(x, n = 6, ties.method = "max")
#> [1] e e Other Other g c Other e Other g b
#> [12] e b g d b Other e f g c d
#> [23] c e Other e e g g b d b Other
#> [34] e d d b Other e f Other d d f
#> [45] g c b c f f d b d d b
#> [56] f c c c c d b Other c g b
#> [67] d e b g c d e Other d e Other
#> [78] c b e f d f d b c d Other
#> [89] e c e Other b c c Other c c g
#> Levels: b c d e f g Other
# Use fct_lump_min() to lump together all levels with fewer than `min` values
table(fct_lump_min(x, min = 10))
#>
#>     b     c     d     e     g Other
#>    15    18    17    16    10    23
table(fct_lump_min(x, min = 15))
#>
#>     b     c     d     e Other
#>    15    18    17    16    33
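The w and other_level arguments can also be illustrated deterministically. The sketch below uses a small made-up factor (f, w and the label "Rest" are illustrative only) and supplies one weight per element of f, so that length(w) equals length(f).

```r
library(forcats)

# Hypothetical data; unweighted counts are a = 3, b = 2, c = 1, d = 1
f <- factor(c("a", "a", "a", "b", "b", "c", "d"))

# One weight per element of f (not per level)
w <- c(1, 1, 1, 1, 1, 10, 1)
stopifnot(length(w) == length(f))

# Unweighted: a is the most frequent level, so it is the one kept
unweighted <- fct_lump_n(f, n = 1)
levels(unweighted)

# Weighted: the single c observation carries weight 10, so c is kept instead;
# other_level renames the lumped level
weighted <- fct_lump_n(f, n = 1, w = w, other_level = "Rest")
levels(weighted)
```

Deriving w from f, for example with a vector built from seq_along(f), is one way to avoid a length mismatch when f is generated randomly.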