A family of functions for lumping together levels that meet some criteria.
- fct_lump_min(): lumps levels that appear fewer than min times.
- fct_lump_prop(): lumps levels that appear fewer than prop * n times.
- fct_lump_n(): lumps all levels except for the n most frequent (or least frequent if n < 0).
- fct_lump_lowfreq(): lumps together the least frequent levels, ensuring that "other" is still the smallest level.
fct_lump() exists primarily for historical reasons, as it automatically
picks between these different methods depending on its arguments.
We no longer recommend that you use it.
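Because fct_lump() chooses a method from the arguments it receives, each call can be rewritten as one explicit variant. The following is a minimal sketch, assuming the dispatch is: n selects fct_lump_n(), prop selects fct_lump_prop(), and supplying neither selects fct_lump_lowfreq(). The factor x is illustrative only.

```r
library(forcats)

# Illustrative factor: two common levels and several rare ones
x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))

# Assumed mapping from the legacy call to the explicit variant
same_n       <- identical(fct_lump(x, n = 3),      fct_lump_n(x, n = 3))
same_prop    <- identical(fct_lump(x, prop = 0.1), fct_lump_prop(x, prop = 0.1))
same_lowfreq <- identical(fct_lump(x),             fct_lump_lowfreq(x))

c(n = same_n, prop = same_prop, lowfreq = same_lowfreq)
```

If all three comparisons are TRUE on your installed version, the explicit variant can replace the fct_lump() call without changing the result.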
Usage
fct_lump(
f,
n,
prop,
w = NULL,
other_level = "Other",
ties.method = c("min", "average", "first", "last", "random", "max")
)
fct_lump_min(f, min, w = NULL, other_level = "Other")
fct_lump_prop(f, prop, w = NULL, other_level = "Other")
fct_lump_n(
f,
n,
w = NULL,
other_level = "Other",
ties.method = c("min", "average", "first", "last", "random", "max")
)
fct_lump_lowfreq(f, other_level = "Other")
Arguments
- f
A factor (or character vector).
- n
Positive n preserves the most common n values. Negative n preserves the least common -n values. If there are ties, you will get at least abs(n) values.
- prop
Positive prop lumps values which do not appear at least prop of the time. Negative prop lumps values that do not appear at most -prop of the time.
- w
An optional numeric vector giving weights for frequency of each value (not level) in f.
- other_level
Value of level used for "other" values. Always placed at end of levels.
- ties.method
A character string specifying how ties are treated. See rank() for details.
- min
Preserve levels that appear at least min number of times.
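The interaction between n, ties.method, w, and other_level is easier to see on a tiny factor. This is a sketch under stated assumptions: the factor f, the weights w, and the label "rare" are made up for illustration, and the tie-breaking behavior is assumed to follow rank() as the ties.method argument describes.

```r
library(forcats)

# Counts: a = 3, b = 2, c = 2, d = 1, e = 1 (b and c are tied)
f <- factor(rep(c("a", "b", "c", "d", "e"), times = c(3, 2, 2, 1, 1)))

# Default ties.method = "min": b and c share a rank, so both survive n = 2
lv_min <- levels(fct_lump_n(f, n = 2))

# ties.method = "first" breaks the tie by level order
lv_first <- levels(fct_lump_n(f, n = 2, ties.method = "first"))

# One weight per value (length(w) == length(f)), not one per level;
# here the single "e" value is weighted heavily
w <- c(1, 1, 1, 1, 1, 1, 1, 1, 10)
lv_weighted <- levels(fct_lump_n(f, n = 1, w = w, other_level = "rare"))

list(min = lv_min, first = lv_first, weighted = lv_weighted)
```

With the default tie handling, the first call should keep a, b, and c, which is the "at least abs(n) values" case. With "first" only a and b should remain. In the weighted call, e should be the single preserved level, followed by the custom "rare" level at the end.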
See also
fct_other() to convert specified levels to other.
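When you already know which levels to keep or collapse, naming them directly may be clearer than lumping by frequency. A minimal sketch, assuming fct_other() accepts either a keep or a drop vector of level names; the factor x and the chosen levels are illustrative.

```r
library(forcats)

x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))

# Keep only the named levels; everything else becomes "Other"
kept <- fct_other(x, keep = c("A", "D"))
table(kept)

# Or name the levels to collapse and keep the rest
dropped <- fct_other(x, drop = c("E", "F", "G", "H", "I"))
levels(dropped)
```

Unlike the fct_lump_*() functions, the result here does not depend on how often each level occurs, only on the names supplied.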
Examples
x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))
x %>% table()
#> .
#> A B C D E F G H I
#> 40 10 5 27 1 1 1 1 1
x %>% fct_lump_n(3) %>% table()
#> .
#> A B D Other
#> 40 10 27 10
x %>% fct_lump_prop(0.10) %>% table()
#> .
#> A B D Other
#> 40 10 27 10
x %>% fct_lump_min(5) %>% table()
#> .
#> A B C D Other
#> 40 10 5 27 5
x %>% fct_lump_lowfreq() %>% table()
#> .
#> A D Other
#> 40 27 20
x <- factor(letters[rpois(100, 5)])
x
#> [1] e f f j i e d e f d h g g e d d f d g h c f d g h f b d e g f f e e
#> [35] f i i h b f e h j c g e d l b e b d j c e e c c c g g d c c g f e h
#> [69] f e d h b c c d c d e e b d f e h f f e c g d f d b c c d a g g
#> Levels: a b c d e f g h i j l
table(x)
#> x
#> a b c d e f g h i j l
#> 1 7 14 17 18 16 12 8 3 3 1
table(fct_lump_lowfreq(x))
#>
#> b c d e f g h i j Other
#> 7 14 17 18 16 12 8 3 3 2
# Use positive values to collapse the rarest
fct_lump_n(x, n = 3)
#> [1] e f f Other Other e d e f d Other
#> [12] Other Other e d d f d Other Other Other f
#> [23] d Other Other f Other d e Other f f e
#> [34] e f Other Other Other Other f e Other Other Other
#> [45] Other e d Other Other e Other d Other Other e
#> [56] e Other Other Other Other Other d Other Other Other f
#> [67] e Other f e d Other Other Other Other d Other
#> [78] d e e Other d f e Other f f e
#> [89] Other Other d f d Other Other Other d Other Other
#> [100] Other
#> Levels: d e f Other
fct_lump_prop(x, prop = 0.1)
#> [1] e f f Other Other e d e f d Other
#> [12] g g e d d f d g Other c f
#> [23] d g Other f Other d e g f f e
#> [34] e f Other Other Other Other f e Other Other c
#> [45] g e d Other Other e Other d Other c e
#> [56] e c c c g g d c c g f
#> [67] e Other f e d Other Other c c d c
#> [78] d e e Other d f e Other f f e
#> [89] c g d f d Other c c d Other g
#> [100] g
#> Levels: c d e f g Other
# Use negative values to collapse the most common
fct_lump_n(x, n = -3)
#> [1] Other Other Other j i Other Other Other Other Other Other
#> [12] Other Other Other Other Other Other Other Other Other Other Other
#> [23] Other Other Other Other Other Other Other Other Other Other Other
#> [34] Other Other i i Other Other Other Other Other j Other
#> [45] Other Other Other l Other Other Other Other j Other Other
#> [56] Other Other Other Other Other Other Other Other Other Other Other
#> [67] Other Other Other Other Other Other Other Other Other Other Other
#> [78] Other Other Other Other Other Other Other Other Other Other Other
#> [89] Other Other Other Other Other Other Other Other Other a Other
#> [100] Other
#> Levels: a i j l Other
fct_lump_prop(x, prop = -0.1)
#> [1] Other Other Other j i Other Other Other Other Other h
#> [12] Other Other Other Other Other Other Other Other h Other Other
#> [23] Other Other h Other b Other Other Other Other Other Other
#> [34] Other Other i i h b Other Other h j Other
#> [45] Other Other Other l b Other b Other j Other Other
#> [56] Other Other Other Other Other Other Other Other Other Other Other
#> [67] Other h Other Other Other h b Other Other Other Other
#> [78] Other Other Other b Other Other Other h Other Other Other
#> [89] Other Other Other Other Other b Other Other Other a Other
#> [100] Other
#> Levels: a b h i j l Other
# Use weighted frequencies
w <- c(rep(2, 50), rep(1, 50))
fct_lump_n(x, n = 5, w = w)
#> [1] e f f Other Other e d e f d Other
#> [12] g g e d d f d g Other c f
#> [23] d g Other f Other d e g f f e
#> [34] e f Other Other Other Other f e Other Other c
#> [45] g e d Other Other e Other d Other c e
#> [56] e c c c g g d c c g f
#> [67] e Other f e d Other Other c c d c
#> [78] d e e Other d f e Other f f e
#> [89] c g d f d Other c c d Other g
#> [100] g
#> Levels: c d e f g Other
# Use ties.method to control how tied factors are collapsed
fct_lump_n(x, n = 6)
#> [1] e f f Other Other e d e f d h
#> [12] g g e d d f d g h c f
#> [23] d g h f Other d e g f f e
#> [34] e f Other Other h Other f e h Other c
#> [45] g e d Other Other e Other d Other c e
#> [56] e c c c g g d c c g f
#> [67] e h f e d h Other c c d c
#> [78] d e e Other d f e h f f e
#> [89] c g d f d Other c c d Other g
#> [100] g
#> Levels: c d e f g h Other
fct_lump_n(x, n = 6, ties.method = "max")
#> [1] e f f Other Other e d e f d h
#> [12] g g e d d f d g h c f
#> [23] d g h f Other d e g f f e
#> [34] e f Other Other h Other f e h Other c
#> [45] g e d Other Other e Other d Other c e
#> [56] e c c c g g d c c g f
#> [67] e h f e d h Other c c d c
#> [78] d e e Other d f e h f f e
#> [89] c g d f d Other c c d Other g
#> [100] g
#> Levels: c d e f g h Other
# Use fct_lump_min() to lump together all levels with fewer than `min` values
table(fct_lump_min(x, min = 10))
#>
#> c d e f g Other
#> 14 17 18 16 12 23
table(fct_lump_min(x, min = 15))
#>
#> d e f Other
#> 17 18 16 49
