Lump together least/most common factor levels into "other"

fct_lump(f, n, prop, other_level = "Other", ties.method = c("min",
  "average", "first", "last", "random", "max"))

Arguments

f

A factor.

n, prop

If both n and prop are missing, fct_lump lumps together the least frequent levels into "other", while ensuring that "other" is still the smallest level. It's particularly useful in conjunction with fct_inorder().

Positive n preserves the most common n values. Negative n preserves the least common -n values. If there are ties, you will get at least abs(n) values.

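Positive prop preserves values that appear at least prop of the time. Negative prop preserves values that appear at most -prop of the time. Note that in the examples, a level whose share is exactly 0.1 (g, 10 of 100 values) is lumped by prop = 0.1 but preserved by prop = -0.1.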
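The sign conventions for n and prop can be checked on a small deterministic factor. This is a minimal sketch, assuming forcats is installed; the factor f and the result names are made up for illustration.

```r
library(forcats)

# Hypothetical data: A = 40, B = 10, C = 5, D = 27, E = 1, F = 1 (84 values)
f <- factor(rep(c("A", "B", "C", "D", "E", "F"),
                times = c(40, 10, 5, 27, 1, 1)))

top2   <- fct_lump(f, n = 2)       # keep the 2 most common levels
rare2  <- fct_lump(f, n = -2)      # keep the 2 least common levels
common <- fct_lump(f, prop = 0.1)  # keep levels above 10% of values
scarce <- fct_lump(f, prop = -0.1) # keep levels at or below 10% of values

levels(top2)    # "A" "D" "Other"
levels(rare2)   # "E" "F" "Other"
levels(common)  # "A" "B" "D" "Other"
levels(scarce)  # "C" "E" "F" "Other"
```

In every case the lumped level is placed last, and the total number of observations is unchanged.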

other_level

Value of level used for "other" values. Always placed at end of levels.
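As a small illustration of other_level, the sketch below renames the lumped level. It assumes forcats is installed; the factor f and the label "Rare" are hypothetical.

```r
library(forcats)

# Hypothetical data: A = 5, B = 3, C = 1, D = 1
f <- factor(rep(c("A", "B", "C", "D"), times = c(5, 3, 1, 1)))

# Keep the two most common levels and label the rest "Rare"
lumped <- fct_lump(f, n = 2, other_level = "Rare")

levels(lumped)  # "A" "B" "Rare"
table(lumped)
```

The custom label still ends up as the last level, matching the description above.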

ties.method

A character string specifying how ties are treated. See rank() for details.
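The effect of ties.method only shows when levels are tied at the cut-off. The sketch below is one way to see it, assuming forcats is installed; the factor f is made up so that b and c tie for second place.

```r
library(forcats)

# Hypothetical data: a = 4, b = 2, c = 2, d = 1, e = 1
f <- factor(rep(c("a", "b", "c", "d", "e"), times = c(4, 2, 2, 1, 1)))

# Default "min": tied levels share the best rank, so both b and c are kept
keep_min <- fct_lump(f, n = 2, ties.method = "min")

# "first": ties are broken by level order, so only a and b are kept
keep_first <- fct_lump(f, n = 2, ties.method = "first")

# "max": tied levels share the worst rank, so b and c are both lumped
keep_max <- fct_lump(f, n = 2, ties.method = "max")

levels(keep_min)    # "a" "b" "c" "Other"
levels(keep_first)  # "a" "b" "Other"
levels(keep_max)    # "a" "Other"
```

With the default method you can therefore get more than abs(n) preserved levels, while "max" can return fewer.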

See also

fct_other() to convert specified levels to other.
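For comparison, a minimal sketch of fct_other(), which lumps levels chosen by name instead of by frequency. It assumes forcats is installed; the factor f is hypothetical.

```r
library(forcats)

# Hypothetical data: A = 5, B = 3, C = 1, D = 1
f <- factor(rep(c("A", "B", "C", "D"), times = c(5, 3, 1, 1)))

# Name the levels to keep; everything else becomes "Other"
kept <- fct_other(f, keep = c("A", "B"))

# Or name the levels to replace with "Other"
dropped <- fct_other(f, drop = c("C", "D"))

levels(kept)     # "A" "B" "Other"
levels(dropped)  # "A" "B" "Other"
```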

Examples

x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))
x %>% table()
#> .
#>  A  B  C  D  E  F  G  H  I
#> 40 10  5 27  1  1  1  1  1
x %>% fct_lump() %>% table()
#> .
#>     A     D Other
#>    40    27    20
x %>% fct_lump() %>% fct_inorder() %>% table()
#> .
#>     A Other     D
#>    40    20    27
x <- factor(letters[rpois(100, 5)])
x
#> [1] b d e f e g d f e c c c d c f e g b c d g d h e c d c b f g c e f d f f e
#> [38] d d d c h f d g h f f d c b b f c h h e h h c f e f d b e c b f e c h d b
#> [75] f e e f d c g f g d d e c a b d c g e f d f f g g e
#> Levels: a b c d e f g h
table(x)
#> x
#>  a  b  c  d  e  f  g  h
#>  1  9 17 19 16 20 10  8
table(fct_lump(x))
#>
#>     b     c     d     e     f     g     h Other
#>     9    17    19    16    20    10     8     1
# Use positive values to collapse the rarest
fct_lump(x, n = 3)
#> [1] Other d Other f Other Other d f Other c c c
#> [13] d c f Other Other Other c d Other d Other Other
#> [25] c d c Other f Other c Other f d f f
#> [37] Other d d d c Other f d Other Other f f
#> [49] d c Other Other f c Other Other Other Other Other c
#> [61] f Other f d Other Other c Other f Other c Other
#> [73] d Other f Other Other f d c Other f Other d
#> [85] d Other c Other Other d c Other Other f d f
#> [97] f Other Other Other
#> Levels: c d f Other
fct_lump(x, prop = 0.1)
#> [1] Other d e f e Other d f e c c c
#> [13] d c f e Other Other c d Other d Other e
#> [25] c d c Other f Other c e f d f f
#> [37] e d d d c Other f d Other Other f f
#> [49] d c Other Other f c Other Other e Other Other c
#> [61] f e f d Other e c Other f e c Other
#> [73] d Other f e e f d c Other f Other d
#> [85] d e c Other Other d c Other e f d f
#> [97] f Other Other e
#> Levels: c d e f Other
# Use negative values to collapse the most common
fct_lump(x, n = -3)
#> [1] b Other Other Other Other Other Other Other Other Other Other Other
#> [13] Other Other Other Other Other b Other Other Other Other h Other
#> [25] Other Other Other b Other Other Other Other Other Other Other Other
#> [37] Other Other Other Other Other h Other Other Other h Other Other
#> [49] Other Other b b Other Other h h Other h h Other
#> [61] Other Other Other Other b Other Other b Other Other Other h
#> [73] Other b Other Other Other Other Other Other Other Other Other Other
#> [85] Other Other Other a b Other Other Other Other Other Other Other
#> [97] Other Other Other Other
#> Levels: a b h Other
fct_lump(x, prop = -0.1)
#> [1] b Other Other Other Other g Other Other Other Other Other Other
#> [13] Other Other Other Other g b Other Other g Other h Other
#> [25] Other Other Other b Other g Other Other Other Other Other Other
#> [37] Other Other Other Other Other h Other Other g h Other Other
#> [49] Other Other b b Other Other h h Other h h Other
#> [61] Other Other Other Other b Other Other b Other Other Other h
#> [73] Other b Other Other Other Other Other Other g Other g Other
#> [85] Other Other Other a b Other Other g Other Other Other Other
#> [97] Other g g Other
#> Levels: a b g h Other
# Use ties.method to control how tied factors are collapsed
fct_lump(x, n = 6)
#> [1] b d e f e g d f e c c c
#> [13] d c f e g b c d g d Other e
#> [25] c d c b f g c e f d f f
#> [37] e d d d c Other f d g Other f f
#> [49] d c b b f c Other Other e Other Other c
#> [61] f e f d b e c b f e c Other
#> [73] d b f e e f d c g f g d
#> [85] d e c Other b d c g e f d f
#> [97] f g g e
#> Levels: b c d e f g Other
fct_lump(x, n = 6, ties.method = "max")
#> [1] b d e f e g d f e c c c
#> [13] d c f e g b c d g d Other e
#> [25] c d c b f g c e f d f f
#> [37] e d d d c Other f d g Other f f
#> [49] d c b b f c Other Other e Other Other c
#> [61] f e f d b e c b f e c Other
#> [73] d b f e e f d c g f g d
#> [85] d e c Other b d c g e f d f
#> [97] f g g e
#> Levels: b c d e f g Other