A family of functions to lump together levels based on different criteria:
- fct_lump_min(): lumps levels that appear fewer than min times.
- fct_lump_prop(): lumps levels that appear in fewer than (or equal to) prop * n times.
- fct_lump_n(): lumps all levels except for the n most frequent (or least frequent if n < 0).
- fct_lump_lowfreq(): lumps together the least frequent levels, ensuring that "other" is still the smallest level.
Usage
fct_lump_min(f, min, w = NULL, other_level = "Other")
fct_lump_prop(f, prop, w = NULL, other_level = "Other")
fct_lump_n(
  f,
  n,
  w = NULL,
  other_level = "Other",
  ties.method = c("min", "average", "first", "last", "random", "max")
)
fct_lump_lowfreq(f, w = NULL, other_level = "Other")
Arguments
- f
A factor (or character vector).
- min
Preserve levels that appear at least min number of times.
- w
An optional numeric vector giving weights for frequency of each value (not level) in f.
- other_level
Value of level used for "other" values. Always placed at end of levels.
- prop
Positive prop lumps values which do not appear at least prop of the time. Negative prop lumps values that do not appear at most -prop of the time.
- n
Positive n preserves the most common n values. Negative n preserves the least common -n values. If there are ties, you will get at least abs(n) values.
- ties.method
A character string specifying how ties are treated. See rank() for details.
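The w and other_level arguments can be combined, as in the following minimal sketch. The data here are hypothetical: f holds one entry per group and size is an invented weight vector giving how many observations each entry stands for. The sketch assumes forcats is installed.

```r
library(forcats)

# Hypothetical data: one entry per group, with `size` as the weight of
# each entry (one weight per value of f, not per level).
f <- factor(c("a", "b", "c", "d", "a", "b"))
size <- c(50, 2, 1, 30, 40, 1)

# Weighted totals per level: a = 90, b = 3, c = 1, d = 30.
# Levels whose weighted total is below 5 are lumped into "Rare",
# which is placed at the end of the levels.
lumped <- fct_lump_min(f, min = 5, w = size, other_level = "Rare")
levels(lumped)
#> [1] "a"    "d"    "Rare"
```

Without w, every level here would be counted by its number of entries (a = 2, b = 2, c = 1, d = 1), so the weights change which levels survive.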
See also
fct_other() to convert specified levels to other.
Examples
x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))
x |> table()
#> x
#>  A  B  C  D  E  F  G  H  I
#> 40 10  5 27  1  1  1  1  1
x |>
  fct_lump_n(3) |>
  table()
#>
#>     A     B     D Other
#>    40    10    27    10
x |>
  fct_lump_prop(0.10) |>
  table()
#>
#>     A     B     D Other
#>    40    10    27    10
x |>
  fct_lump_min(5) |>
  table()
#>
#>     A     B     C     D Other
#>    40    10     5    27     5
x |>
  fct_lump_lowfreq() |>
  table()
#>
#>     A     D Other
#>    40    27    20
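The effect of a negative n and of ties.method can be seen with the same x. This is a sketch based on the argument descriptions above and on how rank() resolves ties, not an additional official example. The variable names rare and first3 are chosen here for illustration.

```r
library(forcats)

x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))

# Negative n keeps the least common levels. E to I are tied at one
# observation each, so with the default ties.method = "min" all five
# are kept, even though abs(n) is 3.
rare <- fct_lump_n(x, -3)
levels(rare)
#> [1] "E"     "F"     "G"     "H"     "I"     "Other"

# ties.method = "first" breaks the tie by level order, so exactly
# three of the tied levels are kept.
first3 <- fct_lump_n(x, -3, ties.method = "first")
levels(first3)
#> [1] "E"     "F"     "G"     "Other"
```

The default errs on the side of keeping every tied level. Choosing "first" (or "random") gives exactly abs(n) levels at the cost of an arbitrary cut among the ties.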
