Useful when you want to constrain the number of unique values in a column by keeping only the most common values.

mutate_other(.data, var, n = 5, count, by = NULL, var.weight = NULL,
  mass = NULL, copy = TRUE, other.category = "Other")

Arguments

.data

Data containing variable.

var

Variable containing infrequent entries, to be collapsed into "Other".

n

Number of most common categories to keep; all remaining values of var are collapsed into other.category.

count

Threshold on the number of observations per category; categories that do not reach it are collapsed into other.category.

by

Extra variables to group by when calculating n or count.

var.weight

Variable to act as a weight: values of var for which the sum of this variable exceeds mass are kept; all others are set to other.category.

mass

Threshold for the sum of var.weight: any value of var whose aggregated var.weight exceeds mass is kept, and all other values are set to other.category. By default (mass = NULL), mass is treated as -Inf, with a warning, so the function has no effect. Set mass = -Inf explicitly if you really want that behaviour without the warning.

copy

Should .data be copied? Currently only TRUE is supported.

other.category

Value that infrequent entries are to be collapsed into. Defaults to "Other".
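
The var.weight / mass rule described above can be illustrated with a short base-R sketch. This is not the package's implementation: it only mimics the documented rule on a plain data.frame, it assumes "exceeds" means strictly greater than, and the column name City_collapsed is hypothetical.

```r
# Illustrative only: a base-R sketch of the weight/mass rule,
# not the implementation used by mutate_other().
DF <- data.frame(City  = c("A", "A", "B", "B", "C", "D"),
                 value = c(1, 9, 4, 4, 5, 11),
                 stringsAsFactors = FALSE)
mass <- 10

# Total weight per City, repeated on every row belonging to that City.
total <- ave(DF$value, DF$City, FUN = sum)

# Assumption: "exceeds" is a strict comparison, so a total equal to
# `mass` is collapsed. Cities above the threshold keep their name.
DF$City_collapsed <- ifelse(total > mass, DF$City, "Other")
```

Under that assumption only D (total 11) stays; A has a total of exactly 10 and is collapsed together with B and C.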

Value

.data but with var changed so that infrequent values have the same value (other.category).

Examples

library(data.table)
library(magrittr)
#>
#> Attaching package: 'magrittr'
#> The following objects are masked from 'package:testthat':
#>
#>     equals, is_less_than, not
DT <- data.table(City = c("A", "A", "B", "B", "C", "D"),
                 value = c(1, 9, 4, 4, 5, 11))
DT %>%
  mutate_other("City", var.weight = "value", mass = 10) %>%
  .[]
#> Warning: `mass` was provided, yet `n` was not set to NULL. As a result, `mass` may be misinterpreted. If you intended to use `mass` to create the other category, set `n = NULL`. Otherwise, do not provide `mass`.
#>    City value
#> 1:    A     1
#> 2:    A     9
#> 3:    B     4
#> 4:    B     4
#> 5:    C     5
#> 6:    D    11
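
In the call above, City is returned unchanged because n was left at its default, as the warning points out. A variant that follows the warning's advice and sets n = NULL, so that mass is the criterion actually applied, might look like the sketch below. It assumes mutate_other is available via library(hutils); the name res is hypothetical, and no output is shown because it was not run here.

```r
library(data.table)
library(hutils)  # assumption: the package that provides mutate_other()

DT <- data.table(City = c("A", "A", "B", "B", "C", "D"),
                 value = c(1, 9, 4, 4, 5, 11))

# Set n = NULL, as the warning suggests, so that `mass` alone decides
# which cities are kept and which are set to other.category.
res <- mutate_other(DT, "City", n = NULL, var.weight = "value", mass = 10)
res[]
```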