Useful when you want to constrain the number of unique values in a column by keeping only the most common values.
mutate_other(.data, var, n = 5, count, by = NULL, var.weight = NULL, mass = NULL, copy = TRUE, other.category = "Other")
Argument | Description |
---|---|
.data | Data containing variable. |
var | Variable containing infrequent entries, to be collapsed into "Other". |
n | Threshold for total number of categories above "Other". |
count | Threshold for total count of observations before "Other". |
by | Extra variables to group by when calculating |
var.weight | Variable to act as a weight: |
mass | Threshold for sum of |
copy | Should |
other.category | Value that infrequent entries are to be collapsed into. Defaults to `"Other"`. |
`.data`, but with `var` changed so that infrequent values all have the same value (`other.category`).
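The idea behind the `n` threshold can be illustrated with a minimal base R sketch. The helper `collapse_top_n()` below is hypothetical and is not part of the package; it only approximates the behaviour described above for a plain character vector, and its handling of ties (whatever order `sort()` returns) may differ from what `mutate_other()` does.

```r
# Hypothetical helper (not from the package): keep the n most frequent
# values of a character vector and replace all others with `other`.
collapse_top_n <- function(x, n = 5, other = "Other") {
  # Frequency of each distinct value, most common first
  counts <- sort(table(x), decreasing = TRUE)
  # Names of the (at most) n most common values
  keep <- names(counts)[seq_len(min(n, length(counts)))]
  # Values outside the top n are collapsed into `other`
  ifelse(x %in% keep, x, other)
}

city <- c("A", "A", "A", "B", "B", "C", "D")
res <- collapse_top_n(city, n = 2)
res
#> [1] "A"     "A"     "A"     "B"     "B"     "Other" "Other"
```

Here "A" and "B" are the two most common values, so "C" and "D" are both mapped to "Other".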
library(data.table)
library(magrittr)

DT <- data.table(City = c("A", "A", "B", "B", "C", "D"), value = c(1, 9, 4, 4, 5, 11))

DT %>%
  mutate_other("City", var.weight = "value", mass = 10) %>%
  .[]
#> Warning: `mass` was provided, yet `n` was not set to NULL. As a result, `mass` may be misinterpreted. If you intended to use `mass` to create the other category, set `n = NULL`. Otherwise, do not provide `mass`.
#>    City value
#> 1:    A     1
#> 2:    A     9
#> 3:    B     4
#> 4:    B     4
#> 5:    C     5
#> 6:    D    11
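The warning in the example suggests setting `n = NULL` when `mass` is meant to define the other category. A sketch of that call follows, assuming `mutate_other()` is loaded from the hutils package (the package name is an assumption here, as this page does not state it). No output is shown, because exactly which cities fall below the `mass` threshold is not specified on this page.

```r
library(data.table)
library(hutils)  # assumption: the package that provides mutate_other()

DT <- data.table(City = c("A", "A", "B", "B", "C", "D"),
                 value = c(1, 9, 4, 4, 5, 11))

# Set n = NULL, as the warning suggests, so that only `mass`
# (applied to the weights in `value`) determines what is collapsed.
out <- mutate_other(DT, "City", var.weight = "value", mass = 10, n = NULL)
out[]
```

With `n = NULL` the warning about `mass` being misinterpreted should no longer be raised.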