This function differs from `duplicated`
in that it returns both the duplicate row and the row that was duplicated.
This may prove useful in combination with the `by`
argument for determining whether two observations are identical across
more than just the specified columns.
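To illustrate the distinction, here is a minimal sketch using only `data.table`'s own `duplicated` method. It is not this function's implementation; the table and the column names `id` and `value` are made up for the illustration.

```r
library(data.table)

# Hypothetical data: id 1 appears twice with different values,
# id 3 appears twice with identical values.
DT <- data.table(id = c(1L, 1L, 2L, 3L, 3L),
                 value = c("a", "b", "c", "d", "d"))

# duplicated() flags only the second and later occurrences of each id.
later_only <- DT[duplicated(DT, by = "id")]

# Flagging from both ends also keeps the first occurrence,
# so each duplicate sits next to the row it duplicates.
is_dup <- duplicated(DT, by = "id") |
  duplicated(DT, by = "id", fromLast = TRUE)
all_copies <- DT[is_dup]

print(later_only)
print(all_copies)
```

With every copy in view, it is easy to see that the two rows for `id` 1 differ in `value`, while the two rows for `id` 3 agree on every column.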
duplicated_rows(DT, by = names(DT), na.rm = FALSE, order = TRUE, copyDT = TRUE, na.last = FALSE)
Argument | Description |
---|---|
DT | A `data.table`. |
by | Character vector of columns to evaluate duplicates over. |
na.rm | (logical) Should |
order | (logical) Should the result be ordered so that duplicate rows are adjacent? (Default `TRUE`.) |
copyDT | (logical) Should |
na.last | (logical) If |
Duplicate rows of `DT` by `by`. For interactive use.
    if (requireNamespace("data.table", quietly = TRUE)) {
      library(data.table)

      DT <- data.table(x = rep(1:4, 3),
                       y = rep(1:2, 6),
                       z = rep(1:3, 4))

      # No duplicates
      duplicated_rows(DT)

      # x and y have duplicates
      duplicated_rows(DT, by = c("x", "y"), order = FALSE)

      # By default, the duplicate rows are presented adjacent to each other.
      duplicated_rows(DT, by = c("x", "y"))
    }
    #>     x y z
    #>  1: 1 1 1
    #>  2: 1 1 2
    #>  3: 1 1 3
    #>  4: 2 2 2
    #>  5: 2 2 3
    #>  6: 2 2 1
    #>  7: 3 1 3
    #>  8: 3 1 1
    #>  9: 3 1 2
    #> 10: 4 2 1
    #> 11: 4 2 2
    #> 12: 4 2 3
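As a possible follow-up to the example above, the sketch below checks whether rows that share `x` and `y` also agree on `z`. It uses plain `data.table` grouping rather than `duplicated_rows`; the names `by_cols` and `group_summary` are introduced here purely for illustration.

```r
library(data.table)

# Same table as in the example above.
DT <- data.table(x = rep(1:4, 3),
                 y = rep(1:2, 6),
                 z = rep(1:3, 4))

by_cols <- c("x", "y")

# For each (x, y) group: how many rows are there, and how many
# distinct z values do they hold?
group_summary <- DT[, .(n_rows = .N, n_distinct_z = uniqueN(z)),
                    by = by_cols]

print(group_summary)
```

A group with `n_rows` greater than 1 is duplicated on `by_cols`; if its `n_distinct_z` is also greater than 1, those rows are not identical across all columns.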