Unlike duplicated, which flags only the second and later occurrences of a row, this function returns both the duplicate rows and the rows they duplicate. Combined with the by argument, this is useful for checking whether two observations that match on the specified columns are also identical in the remaining columns.
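The difference can be illustrated with data.table alone. The following is a minimal sketch, not the implementation of duplicated_rows: it assumes that flagging duplicates from both ends (using the fromLast argument of data.table's duplicated method) gives a comparable result. The table and the names later_only, is_dup, and both are made up for illustration.

```r
library(data.table)

# Hypothetical example data: x is the key we check for duplicates.
DT <- data.table(x = c(1, 1, 2, 3, 3), y = c("a", "b", "c", "d", "e"))

# duplicated() flags only the second and later occurrences of each key.
later_only <- DT[duplicated(DT, by = "x")]

# Flagging from both ends keeps every row whose key appears more than once,
# so the first occurrence is returned alongside its duplicates.
is_dup <- duplicated(DT, by = "x") | duplicated(DT, by = "x", fromLast = TRUE)
both <- DT[is_dup]
```

Here later_only holds two rows (y equal to "b" and "e"), while both holds four (y equal to "a", "b", "d", and "e"), which makes it possible to compare the y values within each duplicated x.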

duplicated_rows(DT, by = names(DT), na.rm = FALSE, order = TRUE,
  copyDT = TRUE, na.last = FALSE)

Arguments

DT

A data.table.

by

Character vector of columns to evaluate duplicates over.

na.rm

(logical) Should NAs in by be removed before returning duplicates? (Default FALSE.)

order

(logical) Should the result be ordered so that duplicate rows are adjacent? (Default TRUE.)

copyDT

(logical) Should DT be copied prior to detecting duplicates? If FALSE, the ordering of DT will be changed by reference. (Default TRUE.)

na.last

(logical) If order is TRUE, should NAs be ordered first or last? Passed to data.table::setorderv. (Default FALSE.)
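The meaning of "changed by reference" and the effect of na.last can be seen directly with data.table::setorderv. This sketch shows only the underlying data.table behaviour that copyDT and na.last refer to; it does not call duplicated_rows, and the table and the name sorted_copy are hypothetical.

```r
library(data.table)

# Hypothetical example data with one NA in the ordering column.
DT <- data.table(id = c(2L, NA, 1L), v = c("b", "n", "a"))

# Sorting a copy leaves the row order of DT untouched; na.last = TRUE
# places the NA row at the end.
sorted_copy <- setorderv(copy(DT), "id", na.last = TRUE)

# Sorting DT itself reorders it in place (by reference); na.last = FALSE
# places the NA row first.
setorderv(DT, "id", na.last = FALSE)
```

After the last call DT itself is reordered, which is the side effect to expect on the input table when copyDT = FALSE.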

Value

The rows of DT that are duplicated over the columns in by, including the rows they duplicate. Intended for interactive use.

Examples

if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  DT <- data.table(x = rep(1:4, 3),
                   y = rep(1:2, 6),
                   z = rep(1:3, 4))

  # No duplicates
  duplicated_rows(DT)

  # x and y have duplicates
  duplicated_rows(DT, by = c("x", "y"), order = FALSE)

  # By default, the duplicate rows are presented adjacent to each other.
  duplicated_rows(DT, by = c("x", "y"))
}
#>     x y z
#>  1: 1 1 1
#>  2: 1 1 2
#>  3: 1 1 3
#>  4: 2 2 2
#>  5: 2 2 3
#>  6: 2 2 1
#>  7: 3 1 3
#>  8: 3 1 1
#>  9: 3 1 2
#> 10: 4 2 1
#> 11: 4 2 2
#> 12: 4 2 3