R: Keep only data from the first occurring row when there are duplicates in certain column within the same row

7 months ago

I want to keep only data from the first occurring row when there are duplicates in certain column within the same row.

Suppose we have a dataframe like this:

df <- data.frame(
A = c(99, 99, 99, 100, 101),
B = c("apple", "orange", "apple", "banana", "apple"),
C = c("eaten", "eaten", "fresh", "eaten", "fresh")
) # create sample dataframe

I want to drop the second and third rows because I only want to keep the first record for cases duplicated with 99, while retaining the rest.

This is how the code looks like with the use of distinct(), where the duplicates are removed after the first row of its occurrence.

df_unique <- df %>%
distinct(A, .keep_all = TRUE) # Remove duplicate rows based on A column

Screenshot 2023-10-10 at 2.37.04 PM.png

coding programming development proofofbrain stemgeeks datascience rstats

0.000

0 comments